Re: [R] computing distance in miles or km between 2 street addre

2007-09-06 Thread Ted Harding
On 06-Sep-07 18:42:32, Philip James Smith wrote:
> Hi R-ers:
> 
> I need to compute the distance between 2 street addresses in
> either km or miles. I do not care if the distance is a "shortest
> driving route" or if it is "as the crow flies."
> 
> Does anybody know how to do this? Can it be done in R? I have
> thousands of addresses, so I think that Mapquest is out of the
> question!
> 
> Please reply to: [EMAIL PROTECTED]
> 
> Thank you!
> Phil Smith

That's a somewhat ill-posed question! You will for a start
need a database of some kind, either of geographical locations
(coordinates) of street addresses, or of the metric of the
road network with capability to identify the street addresses
in the database.

If it's just "as the crow flies", then it can be straightforwardly
computed in R, either by Pythagoras (when they are not too far
apart) or using a function which takes account of the shape of
the Earth.
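
For the "crow flies" case, a haversine-style function needs only a
few lines. A sketch, assuming you already have longitude/latitude
in degrees for each address (the function name and the 6371 km
mean Earth radius are illustrative):

  gc.dist <- function(lon1, lat1, lon2, lat2, R = 6371) {
    rad  <- pi/180                    # degrees -> radians
    dlat <- (lat2 - lat1)*rad
    dlon <- (lon2 - lon1)*rad
    a <- sin(dlat/2)^2 + cos(lat1*rad)*cos(lat2*rad)*sin(dlon/2)^2
    2*R*asin(sqrt(a))                 # great-circle distance in km
  }
  gc.dist(-0.1276, 51.5072, -3.1883, 55.9533)  # London -> Edinburgh, ~534 km

Being vectorised, it will happily process thousands of address pairs
at once -- the hard part remains geocoding the addresses.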

There are many R packages which have to do with mapping data.
Search for "map" through the list of R packages at

http://finzi.psych.upenn.edu/R/library/maptools/html/00Index.html

-- maptools in particular. Also look at (for instance) aspace.

For "shortest driving route" then you need to find the shortest
distance through a network. You may find some hints in the
package optim -- but there must be some R experts out there
on this sort of thing!

However, the primary need is for the database which gives
the distance information in one form or another. What were
you proposing to use for this? As far as I know, R has no
database relevant to street addresses!

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Sep-07   Time: 23:17:57
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to do ANOVA with fractional values and overcome the

2007-09-06 Thread Ted Harding

On 06-Sep-07 06:55:36, Emre Sevinc wrote:
> I have exported a CSV file from my EXCEL worksheet and its last column
> contained decimal values:
> 
> Subject;Group;Side;Difference;PercentError
> M3;1;1;;
> M5;1;1;375;18,75
> M8;1;1;250;14,58
> M10;1;1;500;12,50
> M12;1;1;375;25,00
> .
> .
> .
> 
> 
> When I tried to do an ANOVA test on it, R complained by giving an error:
> 
>> Anova3LongAuditoryFemaleError.data <- read.csv("C:\\Documents\ and\
>> Settings\\Administrator\\My
>> Documents\\CogSci\\tez\\Anova3LongAuditoryFemaleError.csv", header =
>> TRUE, sep = ";")

Perhaps you need to specify the "dec" option to read.csv():

 read.csv(file, header = TRUE, sep = ",", quote="\"", dec=".",
  fill = TRUE, ...)

i.e.

Anova3LongAuditoryFemaleError.data <- read.csv("C:\\Documents\ and\
Settings\\Administrator\\My
Documents\\CogSci\\tez\\Anova3LongAuditoryFemaleError.csv",
header = TRUE, sep = ";", dec=",")

Or else use read.csv2(), where dec = "," is the default. See

  ?read.csv

and read about both read.csv and read.csv2

Otherwise, "18,75" is likely to be treated as text, and not as
a numerical value, and so PercentError will be converted into
a factor with character-valued levels.
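
A tiny self-contained illustration of the effect of "dec" (the data
lines are made up, and textConnection() stands in for the file):

  txt <- "Subject;PercentError\nM5;18,75\nM8;14,58"
  str(read.csv(textConnection(txt), sep=";"))
  ## PercentError comes in as text (levels "14,58", "18,75")
  str(read.csv(textConnection(txt), sep=";", dec=","))
  ## PercentError comes in as numeric: 18.75, 14.58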

Ted.

> 
>> Anova3LongAuditoryFemaleError.aov = aov(PercentError ~ (Group * Side),
>> data = Anova3LongAuditoryFemaleError.data)
> 
> Error in `storage.mode<-`(`*tmp*`, value = "double") : 
> invalid to change the storage mode of a factor
> In addition: Warning message:
> using type="numeric" with a factor response will be ignored in:
> model.response(mf, "numeric")
> 
> What must I do in order to make the ANOVA test on these fractional
> data?
> 
> Regards.
> 
> --
> Emre Sevinc
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Sep-07   Time: 09:39:04
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] length of a string

2007-09-05 Thread Ted Harding
On 05-Sep-07 13:50:57, João Fadista wrote:
> Dear all,
>  
> I would like to know how can I compute the length of a string in a
> dataframe. Example:
>  
> SEQUENCE              ID
> TGCTCCCATCTCCACGG     HR04FS00645
> ACTGAACTCCCATCTCCAAT  HR0595847847
>  
> I would like to know how to compute the length of each SEQUENCE.
>  
> Best regards,
> João Fadista

  nchar("ACTGAACTCCCATCTCCAAT")
  [1] 20

seems to work. Find it, and related functions, with

  help.search("character")

As it happens, help.search("string") will not help!
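
To do this for a whole column of the data frame, something like
the following (a sketch, with 'seqs' a hypothetical name for the
data frame; as.character() guards against SEQUENCE having been
read in as a factor):

  seqs$seqlen <- nchar(as.character(seqs$SEQUENCE))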

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Sep-07   Time: 15:05:22
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] integers

2007-09-04 Thread Ted Harding
On 04-Sep-07 09:53:43, Christoph Heibl wrote:
> Hi list,
> 
> The function is.integer tests if an object is of type integer:
> 
> see e.g.:
> 
> is.integer(12)# FALSE
> is.real(12)   # TRUE
> mode(12)  # "numeric"
> 
> But how can I test if a number is actually an integer? R's help
> is difficult to search in this case because it mainly yields
> entries about the integer()-function family.
> 
> Thanks for any hint!
> Christoph Heibl

I infer you want to test whether the real x has an integer value.

((x-floor(x))==0)

would do it (covers negative integers as well), though there may
well be a smarter way.
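
One smarter (or at least safer) way, for values that come out of
floating-point arithmetic where the exact test can fail, is a
tolerance-based sketch:

  is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
    abs(x - round(x)) < tol
  }
  is.wholenumber(12)         ## TRUE
  is.wholenumber(sqrt(2)^2)  ## TRUE, though sqrt(2)^2 - 2 is not exactly 0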

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-07   Time: 11:11:46
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Derivative of a Function Expression

2007-09-03 Thread Ted Harding
On 03-Sep-07 21:45:40, Alberto Monteiro wrote:
> Rory Winston wrote:
>> 
>> I am currently (for pedagogical purposes) writing a simple numerical
>> analysis library in R. I have come unstuck when writing a simple
>> Newton-Raphson implementation, that looks like this:
>> 
>> f <- function(x) { 2*cos(x)^2 + 3*sin(x) +  0.5  }
>> 
>> root <- newton(f, tol=0.0001, N=20, a=1)
>> 
>> My issue is calculating the symbolic derivative of f() inside the 
>> newton() function. 
>>
> If it's pedagogical, maybe returning to basics could help.
> 
> What is f'(x)?
> 
> It's the limit of (f(x + h) - f(x)) / h when h tends to zero.
> 
> So, do it numerically: take a sufficiently small h and
> compute the limit. h must be small enough
> that h^2 f''(x) is much smaller than h f'(x), but big
> enough that f(x+h) is not f(x)
> 
> numerical.derivative <- function(f, x, h = 0.0001)
> {
>   # test something
>   (f(x + h) - f(x)) / h
> }
> 
> Ex:
> 
> numerical.derivative(cos, pi) = 5e-05 # should be -sin(pi) = 0
> numerical.derivative(sin, pi) = -1# ok
> numerical.derivative(exp, 0) = 1.5 # close enough
> numerical.derivative(sqrt, 0) = 100 # should be Inf
> 
> Alberto Monteiro

If you want to "go back to basics", it's worth noting that
for a function which has a derivative at x (which excludes
your sqrt(x) at x=0, despite the result you got above,
since only the 1-sided limit as h-->0+ exists):

   (f(x+h/2) - f(x-h/2))/h

is generally a far better approximation to f'(x) than is

   (f(x+h) - f(x))/h

since the term in  h^2 * f''(x)  in the expansion of the numerator
is automatically eliminated! So the accuracy is O(h^2), not
O(h) as with your definition.

num.deriv<-function(f,x,h=0.001){(f(x + h/2) - f(x-h/2))/h}

num.deriv(cos, pi)
## [1] 0

num.deriv(sin, pi)
## [1] -1

but of course num.deriv(sqrt,0) won't work, since it requires
evaluation of sqrt(-h/2).

However, other comparisons with your definition are possible:

numerical.derivative(sin,pi/3)-0.5
##[1] -4.33021e-05

num.deriv(sin,pi/3)-0.5
##[1] -2.083339e-08
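
For what it's worth, num.deriv() drops straight into a Newton-Raphson
sketch along the lines Rory described (the argument names follow his
call; this is illustrative, not his implementation):

  newton <- function(f, tol=0.0001, N=20, a) {
    x <- a
    for (i in 1:N) {
      x.new <- x - f(x)/num.deriv(f, x)   # Newton step with numerical f'
      if (abs(x.new - x) < tol) return(x.new)
      x <- x.new
    }
    warning("no convergence in N steps")
    x.new
  }
  newton(cos, a=1)
  ## converges to pi/2 (about 1.570796)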

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Sep-07   Time: 00:10:42
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Excel

2007-09-01 Thread Ted Harding
[Apologies if you sometime get this twice. The first mailing
 has not been delivered to the list after more than 10 hours]

On 31-Aug-07 10:38:07, Jim Lemon wrote:
> Rolf Turner wrote:
>> On 31/08/2007, at 9:10 AM, Antony Unwin wrote:
>> 
>> 
>>>Erich's more important point
>>>is that you need to speak the language of the people you cooperate
>>>with and often that language includes Excel.
>> 
>> 
>> So if the people you have to deal with are into astrology you should  
>> learn astrology?
>> 
> Yes, Rolf, if you want them to understand you. I once wrote a very 
> successful parody of those astrology character profiles simply by 
> borrowing an acquaintance's book and making dreadfully irreverent use
> of it.
> 
> Jim

Now that you mention that, Jim, I feel a POMO coming on.

  if( ! you.know.what.i.mean() ){
    browseURL("http://www.elsewhere.org/pomo")
  } else {
    stop("You know what I mean!")
  }

The structural features of horoscope and astrology texts would
readily lend themselves to implementation in such an engine.

Mind you, from my occasional browsings of these texts, I suspect
it has already been done and is widely used.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Aug-07   Time: 22:54:32
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Incomplete Gamma function

2007-09-01 Thread Ted Harding
in cases
like the Incomplete Gamma and Beta functions, where there can be
dispute over what is meant (see above), it is surely wise to spell
it out!
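
For the record, the numbers Kris quotes below pin down the conventions:
Mathematica's Gamma[a, z] is the unregularized *upper* incomplete gamma,
which base R can supply via pgamma() (a sketch):

  gamma(9) * pgamma(11.1, 9, lower.tail = FALSE)  ## 9000.5, as in Mathematica
  gamma(9) * pgamma(11.1, 9)                      ## 31319.5, the lower
                                                  ## counterpart, as from Igamma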

Best wishes to all,
Ted.

>> On 31 Aug 2007, at 00:29, [EMAIL PROTECTED] wrote:
>>
>>> Hello
>>>
>>> I am trying to evaluate an Incomplete gamma function
>>> in R. Library Zipfr gives the Igamma function. From
>>> Mathematica, I have:
>>>
>>> "Gamma[a, z] is the incomplete gamma function."
>>>
>>> In[16]: Gamma[9,11.1]
>>> Out[16]: 9000.5
>>>
>>> Trying the same in R, I get
>>>
>>>> Igamma(9,11.1)
>>> [1] 31319.5
>>> OR
>>>> Igamma(11.1,9)
>>> [1] 1300998
>>>
>>> I know I have to understand the theory and the math
>>> behind it rather than just ask for help, but while I
>>> am trying to do that (and only taking baby steps, I
>>> must admit), I was hoping someone could help me out.
>>>
>>> Regard
>>>
>>> Kris.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-07   Time: 15:49:57
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Excel (off-topic, sort of)

2007-08-30 Thread Ted Harding
On 30-Aug-07 14:43:12, Thomas Lumley wrote:
> On Wed, 29 Aug 2007, Alberto Monteiro wrote:
>>>
>> Do you know what's in my wish list?
>>
>> I wish spreadsheets and computer languages had gone one
>> step further.
>>
>> I mean, it's nice to define Cell X to be "equal" to
>> Cell Y + 10, and then when we change Cell Y, magically we
>> see Cell X change.
>>
>> But why can't it be the reverse? Why can't I change Cell X
>> _and see the change in Cell Y_?
>>
> 
> Well, most statistical calculations are data-reducing, ie,
> many-to-one. 
> I don't think you are likely to find a language where you
> can change the confidence limits for the sample mean and
> have the data change in response.

Well, I have seen *people* do this (mentioning no names), and
with the assistance of software (but only using the software
in the direction Data --> CLs).

However, there does exist software which is capable of learning
people's behavioural criteria, and reacting according to their
estimated needs, so probably Thomas's speculation is not beyond
reach.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Aug-07   Time: 20:07:03
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OT: distribution of a pathological random variate

2007-08-29 Thread Ted Harding
On 29-Aug-07 17:39:17, Horace Tso wrote:
> Folks,
> 
> I wonder if anything could be said about the distribution of a random
> variate x, where
> 
> x = N(0,1)/N(0,1)
> 
> Obviously x is pathological because it could be 0/0. If we exclude this
> point, so the set is {x/(0/0)}, does x have a well defined
> distribution? or does it exist a distribution that approximates x. 
> 
> (The case could be generalized of course to N(mu1, sigma1)/N(mu2,
> sigma2) and one still couldn't get away from the singularity.)
> 
> Any insight or reference to related discussion is appreciated.
> 
> Horace Tso

A good question -- but it has a long-established answer. X has the
Cauchy distribution, whose density function is

  f(x) = 1/(pi*(1 + x^2))

Have a look at ?dcauchy

It is also the distribution of t with 1 degree of freedom.

See also ?dt

You don't need to exclude the point (0,0) explicitly, since
it has zero probability of occurring. But the chance that the
denominator could be small enough to give a very large value
of X is quite perceptible.

Try

  X<-rcauchy(1000)
  max(X)

and similar. Play around!
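
The claim is also easy to check empirically (a sketch):

  set.seed(1)
  Z <- rnorm(10000)/rnorm(10000)
  qqplot(qcauchy(ppoints(500)), quantile(Z, ppoints(500)),
         xlab="Cauchy quantiles", ylab="N(0,1)/N(0,1) quantiles")
  abline(0, 1)   # the points should hug this line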

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Aug-07   Time: 19:02:32
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interpreting the eigen value of a population matrix

2007-08-28 Thread Ted Harding
On 28-Aug-07 14:12:22, Marie Anouk Simard wrote:




It would seem (from the headers) that Marie sent her message
within a "TNEF" attachment, which the R-help server has duly
stripped off!

I would suggest that she re-sends her message in plain text,
so that we can all read it (and it will not get stripped).

Ted.

> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Aug-07   Time: 15:29:12
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R "old versions" archive anywhere?

2007-08-23 Thread Ted Harding
Hi Folks,

Does anyone know if (apparently not on CRAN) there is
any archive of older versions of R packages?

Or is it the case that not only are old versions on
CRAN humanely destroyed, but all older versions out
in the wild are hunted down and shot?

(I can find older versions of the R "core", but not
of other packages).

With thanks,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-07   Time: 19:37:30
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to fit an linear model withou intercept

2007-08-23 Thread Ted Harding
On 23-Aug-07 09:56:33, Michal Kneifl wrote:
> Please could anyone help me?
> How can I fit a linear model where an intercept has no sense?
> Thanks in advance..
> 
> Michael

Presumably you mean "where a non-zero intercept has no sense"?

Suppose the model you want to fit is like

  Response ~ B+C+X+Y

Then

[A]
  lm(Response ~ B+C+X+Y)

will give you an intercept. But:

[B]
  lm(Response ~ B+C+X+Y - 1 )

will in effect force a zero intercept -- it removes the "Intercept"
term (which is implicit in the form [A]) from the model, thus in
effect giving it coefficient zero.

In "real life", the model at [A] corresponds to the equation

  Response = a*1 + b*B + c*C + x*X + y*Y + error

where a,b,c,x,y are the coefficients, so the "terms in the model"
are 1 , B , C , X , Y.

The model at [B] corresponds to

  Response = b*B + c*C + x*X + y*Y + error

so the "terms in the model" are B , C , X , Y -- i.e. the same
as before but with the term "1" removed, which is what is effected
by adding the term "-1" to the model formula.
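
For instance (a minimal simulated sketch):

  set.seed(42)
  B <- rnorm(50); C <- rnorm(50)
  Response <- 2*B - C + 0.1*rnorm(50)   # true intercept is 0
  coef(lm(Response ~ B + C))            # includes a small "(Intercept)"
  coef(lm(Response ~ B + C - 1))        # no "(Intercept)" term at all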

Hoping this helps,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 23-Aug-07   Time: 12:21:24
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does anyone.... worth a warning?!? No warning at all

2007-08-20 Thread Ted Harding
On 20-Aug-07 19:55:44, Rolf Turner wrote:
> On 20/08/2007, at 9:54 PM, Tom Willems wrote:
>> dear Mathew
>>
>> mean is a Generic function
>>
>> mean(x...)
>>
>> in wich x is a data object, like a  data frame a list
>> a numeric vector...
>>
>> so in your example it only reads the first character
>> and then reports it.
>>
>> try x = c(1,1,2)
>> mean(x)
> 
> I think you've completely missed the point. I'm sure Mathew
> now understands the syntax of the mean function. His point
> was that it would be very easy for someone to use this
> function incorrectly --- and he indicated very clearly *why*,
> by giving an example using max().
> 
> If mean() could be made safer to use by incorporating a warning,  
> without unduly adding to overheads, then it would seem sensible
> to incorporate such a warning.  Or to change the mean()
> function so that mean(1,2,3) returns ``2'' --- just as max 
> (1,2,3) returns ``3'' --- as Mathew *initially* (and quite
> reasonably) expected it to do.
> 
> cheers,
> Rolf Turner

I think Rolf makes a very important point. There are a lot of
idiosyncrasies in R, which in time we get used to; but learning
about them is something of a "sociological" exercise, just as
one learns that when one's friend A says "X Y Z" it may not mean
the same as when one's friend B says it.

Another example is in the use of %*% for matrix multiplication
when one or both of the factors is a vector. If you came to R
from matlab/octave, where every vector is already either a row
vector or a column vector, you knew where you stood. But in R
the semantics of the syntax depend on the context in a more
complicated way. In R, x<-c(-1,1) is called a "vector", but it
does not have dimensions:

x<-c(-1,1)
dim(x)
NULL

So its relationship to matrix multiplication is ambiguous.

For example:

M<-matrix(c(1,2,3,4),nrow=2); M
     [,1] [,2]
[1,]    1    3
[2,]    2    4

x%*%M
     [,1] [,2]
[1,]    1    1

and x is now coerced into a "row vector", which (for that
immediate purpose) now does have dimensions (just as a row vector
would have in matlab/octave).

Similarly,

M%*%x
     [,1]
[1,]    2
[2,]    2

coerces it into a column vector. But now (asks the beginner who
has not yet got round to looking up ?"%*%") what happens with x%*%x?

Will we get column vector times row vector (a 2x2 matrix) or
row times column (a scalar)? In fact we get the latter:

x%*%x
     [,1]
[1,]    2

All this is in accordance with ?"%*%":

Description:
 Multiplies two matrices, if they are conformable. If one argument
  is a vector, it will be coerced to either a row or column matrix
 to make the two arguments conformable. If both are vectors it will
 return the inner product.


But now suppose y<-c(1,2,3), with x<-c(-1,1) as before.

x%*%y
Error in x %*% y : non-conformable arguments

because it is trying to make the inner product of vectors of unequal
length. Whereas someone who had got as far as the second sentence of
the Description, and did not take the third sentence as strictly
literally as intended, might expect that x would be coerced into
column, and y into row, so that they were conformable for
multiplication, giving a 2x3 matrix result (perhaps on the grounds
that "it will return the inner product" means that it will do this
if they are conformable, otherwise doing the coercions described
in the first sentence).

That misunderstanding could be avoided if the last sentence read:

"If both are vectors it will return the inner product provided
both are the same length; otherwise it is an error and nothing
is returned."

Or perhaps x or y should not be called "vector" -- in linear
algebra people are used to "vector" being another name for
a 1-dimensional matrix, being either "row" or "column".
The R entity is not that sort of thing at all. The closest
that "R Language Definition" comes to defining it is:

"Vectors can be thought of as contiguous cells containing
homogeneous data. Cells are accessed through indexing
operations such as x[5]."

x<-matrix(x) will, of course, turn x into a paid-up column vector
(as you might guess from ?matrix, if you copy the "byrow=FALSE"
from the "as.matrix" explanation to the "matrix" explanation;
though in fact that is irrelevant, since e.g. "byrow=TRUE" has
no effect in matrix() -- so in fact there is no specification
in ?matrix as to whether to expect a row or column result).

Just a few thoughts. As I say we all get used to this stuff in
the end, but it can be bewildering (and a trap) for beginners.

Best wishes to all,
Ted.



Re: [R] Interweaving cell entries from 2 columns

2007-08-20 Thread Ted Harding
On 20-Aug-07 09:49:08, jenny wrote:
> Hi, how do I interweave 2 columns of data into a single column?
> See illustration below. Thx - jenny
> 
> I have 2 columns of data:
> 
>   Col1  Col2
>      1     2
>      3     4
>      5     6
>      7     8
>      9    10
> 
> I want to create a new column with interlocking cell entries:
> 1 2 3 4 5 6 7 8 9 10
> 
> [[alternative HTML version deleted]]

Re-structuring your example as received [please see to your mailer's
formatting!] into what I think you may mean:

 
  Col1   Col2
     1      2
     3      4
     5      6
     7      8
     9     10
 
You haven't stated in what form these two columns are available to
you at the point where you want to "merge" them, so I'll give two
answers (there may be others, depending on the source of Col1 and
Col2).

1. Suppose they are two columns of a matrix M:

M
     c1 c2
[1,]  1  2
[2,]  3  4
[3,]  5  6
[4,]  7  8
[5,]  9 10

Then:

as.vector(t(M))
[1]  1  2  3  4  5  6  7  8  9 10


2. Suppose they are available as separate vectors Col1 and Col2:

as.vector(rbind(Col1,Col2))
[1]  1  2  3  4  5  6  7  8  9 10
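
3. If instead they are the two columns of a data frame DF (a third
possible source; 'DF' is hypothetical), this reduces to case 1:

as.vector(t(as.matrix(DF)))
[1]  1  2  3  4  5  6  7  8  9 10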


Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Aug-07   Time: 12:27:54
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] an easy way to construct this special matirx

2007-08-16 Thread Ted Harding
On 16-Aug-07 03:10:17, [EMAIL PROTECTED] wrote:
> Hi,
> Sorry if this is a repost. I searched but found no results.
> I am wondering if it is an easy way to construct the following matrix:
> 
> r 1000
> r^2   r100
> r^3   r^2  r10
> r^4   r^3  r^2  r1
> 
> where r could be any number. Thanks.
> Wen

I dare say there's an even simpler way (and I feel certain
someone will post one ... ); but (example):

r<-0.1
r1<-r^((-3):0); r2<-r^(3:0)
R<-r1%*%t(r2)
R
##       [,1]  [,2]  [,3] [,4]
## [1,] 1.000 10.00 100.0 1000
## [2,] 0.100  1.00  10.0  100
## [3,] 0.010  0.10   1.0   10
## [4,] 0.001  0.01   0.1    1

R[upper.tri(R)]<-0
R
##       [,1] [,2] [,3] [,4]
## [1,] 1.000 0.00  0.0    0
## [2,] 0.100 1.00  0.0    0
## [3,] 0.010 0.10  1.0    0
## [4,] 0.001 0.01  0.1    1
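
Following up my own "I dare say there's an even simpler way": outer()
builds Wen's matrix (including its leading column of powers) in one
line -- a sketch:

r <- 0.1
outer(1:4, 0:4, function(i, j) ifelse(j <= i, r^(i - j), 0))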

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-07   Time: 09:37:28
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Covariance of data with missing values.

2007-08-15 Thread Ted Harding
On 15-Aug-07 21:16:32, Rolf Turner wrote:
> 
> I have a data matrix X (n x k, say) each row of which constitutes
> an observation of a k-dimensional random variable which I am willing,
> if not happy, to assume to be Gaussian, with mean ``mu'' and
> covariance matrix ``Sigma''.  Distinct rows of X may be assumed to
> correspond to independent realizations of this random variable.
> 
> Most rows of X (all but 240 out of 6000+ rows) contain one or more  
> missing values.
> [...]

One question, Rolf: How big is k (no of columns)?

If it's greater than 30, you may have problems with 'norm', since the
function prelim.norm() builds up its image of the places where there
are missing values as "packed integers" with code:

r <- 1 * is.na(x)

mdp <- as.integer((r %*% (2^((1:ncol(x)) - 1))) + 1)

i.e. 'r' would be nxk, with 1s where your X had missing values and
0s elsewhere. Then each row of 'r' is converted into a 32-bit integer
whose "1" bits correspond to the 1s in 'r'. You'll get "NA" warnings
if k>30, and
things could go wrong!
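
The limitation is easy to see (a sketch): with k = 31 columns the
packed pattern can reach 2^31, beyond R's 32-bit integer range:

as.integer(2^30)   ## 1073741824, still representable
as.integer(2^31)   ## NA, with a coercion warning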

In that case, I hope Chuck's suggestion works!

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Aug-07   Time: 00:10:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Covariance of data with missing values.

2007-08-15 Thread Ted Harding
Hi Rolf!

Have a look at the 'norm' package.

This does just what you're asking for (assuming multivariate
normal, and allowing MAR missingness -- i.e. probability of
missing may depend on observed values, but must not depend on
unobserved).

Read the documentation for the various functions *very* carefully!
Drop me a line if you want more info.
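
A bare-bones sketch of the intended call sequence (X is your data
matrix with NAs; see the docs of each function for the details):

  library(norm)
  s <- prelim.norm(X)           # preliminary summaries of missingness
  thetahat <- em.norm(s)        # ML estimates of mu, Sigma via EM
  getparam.norm(s, thetahat)    # extract mu and sigma on the original scale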

Best wishes,
Ted.

On 15-Aug-07 21:16:32, Rolf Turner wrote:
> 
> I have a data matrix X (n x k, say) each row of which constitutes
> an observation of a k-dimensional random variable which I am willing,
> if not happy, to assume to be Gaussian, with mean ``mu'' and
> covariance matrix ``Sigma''. Distinct rows of X may be assumed to
> correspond to independent realizations of this random variable.
> 
> Most rows of X (all but 240 out of 6000+ rows) contain one or more  
> missing values. If I am willing to assume that missing entries are
> missing completely at random (MCAR) then I can estimate the covariance
> matrix Sigma via maximum likelihood, by employing the EM algorithm. 
> Or so I believe.
> 
> Has this procedure been implemented in R in an accessible form?
> I've had a bit of a scrounge through the searching facilities,
> and have gone through the FAQ, and have found nothing that I could
> discern to be directly relevant.
> 
> Thanks for any pointers that anyone may be able to give.
> 
>   cheers,
> 
>           Rolf Turner


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 23:17:48
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Ted Harding
On 15-Aug-07 18:17:27, Greg Snow wrote:
> 
> 
> 
> Ted Harding wrote:
> [snip]
> 
>> So if you want the density plot, you would need to calculate
>> this for yourself. E.g.
>> 
>> H1$density <- counts/sum(counts)
>> plot.histogram(H1,freq=FALSE)
> 
> shouldn't that be:
> H1$density <- counts/sum(counts)/diff(brkpts)
> 
> ?

Of course! Thank you.
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 20:23:13
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Negative Binomial: glm.nb

2007-08-15 Thread Ted Harding
Hi Folks,

I'm playing with glm.nb() in MASS.

Reference: the negative binomial distribution

  P(y) = (Gamma(theta+y)/(Gamma(theta)*y!))*(p^theta)*(1-p)^y

  y = 0,1,2,...

in the notation of the MASS book (section 7.4), where

  p = theta/(mu + theta)  so  (1-p) = mu/(mu + theta)

where mu is the expected value of Y. It seems from ?glm.nb that
an initial value of theta is either supplied, or obtained by
a method of moments after an initial Poisson fit to the data;
then it alternates back and forth:

given theta, glm is applied to fit the linear model, with
log link, for mu; then theta is re-estimated, then the linear
model again, and so on.

I hope I've understood right!

However, this procedure is based on the assumption that the same
value of theta applies across the board, regardless of the values
of any covariates or factors.

The MASS book (Section 7.4) illustrates the use of glm.nb() on
the Quine data (for numbers of days absence from school in a
school year by Australian children). There are four factors in
addition to the  "outcome" Days:

Eth: Ethnicity (Aboriginal "A"/Non-Aboriginal "N")
Lrn: Learning Ability (Slow learner"SL", Average Learner "AL")
Age: Age Group (Primary "F0", First/Second/Third Forms "F1/2/3")
Sex: Gender ("M"/"F")

This dataset is earlier analysed thoroughly in the MASS book
(Section 6.8), primarily with reference to transforming Days
by a Box-Cox transformation. The negative binomial does not
play a role in that part.

When it comes to the glm.nb(), however, the assumption of "constant
theta" is implicit in Section 7.4.

But, if you look at the data, this does not seem all that reasonable.

Preamble: Get the variables out of 'quine' into convenient names.

library(MASS)
D<-quine$Days ; E<-quine$Eth; L<-quine$Lrn; A<-quine$Age; S<-quine$Sex

D.A<-D[E=="A"]; L.A<-L[E=="A"]; A.A<-A[E=="A"]; S.A<-S[E=="A"]
D.N<-D[E=="N"]; L.N<-L[E=="N"]; A.N<-A[E=="N"]; S.N<-S[E=="N"]

Now, if you look at the histograms of D.A and D.N separately:

  hist(D.A,breaks=-0.5+4*(0:22))
  hist(D.N,breaks=-0.5+4*(0:22))

you can see that there is a major difference in shape, such as
would arise if theta>1 for Eth="A", and theta<1 for Eth="N".
This is confirmed by a separate covariate-free fit of the mean
to each group, from which the fitted theta is returned:

summary(glm.nb(D.A~1))
-->   Theta:  1.499
  Std. Err.:  0.260

summary(glm.nb(D.N~1))
-->   Theta:  0.919
  Std. Err.:  0.159

(1.499-0.919)/sqrt(0.260^2 + 0.159^2)
##[1] 1.903113

1-pnorm(1.903113)
##[1] 0.0285129

so we have a 1-sided P-value of 0.029, a 2-sided of 0.057

That's a fairly strong indication that theta is not the same
for the two groups; and the shapes of the histograms show that
the difference has an important effect on the distribution.

Of course, this dataset is notoriously unbalanced, so maybe this
is a reflection of the mixtures of categories. A simple linear
model on each subgroup eases the above situation a bit:

summary(glm.nb(D.A ~ S.A+A.A+L.A))
-->   Theta:  1.767
  Std. Err.:  0.318
summary(glm.nb(D.N ~ S.N+A.N+L.N))
-->   Theta:  1.107
  Std. Err.:  0.201
1-pnorm((1.767-1.107)/sqrt(0.318^2+0.201^2)) = 0.0397

and throwing everything into the pot eases it a bit more:

summary(glm.nb(D.A ~ S.A*A.A*L.A))
-->   Theta:  2.317
  Std. Err.:  0.438
summary(glm.nb(D.N ~ S.N*A.N*L.N))
-->   Theta:  1.581
  Std. Err.:  0.318
1-pnorm((2.317-1.581)/sqrt(0.438^2+0.318^2)) = 0.0870

But, while the theta-values have now moved well up into the ">1"
range (where they have less effect on the shape of the distribution),
nevertheless the differences of the "A" and "N" estimates have
not changed much: 0.580 --> 0.660 --> 0.736
even though their P-values have eased off somewhat (and, with 69
and 77 data for the "A" and "N" groups respectively, we don't have
huge power here, I think).

I've gone through the above in detail, since it's an illustration
which you can all work through yourselves in R, to illustrate the
issue about variation of theta in a negative binomial model, when
you want to get at the effects of covariates on the outcome Y (the
counts of events). In real life I'm considering other data where
the "theta" effect is more marked, and probably more important.
Which finally leads on to my

QUERY: Is there anything implement{ed|able} in R which can address
the issue that theta may vary according to factors or covariates,
as well as the level of mu given theta?

Or is one reduced to running glm.nb() (or the like) on subsets?

Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Ted Harding
On 15-Aug-07 08:30:08, Nick Chorley wrote:
> Hi,
> 
> I have a large amount of data that I would like to create a
> histogram of and plot and do things with in R. It is pretty
> much impossible to read the data into R, so I have written a
> program to bin the data and now have a list of counts in each
> bin. Is it possible to somehow import this into R and use
> hist(), so I can, for instance, plot the probability density?
> I have looked at the help page for hist(), but couldn't find
> anything related to this there.
> 
> Regards,
> 
> Nicky Chorley

Presumably you now have (or can readily generate) files on your
system whose contents are (or are equivalent to) something like:

brkpts.dat
0.0
0.5
1.0
...
9.5
10.0

counts.dat
10
7
38
...
7
0

where there is one more line in brkpts.dat than in counts.dat

Now simply read both files into R, creating variables 'brkpts',
'counts'

Now create a histogram template (any silly old data will do):

H1 <- hist(c(1,2))

Next, attach your variables to it:

H1$breaks <- brkpts
H1$counts <- counts

and you have your histogram in R. Also, you can use the data
in the variables 'brkpts', 'counts' to feed into any other
procedure which can accept data in this form.

Example (simulating the above in R):

  brkpts<-0.5*(0:20)
  counts<-rpois(20,7.5)
  H1<-hist(c(1,2))
  H1$breaks <- brkpts
  H1$counts <- counts
  plot(H1)

Or, if you want a more "realistic-looking" one, follow on with:

midpts<-(brkpts[1:20]+brkpts[2:21])/2
counts<-rpois(20,100*dnorm(midpts,mean=5,sd=3))
H1$breaks <- brkpts
H1$counts <- counts
plot(H1)


In other words, you've already done R's work for it with your
program which bins the data. All you need to do in R is to get
these results into the right place in R.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 10:38:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to "import" histograms in R?

2007-08-15 Thread Ted Harding
On 15-Aug-07 10:15:13, Nick Chorley wrote:
>>[...]
>> Now create a histogram template (any silly old data will do):
>>
>> H1 <- hist(c(1,2))
>>
>> Next, attach your variables to it:
>>
>> H1$breaks <- brkpts
>> H1$counts <- counts
>>
>> and you have your histogram in R. Also, you can use the data
>> in the variables 'brkpts', 'counts' to feed into any other
>> procedure which can acept data in this form.
> 
> 
> This is precisely what I wanted to do, except I didn't realise
> that you could assign to the variables in the histogram object
> like this. 
> Normally when constructing the histogram, I'd use hist(x, prob=T)
> to plot the probability density against x, but obviously if you're
> just assigning values to the variables in the object, you can't do
> that. I tried putting "prob=T" in the call to hist when making the
> dummy object, but that didn't help.

Followup:

The histogram object I constructed in the example from my previous
reply contains the following:

H1
$breaks
 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
[12]  5.5  6.0  6.5  7.0  7.5  8.0  8.5  9.0  9.5 10.0

$counts
 [1]  3  6  3  7  7  8 19 10 16 12 16 12 13 12 11 13 11  7  3  6

$intensities
[1] 0.992 1.000

$density
[1] 0.992 1.000

$mids
[1] 1.25 1.75

$xname
[1] "c(1, 2)"

$equidist
[1] TRUE

All these things are calculated when hist() is called for
raw data.

The "$breaks" and "$counts" are what I assigned to it with

  H1$breaks <- brkpts
  H1$counts <- counts

"$intensities", "$density", "$mids", "$xname" and "$equidist"
are what was set up by the initial call

H1 <- hist(c(1,2))

Previously, I used "plot(H1)" to illustrate that the above
assignments put the breakpoints and the counts into the
histogram object. For a more thorough approach, you need to
use plot.histogram() instead.

Here you can, in fact, set the "prob=TRUE" condition in the
plot by setting "freq=FALSE" in the call to plot.histogram().

But then it will look at "$density" for the values to plot.

So if you want the density plot, you would need to calculate
this for yourself. E.g.

H1$density <- counts/sum(counts)
plot.histogram(H1,freq=FALSE)

And so on ... there are many relevant details in the help pages
?hist and ?plot.histogram

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Aug-07   Time: 11:53:14
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear Regression with slope equals 0

2007-08-14 Thread Ted Harding
On 14-Aug-07 11:36:37, [EMAIL PROTECTED] wrote:
> 
>  Hi there, am trying to run a linear regression with a slope of 0.
> 
>  I have a dataset as follows
> 
>  t d
>  1 303
>  2 302
>  3 304
>  4 306
>  5 307
>  6 303
> 
>  I would like to test the significance that these points would lie on a
> horizontal straight line.
> 
>  The standard regression lm(d~t) doesn't seem to allow the slope to be
> set.

The model d~1 will fit a constant (the mean), i.e. a regression with
slope = 0. The model d~t will fit the usual linear regression.
The two can be compared with anova(), as well as getting the details
of the individual fits with summary().

E.g. (with your example):

  d<-c(303,302,304,306,207,303)
  t<-c(1,2,3,4,5,6)
  lm0<-lm(d~1); lm1<-lm(d~t); anova(lm0,lm1)

## Analysis of Variance Table
## Model 1: d ~ 1
## Model 2: d ~ t
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1      5 7785.5
## 2      4 6641.4  1    1144.1 0.6891 0.4531

summary(lm0)
## Call: lm(formula = d ~ 1)
## Residuals:
##1 2 3 4 5 6 
## 15.5  14.5  16.5  18.5 -80.5  15.5 
## Coefficients:
##Estimate Std. Error t value Pr(>|t|)
## (Intercept)   287.50  16.11   17.85 1.01e-05 ***
##Residual standard error: 39.46 on 5 degrees of freedom

mean(d)
## [1] 287.5

summary(lm1)
## Call: lm(formula = d ~ t)
## Residuals:
##  1   2   3   4   5   6 
## -4.714   2.371  12.457  22.543 -68.371  35.714 
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  315.800 37.934   8.325  0.00114 **
## t -8.086  9.740  -0.830  0.45314   
## Residual standard error: 40.75 on 4 degrees of freedom
## Multiple R-Squared: 0.147,  Adjusted R-squared: -0.0663 
## F-statistic: 0.6891 on 1 and 4 DF,  p-value: 0.4531 

The P-value for the slope in lm1 is the same as the P-value
returned by anova().

If you want to force a particular non-zero slope (e.g. s0)
for comparison with the data, you can use

  lm0 <- lm(d - s0*t ~ 1),

compared with

  lm1<- lm(d- s0*t ~ t)

for instance.
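
Equivalently (a sketch), offset() fixes the slope at s0 without
transforming the response:

  lm0 <- lm(d ~ 1 + offset(s0*t))   # slope forced to equal s0
  lm1 <- lm(d ~ t + offset(s0*t))   # coefficient on t estimates the
                                    # departure of the slope from s0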

Hoping this helps,
Ted.

----
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Aug-07   Time: 13:16:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help wit matrices

2007-08-10 Thread Ted Harding

On 10-Aug-07 18:05:50, Lanre Okusanya wrote:
> Hello all,
> 
> I am working with a 1000x1000 matrix, and I would like to return a
> 1000x1000 matrix that tells me which value in the matrix is greater
> than a theshold value (1 or 0 indicator).
> i have tried
>   mat2<-as.matrix(as.numeric(mat1>0.25))
> but that returns a 1:10 matrix.
> I have also tried for loops, but they are grossly inefficient.
> 
> THanks for all your help in advance.
> 
> Lanre

Simple-minded, but:

> S<-matrix(rnorm(25),nrow=5)
> S
           [,1]        [,2]       [,3]       [,4]       [,5]
[1,] -0.9283624 -0.44418487  1.1174555  1.9040999 -0.4675796
[2,]  0.2658770 -0.28492642 -1.2271013 -0.5713291  1.8036235
[3,]  0.7010885 -0.42972262  0.7576021  0.3407972 -1.0628487
[4,] -0.2003087  0.87006841  0.6233792 -0.9974902 -0.9104270
[5,]  0.2729014  0.09781886 -1.0004486  1.5987385 -0.4747125
> T<-0*S
> T[S>0.25] <- 1+0*S[S>0.25]
> T
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    1    1    0
[2,]    1    0    0    0    1
[3,]    1    0    1    1    0
[4,]    0    1    1    0    0
[5,]    1    0    0    1    0

Does this work OK for your big matrix?
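
The original attempt collapsed to a 1-column matrix because
as.numeric() drops dimensions. A one-line equivalent that keeps
them (a sketch, using Lanre's mat1):

  mat2 <- 1 * (mat1 > 0.25)

The comparison gives a logical matrix of the same shape, and the
multiplication coerces it to 0/1.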

HTH
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Aug-07   Time: 19:50:37
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subject: Re: how to include bar values in a barplot?

2007-08-09 Thread Ted . Harding
ful" means "thinking about it" -- one
thing that spreadsheets inhibit, because

a) Even if you do think about it, you're not going to find it easy
   to implement the results of your thoughts (if they're any good);
b) Spreadsheets readily induce the naive (especially beginning)
   user into the habit of trusting that the writers of spreadsheet
   software have thought through all those nasty implementation
   technicalities and have created an "expert system" which looks
   after drawing the graph according to best practice and with all
   necessary sophistication. Look! Isn't it clever!!

This habit, once (all too easily) acquired, is difficult to kick.
Patrick Burns's deliberate use of "addiction" is apt.

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Aug-07   Time: 22:40:19
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Test (please ignore)

2007-08-08 Thread Ted . Harding
Please excuse this -- I need to test whether I can get through to R-help!
(Have failed repeatedly today).
Ted.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interaction factor and numeric variable versus separate

2007-08-07 Thread Ted Harding
On 07-Aug-07 15:34:13, Gabor Grothendieck wrote:
> In the single model all three levels share the same intercept which
> means that the slope must change to accomodate it
> whereas in the three separate models they each have their own
> intercept.

I think this arose because of the formulation of the "model with
interaction" as:

  summary(lm(y~x:f, data=d))

If it had been formulated as

  summary(lm(y~x*f, data=d))

there would be three separate intercepts, and three different slopes
(and the differences would be the same as the differences for the
separate models).
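
A quick check (a sketch, using Sven's data frame d as constructed
in his message below):

  coef(lm(y ~ x*f, data=d))

Here flev2, flev3 are intercept offsets and x:flev2, x:flev3 are
slope offsets, all relative to lev1; e.g. the lev2 slope is
coef["x"] + coef["x:flev2"], reproducing the separate fits.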

Ted.

> Try looking at it graphically and note how the black dotted lines
> are all forced to go through the same intercept, i.e. the same point
> on the y axis, whereas the red dashed lines are each able to
> fit their portion of the data using both the intercept and the slope.
> 
> y.lm <- lm(y~x:f, data=d)
> plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
> for(i in 1:3) {
>   abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
>   abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty =
> "dashed")
> }
> grid()
> 
> 
> On 8/7/07, Sven Garbade <[EMAIL PROTECTED]> wrote:
>> Dear list members,
>>
>> I have problems to interpret the coefficients from a lm model
>> involving
>> the interaction of a numeric and factor variable compared to separate
>> lm
>> models for each level of the factor variable.
>>
>> ## data:
>> y1 <- rnorm(20) + 6.8
>> y2 <- rnorm(20) + (1:20*1.7 + 1)
>> y3 <- rnorm(20) + (1:20*6.7 + 3.7)
>> y <- c(y1,y2,y3)
>> x <- rep(1:20,3)
>> f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
>> d <- data.frame(x=x,y=y, f=f)
>>
>> ## plot
>> # xyplot(y~x|f)
>>
>> ## lm model with interaction
>> summary(lm(y~x:f, data=d))
>>
>> Call:
>> lm(formula = y ~ x:f, data = d)
>>
>> Residuals:
>>     Min      1Q  Median      3Q     Max
>> -2.8109 -0.8302  0.2542  0.6737  3.5383
>>
>> Coefficients:
>>             Estimate Std. Error t value Pr(>|t|)
>> (Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
>> x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
>> x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
>> x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> Residual standard error: 1.53 on 56 degrees of freedom
>> Multiple R-Squared: 0.9984, Adjusted R-squared: 0.9984
>> F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16
>>
>> ## separate lm fits
>> lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
>> $lev1
>> (Intercept)           x
>>  6.77022860 -0.01667528
>>
>> $lev2
>> (Intercept)           x
>>    1.019078    1.691982
>>
>> $lev3
>> (Intercept)           x
>>    3.274656    6.738396
>>
>>
>> Can anybody give me a hint why the coefficients for the slopes
>> (especially for lev1) are so different and how the coefficients from
>> the
>> lm model with interaction are related to the separate fits?
>>
>> Thanks, Sven
>>
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 17:01:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R and excell differences in calculation F distributionfu

2007-08-07 Thread Ted Harding
On 07-Aug-07 10:27:06, Luis Ridao Cruz wrote:
> R-help,
> 
> I'm trying to work out the density function by doing:
> 
>> df(5.22245, 3, 22)
> [1] 0.005896862
> 
>> df(15.20675, 6, 4)
> [1] 0.001223825
> 
> 
> In excell the result is :
> 
> 0.0071060464  <--  FDIST(5.22245,3,22)
> 
> 0.011406  <--  FDIST(15.20675,6,4)
> 
> 
> From my point of view the differences in the second case
> are substantial.
> 
> Can anyone give me a hint on this?
> 
> Thanks in advance

It would seem that Excel is giving you the upper tail  of the
F-distribution (cumulative distribution, not density), i.e.
what R would calculate as

pf(5.22245,3,22,lower.tail=FALSE)
[1] 0.007106046

pf(15.20675,6,4,lower.tail=FALSE)
[1] 0.0114

so in effect using "pf" rather than "df". If you want the
density function (corresponding to "df") from Excel, then
that's not something I know how to do (nor want to know ... ).

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 11:47:52
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About grep

2007-08-06 Thread Ted Harding
On 07-Aug-07 04:20:27, Shao wrote:
> Hi,everyone.
> 
> I have a problem when using the grep.
> for example:
> a <- c("aa","aba","abac")
> b<- c("ab","aba")
> 
> I want to match the whole word,so
> grep("^aba$",a)
> it returns 2
> 
> but when I used it a more useful way:
> grep("^b[2]$",a),
> it doesn't work at all, it can't find it, returning integer(0).
> 
> How can I chang the format in the second way?
> 
> Thanks.
> 
> -- 
> Shao

The problem is that in the string "^b[2]$" the element b[2] of b
is not evaluated, but simply the successive characters ^ b [ 2 ] $
are passed to grep as a character string, which of course is not
found. You can construct a character string with the value of b[2]
in it by using paste():

grep(paste("^",b[2],"$",sep=""),a)
[1] 2
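
Alternatively, for exact whole-string matching a direct comparison
is simpler than building a regular expression (a sketch):

which(a == b[2])
[1] 2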

Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 07:43:14
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data analysis

2007-08-06 Thread Ted Harding
On 06-Aug-07 19:26:59, lamack lamack wrote:
> Dear all, I have a factorial design where the
> response is an ordered categorical response.
> 
> treatment (two levels: 1 and 2)
> time (four levels: 30, 60,90 and 120)
> ordered response (0,1,2,3)
> 
> could someone suggest a correct analysis or some references?

For your data below, I would be inclined to start from here,
which gives the counts for the different responses:


              Response
 Trt  Time    0    1    2    3
 ----------+--------------------+----
 Tr1   30  |       1         3  |  4
       60  |       2    1    1  |  4
       90  |       3    1       |  4
      120  |       3    1       |  4
 ----------+--------------------+----
 Tr2   30  |       2         2  |  4
       60  |       3    1       |  4
       90  |       3         1  |  4
      120  |  1    2    1       |  4
 ==========+====================+====
 Tr1       |  0    9    3    4  | 16
 ----------+--------------------+----
 Tr2       |  1   10    2    3  | 16
 ==========+====================+====

This suggests that, if anything is happening there at all,
it is a tendency for high response to occur at shorter times,
and low response at longer times, with little if any difference
between the treatments.

To approach this formally, I would consider adopting a
"re-randomisation" approach, re-allocating the outcomes at
random in such a way as to preserve the marginal totals,
and evaluating a statistic T, defined in terms of the counts
and such as to be sensitive to the kind of effect you seek.

Then situate the value of T obtained from the above counts
within the distribution of T obtained by this re-randomisation.

There must be, somewhere in R, routines which can perform this
kind of constrained re-randomisation, but I'm not sufficiently
familiar with that area of R to know for sure about them.

I hope other readers who know about this area in R can come
up with suggestions!
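
One candidate for the constrained re-randomisation is r2dtable(),
in base R, which generates random two-way tables with given margins.
A minimal sketch on the Trt x Response totals above (the statistic T,
a difference in mean response score, is just one possible choice):

  obs <- rbind(Tr1 = c(0,  9, 3, 4),
               Tr2 = c(1, 10, 2, 3))
  scores <- 0:3
  Tstat  <- function(m) sum(m[1,]*scores)/sum(m[1,]) -
                        sum(m[2,]*scores)/sum(m[2,])
  sim   <- r2dtable(10000, rowSums(obs), colSums(obs))
  Tnull <- sapply(sim, Tstat)
  mean(abs(Tnull) >= abs(Tstat(obs)))  # two-sided randomisation P-value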

best wishes,
Ted.

> subject treatment time response
>       1         1   30        3
>       2         1   30        3
>       3         1   30        1
>       4         1   30        3
>       5         1   60        3
>       6         1   60        1
>       7         1   60        1
>       8         1   60        2
>       9         1   90        2
>      10         1   90        1
>      11         1   90        1
>      12         1   90        1
>      13         1  120        2
>      14         1  120        1
>      15         1  120        1
>      16         1  120        1
>      17         2   30        3
>      18         2   30        3
>      19         2   30        1
>      20         2   30        1
>      21         2   60        1
>      22         2   60        2
>      23         2   60        1
>      24         2   60        1
>      25         2   90        1
>      26         2   90        1
>      27         2   90        1
>      28         2   90        3
>      29         2  120        1
>      30         2  120        2
>      31         2  120        0
>      32         2  120        1
> 
> _
> Verificador de Segurança do Windows Live OneCare: verifique já a
> segurança 
> do seu PC! http://onecare.live.com/site/pt-br/default.htm
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Aug-07   Time: 00:30:19
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] test the significances of two regression lines

2007-08-06 Thread Ted Harding
On 06-Aug-07 10:32:50, Luis Ridao Cruz wrote:
> R-help,
> 
> I'm trying to test the significance of two regression lines
> , i.e. the significance of the slopes from two samples
> originated from the same population.
> 
> Is it correct if I fit a liner model for each sample and
> then test the slope signicance with 'anova'. Something like this:
> 
> lm1 <- lm(Y~ a1 + b1*X)# sample 1
> lm2 <- lm(Y~ a2 + b2*X)# sample 2
> 
> anova(lm1, lm2)

No, this will not work. From "?anova":

Warning:
  The comparison between two or more models will only be valid if
  they are fitted to the same dataset.

which is not the case in your example. One way to proceed is to
merge the two datasets, and introduce a factor which identifies
the dataset. For example:

  x1<-rnorm(100) ; x2<-rnorm(100)
  y1 <- 0.2 + 0.1*x1 + 0.05*rnorm(100)
  y2 <- 0.2 + 0.12*x2 + 0.05*rnorm(100)
  x <- c(x1,x2)
  y <- c(y1,y2)
  S <- factor(c(rep(0,100),rep(1,100)))
  lm12 <- lm(y ~ x*S)

First look at the fit of y1~x1:
  summary(lm(y1~x1))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.206042   0.004647   44.34   <2e-16 ***
x1          0.091382   0.004768   19.16   <2e-16 ***

Then the fit of y2~x2:
  summary(lm(y2~x2))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.208216   0.005171   40.26   <2e-16 ***
x2          0.118840   0.005009   23.73   <2e-16 ***

so the estimated slopes differ by 0.118840 - 0.091382 = 0.027458.
But what is the "significance" of this difference?

Now:
  summary(lm12)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.206042   0.004923  41.852  < 2e-16 ***
x           0.091382   0.005052  18.088  < 2e-16 ***
S1          0.002174   0.006953   0.313 0.754926
x:S1        0.027457   0.006939   3.957 0.000106 ***

so the "x:S1" value is the same as the difference in slopes
as estimated from lm1 and lm2; but now we have a standard error
and a P-value for it. You can also use anova now:

  anova(lm12)
Response: y
           Df  Sum Sq Mean Sq  F value    Pr(>F)
x           1 2.26537 2.26537 946.2702 < 2.2e-16 ***
S           1 0.00015 0.00015   0.0614 0.8045253
x:S         1 0.03749 0.03749  15.6599 0.0001060 ***
Residuals 196 0.46922 0.00239

so you get the same P-value, though with anova() you do not see
the actual estimate of the difference between the slopes.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Aug-07   Time: 12:16:06
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Any "special interest" in R/pic interface?

2007-08-05 Thread Ted Harding
a) 'pic' code to specify the basic primitives of the plot
   (analogous to the "define bar{...}" in my histogram example)
b) Layout information
c) Access to the numerical (and any textual) information which
   determines the position and value of plotted objects
d) Export to a file which can then be readily edited to "tune" the
   appearance of the result to the user's satisfaction (as in the
   choice of size and style for the printed histogram counts).

I'm well aware of R's xfig() device, which exports to XFig files
which the xfig program can in turn export to 'pic' language.

However, the resulting 'pic' code is horrendous, and essentially
impossible to edit for a diagram of any complexity (it's near
impossible for a mere scatter-plot of 2 variables). The 'pic'
code written by a human is much more transparent and easy to
edit (see my histogram example), and is the kind of objective
I have in mind. I envisage that in complex plots produced by R,
such as the above xyplot, one can obtain the generic information
one needs by

XY<-xyplot()
str(XY)
then extracting XY$whatever, and wrapping this stuff in 'pic' code.
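
For instance, a minimal sketch (assuming the lattice package, and
using a built-in dataset purely for illustration):

  library(lattice)
  XY <- xyplot(Sepal.Length ~ Sepal.Width, data=iris)
  str(XY, max.level=1)       # survey the components of the object
  str(XY$panel.args)         # the x- and y-values actually plotted
  XY$x.limits; XY$y.limits   # the axis ranges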


Looking forward to hearing if anyone is interested in joining in
an exploration of this territory!

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Aug-07   Time: 13:01:40
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Invert Likert-Scale Values

2007-08-04 Thread Ted Harding
On 04-Aug-07 22:02:33, William Revelle wrote:
> Alexis and John,
> 
> To reverse a Likert like item, subtract the item from the maximum 
> acceptable value + the minimum acceptable value,
> That is, if
> x <- 1:8
> xreverse <- 9-x
> 
> Bill

A few of us have suggested this, but Alexis's welcome for the
recode() suggestion indicates that by the time he gets round to
this his Likert scale values have already become levels of a factor.

Levels "1", "2", ... of a factor may look like integers, but they're
not; and R will not let you do arithmetic on them:

> x<-factor(c(1,1,1,2,2,2))
> x
[1] 1 1 1 2 2 2
Levels: 1 2
> y<-(3-x)
Warning message: 
"-" not meaningful for factors in: Ops.factor(3, x) 
> y
[1] NA NA NA NA NA NA

However, you can turn them back into integers, reverse, and then
turn the results back into a factor:

> y <- factor(3 - as.integer(x))
> y
[1] 2 2 2 1 1 1
Levels: 1 2

So, even for factors, the insight underlying our suggestion of "-"
is still valid! :)

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 05-Aug-07   Time: 00:09:58
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] - round() strange behaviour

2007-08-04 Thread Ted Harding
On 04-Aug-07 21:57:28, John Logsdon wrote:
> I must admit I had never realised this so thanks to Monica for
> raising it.  
> 
> Round-to-even is used for statistical reasons and there is some
> point as it obviously reduces bias. But why should the decision
> be to round to the nearest even number?  Rounding to the nearest
> odd number would be equally valid.  (signif(x,0) is the same as
> round(). )

A fair point! But see below.

> There is a debate and sometimes you need symmetric rounding.
> Perhaps there should be an option in round that defaults to
> round-to-even for compatibility but includes the various other
> rounding approaches as seen for example in 
> http://en.wikipedia.org/wiki/Rounding

As wikipedia says: "Round-to-even is used rather than round-to-odd
as the latter rule would prevent rounding to a result of zero."

And there's also stochastic rounding -- toss a penny. But you wouldn't
get the same answer next time.

And I can recall -- many years ago -- being in an environment where
the practice was to round alternately up and down. (These were
engineers).

John's comparisons between FORTRAN, octave and (implicitly) R
are interesting.

As a general reflection: any method of rounding involves throwing
away some of the information in the data, and this will have
consequences. So the question is: what consequences are you
happiest with?

One consequence of rounding to nearest even (or nearest odd) is
that this can give the worst possible outcome in terms of

(rounded X - rounded Y) compared with (unrounded X - unrounded Y).

For example, rounding to the nearest even integer, X = 1.5 --> 2.0 and
Y = 0.5 --> 0.0 gives

  X - Y = 1.0 , rounded X - rounded Y = 2.0

whereas the difference is 1.0 if you always round up, or always down,
and this is also exactly the difference between the unrounded values.

Rounding to nearest odd would give X = 1.5 --> 1.0, Y = 0.5 --> 1.0
thus difference 0.0 in this case, but again difference 2.0 for
X = 2.5 --> 3.0, Y = 1.5 --> 1.0

And alternate rounding would give either 2.0 or 0.0 depending
on the phase of X and Y.

Thus "always up" or "always down" means that the difference between
rounded numbers is always the same as between the unrounded numbers
which for other methods the differences can differ by 2.0.
This may or may not matter, but it is something to think about
when choosing a rounding method.
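
For instance, R's round() itself can be checked at the prompt (these
halves are exactly representable, so no floating-point surprises
intervene):

  round(0.5)   # 0 -- rounds to the even neighbour
  round(1.5)   # 2
  round(2.5)   # 2
  round(3.5)   # 4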

> [...]

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Aug-07   Time: 23:53:50
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Invert Likert-Scale Values

2007-08-04 Thread Ted Harding
On 04-Aug-07 16:42:23, John Kane wrote:
> Will ?recode in the car package do what you want?
>  x <- 1:4
>  recode(x, "1='4';2='3' ;3='2'; 4='1'")

Is there a problem with just using

  New <- (8 - Old)

??

Ted.

> --- Alexis Delevett <[EMAIL PROTECTED]> wrote:
> 
>> Hi!
>> 
>> I am using R to process some community survey data.
>> Several item responses are recorded via a 7-point
>> Likert-Scale. As I have coded the responses, 1
>> represents high agreement, and 7 high disagreement.
>> This of course impacts the coefficients in a linear
>> regression (for example, agreement to self-perception
>> measures on housing satisfaction). For some
>> purposes, in order to make the coefficients more
>> accessible to the reader, I would like to invert the
>> item values, i.e. to arrive at 1 for high
>> disagreement, and 7 for high agreement (such that
>> the linear regression would express something like
>> "the higher the agreement on A, the greater the B).
>> 
>> Is there an already existing function for this, or
>> do I use a custom replace loop in R?
>> 
>> Thank you, Alexis
>> 
>>
>> 
>>  [[alternative HTML version deleted]]
>> 
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained,
>> reproducible code.
>>
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Aug-07   Time: 18:06:38
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about logistic models (AIC)

2007-08-03 Thread Ted Harding
On 03-Aug-07 16:42:33, Kyle. wrote:
> Tom:
> 
> That's a good question.  AIC, as I'm sure you know, is usually
> calculated as 2k-2ln(L), where k is the number of free parameters in
> the model, and L is the likelihood of the model (so ln(L) is the
> log-likelihood).  The goal of AIC--and the reason one normally tries
> to select a model with minimal AIC--is to minimize the
> Kullback-Leibler distance between your data's density and the
> theoretical "true" density as defined by the model.  More concisely,
> the AIC is an index of the amount of information regarding your data
> that is lost when your model is used to describe it.  To get back to
> your question, I can't say without a little more information why the
> AICs you're referring to are negative (but perhaps it's an issue of
> using the log-likelihood instead of the negative log-likelihood), but
> my first instinct is that it doesn't matter.  I would go with the AIC
> that is closest to zero.  I hope that helps.

That could be wrong! Don't forget that ln(L) is indeterminate to
within an additive constant (for given data), so one man's negative
AIC could be another man's positive AIC -- but if each compared
their AICs with different models (the same different models for each)
then they should get the same *difference* of AIC.

The correct approach is to compare AICs on the basis of algebraic
difference: AIC1 - AIC2 in comparing Model1 with Model2.

If this is positive then Model2 is better than Model1 (on the AIC
criterion). Taking "the AIC that is closest to zero" would be the
wrong way round for negative AICs.
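
A small sketch of such a comparison (with made-up data):

  set.seed(1)
  x1 <- rnorm(100); x2 <- rnorm(100)
  y  <- rbinom(100, 1, plogis(x1 + 0.5*x2))
  fit1 <- glm(y ~ x1, family=binomial)
  fit2 <- glm(y ~ x1 + x2, family=binomial)
  AIC(fit1) - AIC(fit2)   # if positive, Model2 (fit2) is preferred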

Ted.


> On Aug 3, 2007, at 8:41 AM, Tom Willems wrote:
> 
>> Dear fellow R-ussers,
>> I have a question about the Akaike Information Criterion  in the  R
>> output.
>> Normally you would want it to be as small as possible, yet when I
>> check up the information in my study books, the AIC is usually
>> displayed as a negative number. In the exercises it is given as
>> " - AIC ".
>> R displays it as a positive number, does this mean that a large "AIC"
>> gives a small " - AIC ", so the bigger the better?
>>
>> Kind regards,
>> Tom.
>>
>> Tom Willems
>> CODA-CERVA-VAR
>> Department of Virology
>> Epizootic Diseases Section
>> Groeselenberg 99
>> B-1180 Ukkel
>>
>> Tel.: +32 2 3790522
>> Fax: +32 2 3790666
>> E-mail: [EMAIL PROTECTED]
>>
>> www.var.fgov.be


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Aug-07   Time: 18:31:51
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with reading data files with different numbers o

2007-08-02 Thread Ted Harding
On 02-Aug-07 21:14:20, Tom Cohen wrote:
> Dear  List, 
>
>   I have 30 data files with different numbers of lines (31 and 33) that
> I want to skip before reading the files. If I use the skip option I can
> only choose either to skip 31 or 33 lines. The data files with 31 lines
> have no blank rows between the lines and the header row. How can I read
> the files without manually checking which files have 31 or 33
> lines respectively?  The only text line I want to keep is the header.
>
>   Thanks for your help,
>   Tom
>
>
>   for (i in 1:num.files) {
>a<-read.table(file=data[i],
>   ,header=T,skip=31,sep='\t',na.strings="NA") 
>   
>   }

Apologies, I misunderstood your description in my previous response
(I thought that the total number of lines in one of your files was
either 31 or 33, and you wanted to know which was which).

I now think you mean that there are either 0 (you want to skip 31)
or 2 (you want to skip 33) blank lines in the first 33, and then you
want the remainder (as well as the header). Though it's still not
really clear ...

You can find out how many blank lines there are in the first 33 with

> sum(readLines("~/00_junk/temp.tr", 33) == "")

and then choose how many lines to skip.
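
Put together with your loop, it might look like the following sketch
(assuming the blank lines are the only difference between the
31-line and 33-line headers):

  for (i in 1:num.files) {
    nblank <- sum(readLines(data[i], 33) == "")
    a <- read.table(file=data[i], header=TRUE,
                    skip=31+nblank, sep='\t', na.strings="NA")
  }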

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Aug-07   Time: 00:11:21
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with reading data files with different numbers o

2007-08-02 Thread Ted Harding
On 02-Aug-07 21:14:20, Tom Cohen wrote:
> Dear  List, 
>
>   I have 30 data files with different numbers of lines (31 and 33) that
> I want to skip before reading the files. If I use the skip option I can
> only choose either to skip 31 or 33 lines. The data files with 31 lines
> have no blank rows between the lines and the header row. How can I read
> the files without manually checking which files have 31 or 33
> lines respectively?  The only text line I want to keep is the header.
>
>   Thanks for your help,
>   Tom
>
>
>   for (i in 1:num.files) {
>a<-read.table(file=data[i],
>   ,header=T,skip=31,sep='\t',na.strings="NA") 
>   
>   }

If you're using a Unix/Linux system, you have the little command "wc"
which can count characters or words or lines in a file. For example,
I'm working at the moment on a file "mkSim.R" which has 53 lines.

So, in R:

> system("wc -l mkSim.R")
 53 mkSim.R

Hence the following returns the line-count as an integer:

> as.integer(substring(system("wc -l mkSim.R",intern=TRUE),1,7))
[1] 53

(which will also work for files with hundreds, thousands, ... of
lines, since the units digit is at position 7).

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Aug-07   Time: 23:51:59
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-distribution

2007-08-01 Thread Ted Harding
On 01-Aug-07 19:18:05, [EMAIL PROTECTED] wrote:
> Well, is "t = 1.11" all that accurate in the first place?  :-)
> 
> In fact, reading beween the lines of the original enquiry, what the
> person probably wanted was something like
> 
> ta <- pt(-1.11, 9) + pt(1.11, 9, lower.tail = FALSE)
> 
> which is the two-sided t-test tail area.
> 
> The teller of the parable will usually leave some things unexplained...
> 
> Bill. 

However: "Those who have ears to hear, let them hear!"
Ted.

E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Aug-07   Time: 20:43:53
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] write.csv

2007-07-30 Thread Ted Harding
On 29-Jul-07 17:41:58, Dong GUO wrote:
> Hi,
> 
> I want to save an array (say, array[6,7,8]) to a csv file.
> How can I do that? Can I write it in one file?
> 
> If I could not write it in one file, I want to use a loop to save
> in different files (in the array[6,7,8], should be 8 csv files),
> such as the filename structure should be:
> file ="filename" +str(i)+"." +"csv"
> 
> Many thanks.

The following (illustrated on a smaller array) may help:

A<-array((1:60),dim=c(3,4,5))
D<-dim(A)

write.table(t(dim(A)),file="A.csv",sep=",",
quote=FALSE,row.names=FALSE,col.names=FALSE)

Z<-as.vector(A)
write.table(t(Z),file="A.csv",append=TRUE,sep=",",
quote=FALSE,row.names=FALSE,col.names=FALSE)

Then the file A.csv contains two rows:

3,4,5
1,2,3,4,5,6,7,8,9,10,11,12, ... ,55,56,57,58,59,60

of which the first gives the dimension, the second the
data for the array.

You can then reconstruct another array, say B, as:

dimB<-scan(file="A.csv",sep=",",nlines=1)
dataB<-scan(file="A.csv",sep=",",skip=1,nlines=1)
B<-array(dataB,dimB)

That's a hard way to do things, perhaps, but since you said
you wanted the array as a CSV file, this is one way to do that.

Since a CSV text file is essentially a two-dimensional object
(fields in rows, by rows), to store a higher-dimensional object
in such a format you have to include the "meta-data" about the
structure -- in this case the list of dimensions.

Note, however, that, although it is a CSV file,

read.csv("A.csv",header=FALSE)

will not work nicely, since it will give you two rows of equal
length, the first one padded out with (in this case 57) NAs,
which you will then have to clean up; which you can do, but by
the time you've done it you might as well have done it the above
way!

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jul-07   Time: 22:24:12
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating symmetric matrices

2007-07-30 Thread Ted Harding
On 28-Jul-07 03:28:25, Gregory Gentlemen wrote:
> Greetings,
> 
> I have a seemingly simple task which I have not been able to solve
> today. I want to construct a symmetric matrix of arbtriray size w/o
> using loops. The following I thought would do it:
> 
> p <- 6
> Rmat <- diag(p)
> dat.cor <- rnorm(p*(p-1)/2)
> Rmat[outer(1:p, 1:p, "<")] <- Rmat[outer(1:p, 1:p, ">")] <- dat.cor
> 
> However, the problem is that the matrix is filled by column and so the
> resulting matrix is not symmetric.
> 
> I'd be grateful for any advice and/or solutions.
> 
> Gregory 

Would the fact that A + t(A) is symmetric be useful here?

E.g.

p <- 6
A <- matrix(rnorm(p^2),ncol=p)
A <- (A + t(A))/sqrt(2)
diag(A) <- rep(1,p)
round(A,digits=2)
  [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
[1,]  1.00  0.53 -0.20  1.27  0.34  0.83
[2,]  0.53  1.00 -0.99 -0.72  0.68 -1.21
[3,] -0.20 -0.99  1.00 -0.62 -0.36 -0.87
[4,]  1.27 -0.72 -0.62  1.00  2.40  0.33
[5,]  0.34  0.68 -0.36  2.40  1.00  0.20
[6,]  0.83 -1.21 -0.87  0.33  0.20  1.00

(Here, because each off-diagonal element of A is the sum of
2 independent N(0,1)s, divided by sqrt(2), the result is
also N(0,1)).

However, whether this is really useful for you depends on
what you want the elements of A to be!
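
Alternatively, if you want to keep your original idea of filling
from dat.cor, a sketch which fills the lower triangle and then
mirrors it into the upper one:

  p <- 6
  Rmat <- diag(p)
  dat.cor <- rnorm(p*(p-1)/2)
  Rmat[lower.tri(Rmat)] <- dat.cor
  Rmat[upper.tri(Rmat)] <- t(Rmat)[upper.tri(Rmat)]
  all(Rmat == t(Rmat))   # TRUE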

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jul-07   Time: 20:00:55
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2nd R Console

2007-07-30 Thread Ted Harding
On 28-Jul-07 08:19:10, Michael Janis wrote:
> Hi,
> 
> I was reading a thread: [R] "2nd R console" and had a similar
> question regarding having more than one R console open at a time. 
> However, my question differs from that of the thread:
> 
> Is it possible, or is there a wrapper that will allow one, to open
> an arbitrary number of R consoles which access the same R session
> (all objects in that session, etc.).  This would be R on linux
> accessed through a shell - kind of like using GNU screen multi-user
> such that people could work collaboratively on a given session.
> The problem with screen is that all commands are interleaved in
> the same terminal, which is confusing and does not allow access
> to the command prompt at the same time, rather it would be sequential.
> I know there will be "why" questions but it is useful in an
> academic environment.  Basically we have a memory machine for large
> genomic analysis - and we could set that up as an Rserver, but this
> placing R into a multi-user engine is better suited for our immediate
> needs.  Does anybody have thoughts on this?

You could have a look at XMX (X-protocol Multiplexer). This is now
rather old, and I think it has not been further developed for many
years.

http://www.cs.brown.edu/software/xmx/README.patch7

Basically, you would start XMX from one machine (A) networked
to others (B,C,...), all running X-windows. in the startup, you
designate which machines are to share the session, and what rights
they will have.

On machine A, and all designated machines B, C, ... , appears
a window. This in fact acts like a screen emulator, with its
own X session inside it. The A user then starts up a program
(say R) in this window, and what A sees is mirrored on the other
machines.

If A has granted input privileges to the other machines, then
users of B, C, ... can do things themselves, and the effects of
their actions are mirrored to all the other machines.

Thus, for instance, it would be possible for different users to
take turns at entering commands into the R session, and so on,
from their own machines. This is a much better way of arranging
things, than having all the people queue up to take turns at
sitting in the one and only chair!

It is certainly useful in a classroom situation, and even in
a one-to-one tutorial. For instance, I've used it in the past
to teach programming and system management, sitting more or
less beside the other person but at different machines. We
would both, for instance, be editing the same program file,
and either could initiate a program run with the current file.
(Of course, what's technically called a "race condition" can
develop here ... ). I've even envisaged the scenario where two
people, one in England and one in the US, could simultaneously
be editing a joint paper (you'd also need an independent
communication channel as well, but that's no problem using a
"chat" program).

The one constraint with XMX (at least in the past, I'm not
sure to what extent it may have been relaxed) which can limit
its use is that it depends on fairly close compatibility
between the X resources of all the different machines. It's
best if they're identical (same screen resolution e.g. 1024x768,
same colour depths on all screens, ... ). Otherwise it's liable
to not establish the necessary connections, and only some
machines can join in.

However, in a computing lab environment, it may well be that
all machines are compatible.

Suck it and see!

Hoping this is useful,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jul-07   Time: 19:27:48
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the large dataset problem

2007-07-30 Thread Ted Harding
On 30-Jul-07 11:40:47, Eric Doviak wrote:
> [...]

Sympathies for the constraints you are operating in!

> The "Introduction to R" manual suggests modifying input files with
> Perl. Any tips on how to get started? Would Perl Data Language (PDL) be
> a good choice?  http://pdl.perl.org/index_en.html

I've not used SIPP files, but it seems that they are available in
"delimited" format, including CSV.

For extracting a subset of fields (especially when large datasets may
stretch RAM resources) I would use awk rather than perl, since it
is a much lighter program, transparent to code for, efficient, and
it will do that job.

On a Linux/Unix system (see below), say I wanted to extract fields
1, 1000, 1275,  , 5678 from a CSV file. Then the 'awk' line
that would do it would look like

awk '
 BEGIN{FS=","}{print $(1) "," $(1000) "," $(1275) "," ... "," $(5678)}
' < sippfile.csv > newdata.csv

Awk reads one line at a time, and does with it what you tell it to do.
It will not be overcome by a file with an enormous number of lines.
Perl would be similar. So long as one line fits comfortably into RAM,
you would not be limited by file size (unless you're running out
of disk space), and operation will be quick, even for very long
lines (as an experiment, I just set up a file with 10,000 fields
and 35 lines; awk output 6 selected fields from all 35 lines in
about 1 second, on the 366MHz 128MB RAM machine I'm on at the
moment. After transferring it to a 733MHz 512MB RAM machine, it was
too quick to estimate; so I duplicated the lines to get a 363-line
file, and now got those same fields out in a bit less than 1 second.
So that's over 300 lines/second -- around 20,000 lines a minute, or a
million lines in under an hour; and all on rather puny hardware.).

In practice, you might want to write a separate script which would
automatically create the necessary awk script (say if you supply
the field names, having already coded the field positions corresponding
to field names). You could exploit R's system() command to run the
scripts from within R, and then load in the filtered data.
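
A sketch of that idea from within R (the field numbers here are
invented purely for illustration):

  fields <- c(1, 1000, 1275, 5678)   # hypothetical fields to keep
  sel <- paste("$(", fields, ")", sep="", collapse=" \",\" ")
  cmd <- paste("awk 'BEGIN{FS=\",\"}{print ", sel,
               "}' < sippfile.csv > newdata.csv", sep="")
  system(cmd)
  newdata <- read.csv("newdata.csv", header=FALSE)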

> I wrote a script which loads large datasets a few lines at a time,
> writes the dozen or so variables of interest to a CSV file, removes
> the loaded data and then (via a "for" loop) loads the next few
> lines.  I managed to get it to work with one of the SIPP core files,
> but it's SLOW.

See above ...

> Worse, if I discover later that I omitted a relevant variable,
> then I'll have to run the whole script all over again.

If the script worked quickly (as with awk), presumably you
wouldn't mind so much?

Regarding Linux/Unix versus Windows. It is general experience
that Linux/Unix works faster, more cleanly and efficiently, and
often more reliably, for similar tasks; and can do so on low grade
hardware. Also, these systems come with dozens of file-processing
utilities (including perl and awk; also many others), each of which
has been written to be efficient at precisely the repertoire of
tasks it was designed for. A lot of Windows software carries a huge
overhead of either cosmetic dross, or a pantechnicon of functionality
of which you are only going to need 0.01% at any one time.

The Unix utilities have been ported to Windows, long since, but
I have no experience of using them in that environment. Others,
who have, can advise! But I'd seriously suggest getting hold of them.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jul-07   Time: 18:24:41
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Slightly OT - use of R

2007-07-30 Thread Ted Harding
On 30-Jul-07 08:28:15, John Logsdon wrote:
> I am trying to get a measure of how R compares in usage as a
> statistical platform compared to other software. I would guess
> it is the most widely used among statisticians at least by
> virtue of it being open source.
> 
> But is there any study to which I can refer? By asking this
> list I am not exactly adopting a rigorous approach!

I don't know about that -- my own expectation would be that
serious users of R are likely to be subscribers to the list.

So maybe a good answer to your question would be the number
of subscribers (which I'm sure Martin Maechler can find out).
Of course, some people will have subscribed under more than
one email address, so that would somewhat over-estimate the
number of people who subscribe. But it can be traded off
(to a somewhat unknown extent) against R users who do not
subscribe.

More to the point, though, is what you mean by "usage".
If you simply mean "people who use", that's a matter of
counting (one way or another). But there's "use" and "use".

There's a lot of what I call "SatNav Statistics" being done,
and I would guess that "SatNav statisticians" tend to go
for the commercial products, since these have bigger and
brighter displays, and the more mellifluous and reassuring
voice-overs. (And never mind that the voice instructs you
to turn left, at the level-crossing, onto the railway line).

Most serious R users, I tend to think, are more likely to
pull into a layby and unfold large-scale maps. And, when
the need arises, they will get out and push.

So, in "widely used among statisticians", it depends on
what you mean by "statisticians".

Where you will probably get extra value from the R list
is that many of our people will have extensive and very
professional experience, not only with R, but with many of
the other available packages, and be best placed to provide
serious and thoughtful comparisons.

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Jul-07   Time: 10:18:21
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting exponential curve to data points

2007-07-24 Thread Ted Harding
On 24-Jul-07 01:09:06, Andrew Clegg wrote:
> Hi folks,
> 
> I've looked through the list archives and online resources, but I
> haven't really found an answer to this -- it's pretty basic, but I'm
> (very much) not a statistician, and I just want to check that my
> solution is statistically sound.
> 
> Basically, I have a data file containing two columns of data, call it
> data.tsv:
> 
> year  count
> 1999  3
> 2000  5
> 2001  9
> 2002  30
> 2003  62
> 2004  154
> 2005  245
> 2006  321
> 
> These look exponential to me, so what I want to do is plot these
> points on a graph with linear axes, and add an exponential curve over
> the top. I also want to give an R-squared for the fit.
> 
> The way I did it was like so:
> 
> 
># Read in the data, make a copy of it, and take logs
> data = read.table("data.tsv", header=TRUE)
> log.data = data
> log.data$count = log(log.data$count)
> 
># Fit a model to the logs of the data
> model = lm(count ~ year, data = log.data)
> 
># Plot the original data points on a graph
> plot(data)
> 
># Draw in the exponents of the model's output
> lines(data$year, exp(fitted(model)))
> 
> 
> Is this the right way to do it? log-ing the data and then exp-ing the
> results seems like a bit of a long-winded way to achieve the desired
> effect. Is the R-squared given by summary(model) a valid measurement
> of the fit of the points to an exponential curve, and should I use
> multiple R-squared or adjusted R-squared?
> 
> The R-squared I get from this method (0.98 multiple) seems a little
> high going by the deviation of the last data point from the curve --
> you'll see what I mean if you try it.

I just did. From the plot of log(count) against year, with the plot
of the linear fit of log(count)~year superimposed, I see indications
of a non-linear relationship.

The departures of the data from the fit follow a rather systematic
pattern. Initially the data increase more slowly than the fit,
and lie below it. Then they increase faster and cross over above it.
Then the data increase less fast than the fit, and the final data
point is below the fit.

There are not enough data to properly identify the non-linearity,
but the overall appearance of the data plot suggests to me that
you should be considering one of the "growth curve" models.

Many such models start off with an increasing rate of growth,
which then slows down, and typically levels off to an asymptote.
The apparent large discrepancy of your final data point could
be compatible with this kind of behaviour.

At this point, knowledge of what kind of thing is represented
by your "count" variable might be helpful. If, for instance,
it is the count of the numbers of individuals of a species in
an area, then independent knowledge of growth mechanisms may
help to narrow down the kind of model you should be trying to fit.
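
For instance, one such model is the logistic growth curve, which can
be sketched with the self-starting model SSlogis (whether nls()
converges on only eight points is another matter):

  fit <- nls(count ~ SSlogis(year, Asym, xmid, scal), data=data)
  plot(data$year, data$count)
  lines(data$year, fitted(fit))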

As to your question about "Is this the right way to do it"
(i.e. fitting an exponential curve by doing a linear fit of the
logarithm), generally speaking the answer is "Yes". But of course
you need to be confident that "exponential" is the right curve
to be fitting in the first place. If it's the wrong type of
curve to be considering, then it's not "the right way to do it"!

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 24-Jul-07   Time: 10:08:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using R for a reaction-time experiment

2007-07-22 Thread Ted Harding
On 22-Jul-07 16:38:02, Seth Roberts wrote:
> 
> I want to use R to run a reaction-time experiment:
> Something appears on the screen, I respond by typing something
> (one keystroke), the system measures the speed of my response.
> R would be great for this if only I didn't have to hit Enter
> to enter that keystroke. I am doing such experiments now but
> they require two actions per trial: hit keystroke, hit Enter.
> 
> Is there some way that R can be made to respond to a single
> keystroke (other than Enter)?

What operating system are you using? If it's Linux, there would
be ways to detect the keystroke (say "A") and immediately send
"A\n" (i.e. "A" followed by "enter") to a FIFO which R could be
watching. Then R would receive your single keystroke as if you
had followed it by pressing "Enter".
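
One possible shape for that idea is sketched below (untested; the
shell side assumes bash, and the FIFO path is made up):

  ## Shell side: forward each single keystroke, with a newline:
  ##   mkfifo /tmp/keys
  ##   while true; do read -rsn1 k; echo "$k" > /tmp/keys; done
  ## R side:
  con <- fifo("/tmp/keys", open="r", blocking=TRUE)
  t0  <- proc.time()["elapsed"]
  key <- readLines(con, n=1)
  rt  <- proc.time()["elapsed"] - t0
  close(con)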

If you're using Windows, then unfortunately I haven't a clue.

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Jul-07   Time: 18:11:43
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can I test if there are statistical significance between

2007-07-21 Thread Ted Harding
On 21-Jul-07 12:46:46, zhijie zhang wrote:
> Dear Uwe Ligges,
[restructuring the given layout of the data]


         Grp1  Grp2  [2+3]  Grp3 | Total
Better     16     0  [  1]     1 |    17
Good       71     4  [ 10]     6 |    81
Bad        37    61  [118]    57 |   155
---------------------------------+------
          124    65  [129]    64 |   253

> My question is whether the three groups, that is group1, group2, and
> group3, have the same distributions over the columns? If not, which
> one is different from which one?

It is clear that there is no discernible difference between
Group 2 and Group 3. If they are pooled together (totals in
[...] above), it is also clear that there is a very large
difference between [Group 2 + Group 3], or either separately,
and Group 1.

In summary, and in round figures, Groups 2 and 3 have about
90% "Bad", about 10% "Good" and hardly any "Better".

Group 1 has only about 30% "Bad", about 60% "Good", and 10%
"Better".

Formally:

A<-cbind(c(16,71,37),c(1,10,118))
A
   [,1] [,2]
  [1,]   161
  [2,]   71   10
  [3,]   37  118

chisq.test(A)
  Pearson's Chi-squared test
  data:  A
  X-squared = 101.4434, df = 2, p-value = < 2.2e-16

cbind(round(chisq.test(A)$expected,1),A)
   [,1] [,2][,3] [,4]
  [1,]  8.3  8.7  161
  [2,] 39.7 41.3  71   10
  [3,] 76.0 79.0  37  118


B<-cbind(c(0,4,61),c(1,6,57))
B
   [,1] [,2]
  [1,]01
  [2,]46
  [3,]   61   57

chisq.test(B,simulate.p.value=TRUE)
  Pearson's Chi-squared test with simulated p-value
  (based on 2000 replicates)
  data:  B 
  X-squared = 1.5279, df = NA, p-value = 0.3215

fisher.test(B)
  Fisher's Exact Test for Count Data
  data:  B 
  p-value = 0.4247
  alternative hypothesis: two.sided 

That, with possible refinements for more careful statements
about the proportions in the three outcome classes, is about
all one can say about these data; and it is definite enough!

With the totally non-committal P-value for Group 2 vs Group 3,
and the absolutely decisive P-value for Group 1 vs Groups 2&3,
there is no need whatever to bother with "multiple comparison"
complications.

Best wishes,
Ted.


> On 7/20/07, Uwe Ligges <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>> zhijie zhang wrote:
>> > Dear  friends,
>> > My R*C table is as follows:
>> >
>> >          better  good  bad
>> > Group1       16    71   37
>> > Group2        0     4   61
>> > Group3        1     6   57
>> >
>> > Can I test if there are statistically significant differences
>> > between Group1 and Group2, Group2 and Group3, and Group1 and
>> > Group3, taking into account the multiple comparisons?
>>
>>
>> So what is your hypothesis? Statistical significance of what is to be
>> tested?
>>
>> Uwe Ligges
>>
>>
>>
>> > The table can be set up using the following program:
>> >
>> > a<-matrix(data=c(16,71,37,0,4,61,1,6,57),nrow=3,byrow=TRUE)
>> > Thanks very much.
>> >
>> >
>>
> 
> 
> 
> -- 
> With Kind Regards,
> 
> Zhi Jie, Zhang, PHD
> Tel: 86-21-54237149
> Dept. of Epidemiology, School of Public Health, Fudan University
> Address: No. 138 Yi Xue Yuan Road, Shanghai, China
> Postcode: 200032
> Email: [EMAIL PROTECTED]
> Website: www.statABC.com


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Jul-07   Time: 14:51:46
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] automatically jpeg output

2007-07-20 Thread Ted Harding
On 20-Jul-07 15:34:00, Ding, Rebecca wrote:
> Dear R users,
> 
> I used R to draw many histograms and I would like to automatically save
> them into a jpeg file. I tried the following code since I know .ps file
> could be saved this way:
> 
> postscript("AYA_ELA.jpeg",horizontal=F,onefile=T)
> ..#some funtions inside here
> dev.off()
> 
> There was a jpeg file, however, there is no pictures inside. Any
> suggestion? 
> 
> Thanks.
> 
> Rebecca

If your "some functions inside here" do draw histograms, then
there will indeed be pictures inside, but they will be in PostScript,
since that is what is created when you use the postscript() device.
The fact that you used ".jpeg" in the filename has nothing to
do with it -- it will simply store the output, as PostScript,
in the file whose name you give to it. It trusts you.

You would have seen the PostScript pictures if you had used a
PostScript viewer which reacts to the content of the files,
whereas presumably your system expects a file whose name ends
in ".jpeg" or ".jpg" to be a JPEG file; and therefore will
use the wrong interpreter and fail to recognise the picture
(as you have just proved).

If you want to create JPEG files in a similar way, use the
jpeg() function -- read ?jpeg for details.
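
For instance (a sketch along the lines of your postscript() call;
the "%03d" gives one numbered file per plot):

  jpeg("AYA_ELA%03d.jpeg")
  ## ... your plotting commands here ...
  dev.off()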

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Jul-07   Time: 17:07:20
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] presence/absence matrix

2007-07-20 Thread Ted Harding
On 20-Jul-07 14:16:39, [EMAIL PROTECTED] wrote:
> I have a table such that:
> 
> sample #   species
> 1          a
> 1          b
> 2          a
> 2          c
> 3          b
> 
> and i would like to build a matrix with species names (i.e. a b and c) 
> as field names and samples (i.e. 1,2 and 3) as row names
> and 1/0 as inner values
> such as:
>     a   b   c
> 1   1   1   0
> 2   1   0   1
> 3   0   1   0
> 
> I am currently using the Rcmdr package for managing datasets but need
> a function for this. I tried to use the stack function but got only
> poor results.

The following generates the sort of thing you show above:

## Identities of possible species and samples
  Species<-c("a","b","c","d")
  Samples<-c(1,2,3,4,5,6)

## Generate a set of samples and corresponding species:
  species<-sample(Species,20,replace=TRUE)
  samples=sample(Samples,20,replace=TRUE)
## Have a look:
 cbind(samples,species)
  samples species
 [1,] "5" "c"
 [2,] "2" "c"
 [3,] "4" "a"
 [4,] "5" "b"
 [5,] "5" "d"
 [6,] "1" "d"
 [7,] "2" "b"
 [8,] "2" "b"
 [9,] "3" "b"
[10,] "2" "d"
[11,] "4" "d"
[12,] "1" "b"
[13,] "3" "b"
[14,] "2" "c"
[15,] "2" "d"
[16,] "6" "b"
[17,] "5" "a"
[18,] "4" "a"
[19,] "2" "b"
[20,] "5" "a"

## Now generate a table:
  T<-table(samples,species)
  T
  species
  samples a b c d
1 0 1 0 1
2 0 3 2 2
3 0 2 0 0
4 2 0 0 1
5 2 1 1 1
6 0 1 0 0

Now this is a "table" structure, and also gives counts of
occurrences and not merely presence/absence. So we need to
get these out as a matrix.

  U<-as.matrix(T)
  U
  species
  samples a b c d
1 0 1 0 1
2 0 3 2 2
3 0 2 0 0
4 2 0 0 1
5 2 1 1 1
6 0 1 0 0

  U<-1*(U>0)
  U
  species
  samples a b c d
1 0 1 0 1
2 0 1 1 1
3 0 1 0 0
4 1 0 0 1
5 1 1 1 1
6 0 1 0 0

Is that what you want?
Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Jul-07   Time: 16:14:14
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] print tables to figure in R

2007-07-17 Thread Ted Harding
On 16-Jul-07 22:11:31, Steve Powers wrote:
> Does anyone know of a way to print nice-looking, printer-friendly tables
> directly in the R plot window? This would be very handy for examining 
> preliminary results from frequently used routines. I've hunted for this
> with no luck. All anyone seems to do is export text files to excel for 
> formatting, or use the print(bla) function which just calls the numbers
> at the command line. There must be a way to do this.---steve

It would certainly be possible to write a function to include a table
within a plot, since a table is no more than bits of text laid out
positionally in an array, and there is already provision to place
text at assigned positions on a plot -- the function text() does this.

That being said, though, writing the code to lay out a particular
table in the particular way you want would not be trivial. I don't
know of an R function which implements a reasonable "default layout"
for arbitrary tables in a plot.

As an example of "doing it with your bare hands", run the following:

x<-0.5*(0:20);y<-(x^2)/10;plot(x,y,pch="+",col="blue",ylim=c(-10,10))
lines(x,y,col="blue")
A<-c(0,2.5,5,7.5,10); B<-(A^2)/10;points(A,B,pch=2,col="red")
Ach<-as.character(round(A,digits=1))
Bch<-as.character(round(B,digits=3))
text(x=c(6.25,8.0),y=c(0,0),labels=c("A","B"),pos=4)
for(i in (1:5)){text(x=6.125,y=(-1-i),labels=Ach[i],pos=4)}
for(i in (1:5)){text(x=7.875,y=(-1-i),labels=Bch[i],pos=4)}

You could augment this with grid lines, etc. according to taste.

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Jul-07   Time: 09:23:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logical operators priority

2007-07-17 Thread Ted Harding
On 17-Jul-07 06:59:11, Delphine Fontaine wrote:
> Dear R-users,
> 
> I haven't found rules for logical operators. I need to select data
> according the following rule:
> Condition A & (Condition B | Condition C)
> How should I write it? Is Condition A & Condition B | Condition C
> correct or will R execute (Condition A & Condition B) | Condition C ?
> 
> Thanks for your help.
> 
> Delphine Fontaine

?Syntax will tell you about the precedence for operators.
In particular you will find:

   '!'        negation
   '& &&'     and
   '| ||'     or

indicating that "!" takes precedence over "&" and "&&", which take
precedence over "|" and "||". With equal precedence evaluation is
from left to right.

This shows that if you write A & B | C then "A & B" will be evaluated
first, so the expression is equivalent to (A & B) | C.
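
A quick check at the prompt confirms this:

  FALSE & TRUE | TRUE     # TRUE  -- parsed as (FALSE & TRUE) | TRUE
  FALSE & (TRUE | TRUE)   # FALSE -- the grouping changes the result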

In any case, there is nothing whatever wrong with using "(... )"
for grouping. It is always accepted, it forces the precedence you want,
and furthermore (most important) you yourself can see exactly what
will happen and will be much less likely to make a mistake.

So write your condition in your code *explicitly* as

  ConditionA & (ConditionB | ConditionC)

You can see what's happening, and R will do exactly what you want.

Best wishes
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Jul-07   Time: 09:34:18
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Errors in data frames from read.table

2007-07-16 Thread Ted Harding
On 16-Jul-07 14:13:09, Pat Carroll wrote:
> Hello, all.
> 
> I am working on a project with a large (~350Mb, about 5800 rows)
> insurance claims dataset. It was supplied in a tilde(~)-delimited
> format. I imported it into a data frame in R by setting memory.limit to
> maximum (4Gb) for my computer and using read.table. 

I had a similar problem put to me some time back, and eventually
solved it by going in with a scalpel. It turned out that there
was a problem with muddling "End-of-Line" with field delimiter
in creating the file. And the file came out of Excel in the first
place ... (did yours?). Quite why Excel made this particular
mess of it remains a mystery.

I note that your file size is "350Mb" and "about 5800 rows".
Doing some arithmetic on that:

350 * 1024 * 1024 = 367,001,600 bytes

367001600/5800 = 63276.14 bytes per row.

This (given your "about"s) looks to me dangerously close to
65536 = 64K, and this may be a limit on what Excel can handle?

Just a thought ...
Ted.

> The resulting data frame had 10 bad rows. The errors appear due to
> read.table missing delimiter characters, with multiple data being
> imported into the same cell, then the remainder of the row and the next
> run together and garbled due to the reading frame shift (example: a
> single cell might contain: ~ ~  ~, after which all
> the cells of the row and the next are wrong). 
> 
> To replicate, I tried the same import procedure on a smaller
> demographics data set from the same supplier- only about 1Mb, and got
> the same kinds of errors (5 bad rows in about 3500). I also imported as
> much of the file as Excel would hold and cross-checked, Excel did not
> produce the same errors but can't handle the entire file. I have used
> read.table on a number of other formats (mainly csv and tab-delimited)
> without such problems; so far it appears there's something different
> about these files that produces the errors but I can't see what it
> would be.
> 
> Does anyone have any thoughts about what is going wrong? And is there a
> way, short of manual correction, for fixing it?
> 
> Thanks for all help,
> ~Pat.
> 
> 
> Pat Carroll. 
> what matters most is how well you walk through the fire. 
> bukowski.
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jul-07   Time: 15:59:48
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mix package causes R to crash

2007-07-13 Thread Ted Harding
On 12-Jul-07 23:47:39, Masanao Yajima wrote:
> Dear Professor Schaefer
> 
> I am experiencing a technical difficulty with your mix package.
> I would appreciate it if you could help me with this problem.
> 
> When I run the following code, R 2.5.1 and R 2.6.0 crashes.
> It's been tested on at least 2 Windows machines and it is consistent.
> The execution code itself was copied from the help file of imp.mix.
> The only thing I supplied was a fake dataset.
> 

There are several things wrong with your approach to this.

1. The mix package is for imputation when the data consist
   of variables of two kinds: one or more continuous variables
   (which will be modelled as normally distributed); and one
   or more discrete (categorical) variables. The levels of
   the latter must be represented in the data as consecutive
   integers starting at 1. In your data, all the variables
   have been generated as normally distributed, and hence
   not integers, except for x4 which has been converted to
   integer from a Normal sample using floor(). However, this
   generates negative integers, which are not acceptable to
   the mix package as levels of categorical variables.

2. The categorical variables must be the first columns in the
   data matrix. Your construction has made x4 (the only
   possible candidate) the last column; in any case your
   first 5 columns are continuous variables.

3. The data presented to mix must be a matrix, not a dataframe.

4. You invoke prelim.mix as prelim.mix(dat,3), which makes it
   treat the first 3 columns as the categorical variables,
   which they are not.

It is your "fake dataset" which is causing the trouble.
All the points above are covered in the documentation for
the functions in the mix package.
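
For instance, a sketch of how your fake data (quoted below) could be
adapted to meet those requirements -- x4 recoded to consecutive
positive levels, placed first, and the whole passed as a matrix:

  x4  <- as.integer(factor(x4))      # consecutive levels 1,2,...,k
  dat <- cbind(x4, wmis, ymis, x1, x2, x3)   # matrix, categorical first
  s   <- prelim.mix(dat, 1)          # 1 categorical variable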

Hoping this helps,
Ted.

>
> library(mix)
> n <-100
> x1<-rnorm(n)
> x2<-rnorm(n,2,1.2)
> x3<-rnorm(n,1,2)
> x4<-floor(rnorm(n)*3)
> y <-rnorm(n,1*x1+2*x2+3*x3+4*x4,2)
> w <-rnorm(n,3,1.2)
> ymis<-y
> ymis[floor(runif(10,1,n))]<-NA
> wmis<-w
> wmis[floor(runif(10,1,n))]<-NA
> dat<-as.data.frame(cbind(wmis,ymis,x1,x2,x3,x4))
>  s <- prelim.mix(dat,3)# do preliminary manipulations
>  thetahat <- em.mix(s)   # ML estimate for unrestricted model
>  rngseed(1234567) # set random number generator seed
>  newtheta <- da.mix(s,thetahat,steps=100) # data augmentation
>  ximp <- imp.mix(s, newtheta, dat)  # impute under newtheta
>
> 
> Your mix package is important part of our ongoing research on the
> missing data project and we would like to have it working.
> If you could point out to me what I am doing wrong or
> some technical difficulty that I am not aware of it will be highly
> appreciated.
> 
> Thank you for your help in advance.
> 
> Sincerely
> 
> Masanao Yajima
> [EMAIL PROTECTED]
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jul-07   Time: 10:11:39
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calculating percent error from 2 vectors

2007-07-12 Thread Ted Harding
On 12-Jul-07 17:32:03, Scott Bearer wrote:
> Hello,
> 
> I believe this is an easy scripting problem, but one I am stumbling on.
> 
> I have a "known" vector of 3 colors with nrow=10:
> known<-c("red", "blue", "red", "red", "yellow", "blue", "yellow",
> "blue",
> "blue", "yellow")
> 
> and a model output vector:
> modelout<-c("red", "red", "red", "blue", "yellow", "blue", "blue",
> "red",
> "blue", "yellow")
> 
> I would like to determine the proportion (in)correctly identified for
> each
> color.  In other words:
> % correct "red"=
> % correct "blue"=
> % correct "yellow"=
> 
> How would I code this (assuming the actual dataset is more complex)?

For your example:

> tbl<-table(known,modelout)

> tbl
modelout
knownblue red yellow
  blue   22   0 
  red12   0 
  yellow 10   2 

> dim(tbl)
[1] 3 3

> for(i in (1:dim(tbl)[1])){print(sum(tbl[i,-i])/sum(tbl[i,]))}
[1] 0.5
[1] 0.333
[1] 0.333

and you can modify the "print" command produce a desired format,
e.g. using rownames(tbl)[i] for the successive colour names.
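
Equivalently, since the two-way table is also a matrix, the
proportion *correctly* identified for each colour comes out in
one line:

> round(diag(tbl)/rowSums(tbl), 3)
  blue    red yellow 
 0.500  0.667  0.667 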

Hoping this helps (as a start),
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jul-07   Time: 20:15:34
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eMail results out of R

2007-07-12 Thread Ted Harding
On 12-Jul-07 16:10:46, Stéphane Dray wrote:
> Here is a small function that I used on Debian. It requires exim4 :
> 
send.mail <- function(addr='[EMAIL PROTECTED]',
                      subject='A message from R',
                      text=paste("I have finished working at",
                                 Sys.time(), collapse="")){
> # send an email
> # it requires the reconfiguration of exim4
> # you have to connect as root and
> # then type dpkg-reconfigure exim4config

I'm a bit puzzled by this. On any Unix/Linux system (unless
something has changed very recently which I haven't heard about),
the 'mail' command simply works, for any user (without having
to become root); does not require exim4 (or any particular version
of any particular mail agent--so long as something has to be set up
so that email can be sent at all), and (for the purpose of using
'mail' from R) does not require exim4 or any other mail agent to
be re-configured. The email will be sent "From:" the user who
is running R.

In the example I posted just now, I just used 'mail' in R's
system() command without doing anything special. The mail transfer
agent in my case is 'sendmail', but it's a standard configuration
and nothing special has been done.



> mail.cmd <- paste("mail ",
>                   "-s \"", subject, "\" ",
>                   addr,
>                   " << EOT &\n",
>                   text, "\n",
>                   "EOT",
>                   sep="", collapse="")
> system(mail.cmd, intern=FALSE)
> }
> 
> Cheers,
> 
> Romain Francois wrote:
>> Hi,
>>
>> There is a paper in the April 2007 issue of R News that might be of
>> help 
>> here.
>> http://##cran mirror##/doc/Rnews/Rnews_2007-1.pdf
>>
>> Romain
>>
>> Duncan Murdoch wrote:
>>   
>>> On 7/12/2007 9:52 AM, [EMAIL PROTECTED] wrote:
>>>   
>>> 
>>>> Hi everyone,
>>>>
>>>> I did my homework and read the posting guideline :-)
>>>>
>>>> I want to eMail the results of a computing automatically. So I get
>>>> the results (the parameters of a garch process) and I want to eMail
>>>> them to another person. How can I do that?
>>>> 
>>>>   
>>> This will depend on the system you're using.  If the command
>>> "emailit" 
>>> would work from the command line on your system, then
>>>
>>> system("emailit")
>>>
>>> should work from within R.  Writing that command is the hard part, of
>>> course.
>>>
>>> Duncan Murdoch
>>>   
>>> 
> 
> 
> -- 
> Stéphane DRAY ([EMAIL PROTECTED] )
> Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
> 43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
> Tel: 33 4 72 43 27 57   Fax: 33 4 72 43 13 88
> http://biomserv.univ-lyon1.fr/~dray/
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jul-07   Time: 18:03:20
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eMail results out of R

2007-07-12 Thread Ted Harding
On 12-Jul-07 13:52:56, [EMAIL PROTECTED] wrote:
> Hi everyone,
> 
> I did my homework and read the posting guideline :-)
> 
> I want to eMail the results of a computing automatically. So I get the
> results (the parameters of a garch process) and I want to eMail them to
> another person. How can I do that?
> 
> Thx

As well as the answers from John Scillieri and Duncan Murdoch:
if it does happen that you're using a Unix/Linux system, then
the appropriate variant of the following illustration will work
(the 'mail' command should always be present on variants of these
systems):

In the little project I'm working on in R at the moment,
I have variables x1 and x2 (each length-50 numeric vectors).

So I can, in R, do:

  sink(file="myoutput")
  cbind(x1,x2)
  sink()
  system("mail ted -s \"Test No 2\" < myoutput")

which has stored the 2-column output from cbind(x1,x2) in the
file "myoutput", and then used the 'mail' command to mail its
contents to 'ted' with subject 'Test No 2' (I used a multi-word
subject to illustrate the quotes the command line will need,
to avoid splitting the subject into separate tokens).

See ?sink for how to use the sink() command.

Then 'ted' duly received an email with contents:

===
Date: Thu, 12 Jul 2007 17:10:44 +0100
From: Ted Harding <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Test No 2

   x1   x2
 [1,]  0.37282844  0.002743146
 [2,]  0.93293155 -0.108009247
 ...
[49,] -0.08681427  0.828313288
[50,] -0.23621908  0.385269729
===

You can of course encapsulate something like the above sequence
of commands into an R function, depending on how you organise the
storage of the results you want to email.
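
For instance, a minimal sketch (the function name and arguments
are mine, just for illustration; it assumes a working 'mail'
command as above):

  mail.results <- function(addr, subject, obj, file="myoutput"){
    sink(file)     # divert printed output to 'file'
    print(obj)
    sink()         # restore normal output
    system(paste("mail ", addr, " -s \"", subject,
                 "\" < ", file, sep=""))
  }

  ## e.g.  mail.results("ted", "Test No 2", cbind(x1,x2))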

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 12-Jul-07   Time: 17:25:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] system: Linux vs. Windows differences

2007-07-11 Thread Ted Harding
On 11-Jul-07 18:28:19, Marc Schwartz wrote:
> On Wed, 2007-07-11 at 16:16 -0200, Alberto Monteiro wrote:
>> [I tried to send this messages two days ago, but I guess I mistyped
>> the To: 
>> address...]
>> 
>> Why "system" is different in Linux and Windows? Both in R 2.4.1, but
>> in 
>> Windows there is an option:
>> 
>>   system(something, wait = FALSE)
>> 
>> while on Linux (Fedora Core 4), there is no such option?
>> 
>> Alberto Monteiro
> 
> From:
> 
>   https://svn.r-project.org/R/trunk/NEWS
> 
> for changes in R 2.5.0:
> 
> o system() now takes the same set of arguments on all platforms,
>   with those which are not applicable being ignored with a
>   warning.  Unix-alikes gain 'input' and 'wait', and Windows
>   gains 'ignore.stderr'.
> 
> 
> Time to upgrade both your R and your FC installation.  
> 
> R is at 2.5.1.
> 
> FC4 has been EOL (End of Life) for some time and FC5 hit EOL last
> month.
> 
> HTH,
> 
> Marc Schwartz

"End of Life" is a Nomenklatura categorisation.

While we loyal Members welcome and applaud President-2.5.1,
we do not forget old Comrades now air-brushed from the photographs
who, now faceless, sturdily labour 24 hours a day in the fields
and saltmines, and even in the dark attics of those who conceal
and protect them still.

There can be life in old dogs, and even strong teeth in some.

Ted.
[emailing from SuSE-5.2 (1997), logged in from Red Hat 9 (2003)]


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Jul-07   Time: 22:32:13
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] EM algorithm for Missing Data.

2007-07-09 Thread Ted Harding
On 09-Jul-07 02:20:47, Marcus Vinicius wrote:
>  Dear all,
> I need to use the EM algorithm where data are missing.
> Example:
> x<- c(60.87, NA, 61.53, 72.20, 68.96, NA, 68.35, 68.11, NA, 71.38)
> 
> May anyone help me?
> 
> Thanks.
> 
> Marcus Vinicius

The Dempster, Laird & Rubin reference given by Simon Blomberg
is the classical account of the EM Algorithm for incomplete
information, though there has been a lot more published since.

However, more to the point in the present case: If the above
is typical of your data, you had better state what you want to
do with the data.

Do you want to fit a distribution by estimating parameters?
Are they observations of a "response" variable with covariates
and you want to fit a linear model estimating the coefficients?
Are they data from a time-series and you need to interpolate
at the missing values?
Etc.??

Depending on what you want to do, the way you apply the general
EM Algorithm procedure may be very different; and a lot of
applications are not covered by Dempster, Laird & Rubin (1977).

And there may possibly be no point anyway: If all you want to do
is estimate the mean of the distribution of the data, then the
best procedure may simply be to ignore the missing data.
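
In R that is just (using your example data):

  x <- c(60.87, NA, 61.53, 72.20, 68.96, NA, 68.35, 68.11, NA, 71.38)
  mean(x, na.rm=TRUE)   # complete-case estimate of the mean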

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 09-Jul-07   Time: 09:18:56
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change the "coeffcients approach" on an anova

2007-07-07 Thread Ted Harding

On 08-Jul-07 00:36:46, vincent guyader wrote:
> hi everybody
> 
> I have to do a lot of Anova with R and I would like to have another
> type of
> coefficients coding.. I explain.
> 
> by default if I have 2 temperatures for an experience. 100°C or 130°C
> and I
> want to see the temperature effect on the presure
> I want to estimate the coefficient of each temperature.
> 
> I will obtain ,with the anova, juste one coefficients for example +3,56
> (for
> 100°C), and the other (for 130°C) is always  zero.
> 
> but I prefer to obtain + 1,78 or the first effect and -1,78 for the
> second
> (the intercept is also different too)
> 
> my script is (it s just an example)
> 
> rinfo2 <- (lm(pression~ temp, data=rebe))
> anova(rinfo2)
> summary(rinfo2)
> 
> what can I change to obtain what I need?

Try

  rinfo2 <- (lm(pression~temp-1, data=rebe))
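
which fits one coefficient (a mean) per temperature level. A
minimal sketch with made-up data (since 'rebe' is not to hand):

  rebe <- data.frame(temp=factor(rep(c("100C","130C"), each=6)),
                     pression=c(rnorm(6,10), rnorm(6,13)))
  coef(lm(pression ~ temp - 1, data=rebe))   # one mean per level

If you want effects that sum to zero (your +1.78 / -1.78), that
is "sum" contrasts:

  coef(lm(pression ~ temp, data=rebe,
          contrasts=list(temp="contr.sum")))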

Ted.

----
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jul-07   Time: 02:18:24
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the index "the number of species combin

2007-07-07 Thread Ted Harding
On 07-Jul-07 22:18:42, Zhang Jian wrote:
> I want to analyze the co-occurrence of some species. In some
> papers, the authors said that the index"the number of species
> combinations (COMBO)" is a good index. I try to calculate the
> index by R language. But I can not get the right value. I think
> that I do not understand the concept of the index because my
> english is not good.
> 
> The concept:
> *The number of species combinations   *This index scans the
> columns of the presence-absence matrix and keeps track of the
> number of unique species combinations that are represented in
> different sites. For an assemblage of n species, there are 2n

[I think this should be 2^n]

> possible species combinations, including the combination of no
> species being present (Pielou and Pielou 1968). In most real
> matrices, the number of sites (= columns) is usually substantially
> less than 2n, which places an upper bound on the number of species
> combinations that can be found in both the observed and the
> simulated matrices.

English good or bad, I found the above description not easy to
follow! But, since I could see only one thing it could mean if
it was intended to give a unique definition, I decided to test
on your data the hypothesis that "species combinations" means the
number of distinct columns in the data matrix.

I took your species-incidence data (not reproduced here) and
converted it to a matrix S of 0 and 1 with 19 columns and 17
rows (though the following would work just as well if it is a
dataframe):

S
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19
1   0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
2   0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
3   0  0  0  0  0  0  0  0  0   0   1   0   0   0   0   0   0   0   0
4   0  0  0  0  0  0  0  0  0   0   1   0   0   0   0   0   0   0   0
5   1  0  0  0  0  0  0  0  0   0   0   0   0   1   0   0   0   0   0
6   0  0  0  1  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
7   0  1  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
8   0  0  0  0  1  1  1  1  1   1   1   1   0   0   1   1   1   1   0
9   0  0  0  0  0  0  0  1  0   0   0   0   0   0   0   0   0   0   0
10  1  1  1  1  0  0  0  0  0   0   0   0   0   1   0   0   0   0   0
11  0  1  1  1  1  1  1  1  1   1   1   1   1   0   1   1   1   1   1
12  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
13  0  0  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
14  0  0  0  0  1  1  1  1  0   0   0   0   0   0   0   0   0   0   0
15  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
16  0  1  1  1  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
17  0  1  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0

There are 10 different columns in there, as can be found by

t(unique(t(S)))
   V1  V2  V3  V4  V5  V8  V9 V11 V13 V14
1   0   1   0   0   0   0   0   0   0   0
2   0   1   0   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   1   0   0
4   0   0   0   0   0   0   0   1   0   0
5   1   0   0   0   0   0   0   0   0   1
6   0   0   0   1   0   0   0   0   0   0
7   0   1   1   0   0   0   0   0   0   0
8   0   0   0   0   1   1   1   1   0   0
9   0   0   0   0   0   1   0   0   0   0
10  1   1   1   1   0   0   0   0   0   1
11  0   1   1   1   1   1   1   1   1   0
12  1   0   0   0   0   0   0   0   0   0
13  0   0   1   0   0   0   0   0   0   0
14  0   0   0   0   1   1   0   0   0   0
15  1   0   0   0   0   0   0   0   0   0
16  0   1   1   1   0   0   0   0   0   0
17  0   1   0   0   0   0   0   0   0   0

Of the "missing" columns, it can be seen that V6 and V7 are the
same as V5, and V10, V12, V15, V16, V17, V18, V19 are the same
as V9. Hence the interpretation that "COMBO" means the number of
distinct columns is confirmed. If that is really the case, then
a very simple R function can be written for it:

COMBO<-function(S){ncol(t(unique(t(S))))}

COMBO(S)
[1] 10
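
(Equivalently, a small sketch: nrow(unique(t(S))) gives the same
count without the second transpose.)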

> About the data, I calculated the index "COMBO" by other software.
> The value of the index is 10.
> Can you help me to calculate or give me some detalied explain about
> the concept of the index?

See above! I hope it helps.
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jul-07   Time: 00:46:08
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recursion in R ...

2007-07-07 Thread Ted Harding
On 07-Jul-07 10:34:03, Uwe Ligges wrote:
> Alberto Monteiro wrote:
>> Ted Harding wrote:
>>> So I slickly wrote a recursive definition:
>>>
>>> Nnk<-function(n,k){
>>>   if(n==1) {return(k)} else {
>>> R<-0;
>>> for(r in (1:k)) R<-(R+Nnk(n-1,k-r+1))
>>>   }
>>>   return(R)
>>> }
>>>
>> You are aware that this is equivalent to:
>> 
>> Nnk1 <- function(n, k) { prod(1:(n+k-1)) / prod(1:n) / prod(1:(k-1)) }
> 
> or
> 
> Nnk2 <- function(n, k) { gamma(n+k) / gamma(n+1) / gamma(k) }
> 
> or most easily:
> 
> Nnk3 <- function(n, k) choose(n+k-1, n)
> 
> Uwe Ligges

OK, I can recognise a negative binomial coefficient when I see one!

I'm wondering, though, what is the "natural" connection between
choose(n+k-1, n) and the statement of the original question:

  What is the number of distinct non-decreasing sequences of length n
  which can be made using integers chosen from (1:k)?
  (repetition allowed, of course)

(The fact that this leads to a recurrence relationship which is
satisfied by choose(n+k-1,n) is not what I mean by "natural"
-- I'm looking for a correspondence between such a sequence, and
a choice of n out of (n+k-1) somethings).

Ted.

>> aren't you?
>>  
>>> ON THAT BASIS: I hereby claim the all-time record for inefficient
>>> programming in R.
>>>
>>> Challengers are invited to strut their stuff ...
>>>
>> No, I don't think I can bet you unintentionally, even though
>> my second computer program that I ever wrote in life had to be
>> aborted, because it consumed all the resources of the computer.
>> 
>> Alberto Monteiro
>> 
>> __
>> R-help@stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> __________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jul-07   Time: 12:15:21
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Recursion in R ...

2007-07-06 Thread Ted Harding
Hi Folks,

R has known speed issues for recursive definitions.
There was a thread "Extremely slow recursion in R?" in
August 2006 (24 Aug, from Jason Liao), with some
interesting comparisons between programming languages.

I'm re-opening the topic, with an ulterior motive
(stated at the end).

The starting point was the question "How many distinct
increasing sequences of length n can be made from k
integers (1:k)?", i.e. what is the size of the sample
space of

  sort(sample((1:k),n,replace=TRUE))

Let this number be Nnk(n,k). I aptly observed that

  Nnk(1,k) = k
  Nnk(n,k) = sum[r = 1 to k](Nnk(n-1,r))

So I slickly wrote a recursive definition:

Nnk<-function(n,k){
  if(n==1) {return(k)} else {
R<-0;
for(r in (1:k)) R<-(R+Nnk(n-1,k-r+1))
  }
  return(R)
}

I then ran a few test cases, to check timing etc.:

Nnk(21, 7)   34 seconds
Nnk(21, 8)  120 seconds
Nnk(21, 9)  384 seconds
Nnk(21,10) 1141 seconds

I then complacently set it to work out the number I happened
to really want:

Nnk(21,20)

24 hours later, no result; and an extra "test line" (not
shown above) at the start of the function had produced no
output, so the function had not even returned to the top
level for the first time.

Then, with heavy thunderstorms rolling in, and my mains
power beginning to flicker, I closed down the computer.

A quick extrapolation based on the above results, and a
few others, suggested that the time for completion of
Nnk(21,20) would be about 2 years.

Then I thought about it.

The recursion relation in fact shows that the vector of values
(Nnk(n,1),...,Nnk(n,k)) is just cumsum applied to the vector
(Nnk(n-1,1),...,Nnk(n-1,k)), and that Nnk(n,k) is its maximum
(i.e. last) element.

Hence (for n>=1 and r>=1):

Nnk<-function(n,k){
  K<-rep(1,k)       # K[r] = Nnk(0,r) = 1 for r in 1:k
  for(i in (1:n)){
    K<-cumsum(K)    # after i passes, K[r] = Nnk(i,r)
  }
  return(max(K))    # = K[k] = Nnk(n,k)
}

This definition then gave me, in so brief a blink of the
eye that I could make no estimate of it, the result:

Nnk(21,20) = 131282408400

In fact, I had to go up to things like

Nnk(1000,200) = 3.335367e+232

before I could sensibly perceive the delay (about 0.5 seconds
in this case).

On the old recursive definition, the computation might have
outlasted the Universe.



ON THAT BASIS: I hereby claim the all-time record for inefficient
programming in R.

Challengers are invited to strut their stuff ...

Best wishes to all, and happy end of week,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jul-07   Time: 17:53:55
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A More efficient method?

2007-07-04 Thread Ted Harding
On 04-Jul-07 13:44:44, Keith Alan Chamberlain wrote:
> Dear Rhelpers,
> 
> Is there a faster way than below to set a vector based on values
> from another vector? I'd like to call a pre-existing function for
> this, but one which can also handle an arbitrarily large number
> of categories. Any ideas?
> 
> Cat=c('a','a','a','b','b','b','a','a','b')# Categorical variable
> C1=vector(length=length(Cat)) # New vector for numeric values
> 
># Cycle through each column and set C1 to corresponding value of Cat.
> for(i in 1:length(C1)){
>   if(Cat[i]=='a') C1[i]=-1 else C1[i]=1
> }
> 
> C1
> [1] -1 -1 -1  1  1  1 -1 -1  1
> Cat
> [1] "a" "a" "a" "b" "b" "b" "a" "a" "b"

> Cat=c('a','a','a','b','b','b','a','a','b')

> Cat=="b" 
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE

> (Cat=="b") - 0.5
[1] -0.5 -0.5 -0.5  0.5  0.5  0.5 -0.5 -0.5  0.5

> 2*((Cat=="b") - 0.5)
[1] -1 -1 -1  1  1  1 -1 -1  1

to give one example of a way to do it. But you don't say why you
really want to do this. You may really want factors. And what do
you want to see if there is "an arbitrarily large number of
categories"?

For instance:

> factor(Cat,labels=c(-1,1))
[1] -1 -1 -1 1  1  1  -1 -1 1 
Levels: -1 1

but this is not a vector, but a "factor" object. To get the vector,
you need to convert Cat to an integer:

> as.integer(factor(Cat))
[1] 1 1 1 2 2 2 1 1 2

where (unless you've specified otherwise in factor()) the values
will correspond to the elements of Cat in "natural" order, in this
case first "a" (-> 1), then "b" (-> 2).

E.g.

> Cat2<-c("a","a","c","b","a","b")
> as.integer(factor(Cat2))
[1] 1 1 3 2 1 2

so, with C2<-as.integer(factor(Cat2)), you get a vector of distinct
integers (1,2,3) for the distinct levels ("a","b","c") of Cat2.
If you want integer values for these levels, you can write a function
to change them.
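
For example (a sketch), a named vector acts as a lookup table:

  vals <- c(a=-1, b=1, c=0)   # whatever values you want per level
  unname(vals[Cat2])
  [1] -1 -1  0  1 -1  1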

Hoping this helps to break the ice!
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jul-07   Time: 16:44:20
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Substitution of Variables

2007-07-02 Thread Ted Harding
On 02-Jul-07 23:01:07, Judith Flores wrote:
> Hi,
>I need to run a script under the variable (that
> comes from a csv file) that is assigned to another
> variable. This is very a simplified version of what I
> want to do:
> 
> data<-read.csv('name.csv')
> names(data)<-("VA","VB","VC")
> 
> v<-VA
> 
> mn(v)
> 
> Thank you in advance,
> Judith

read.csv() makes 'data' into a dataframe. This is essentially
a list with named components. So VA is not yet available as
a variable in your program. The data for the variable you want
to call "VA" is stored as the component 'data$VA' of 'data'.
The names "VA", "VB", "VC" are the names you have given to
the *components* of 'data'; you have not created separate
variables with these names.

So you can assign the data for VA directly to 'v' with:

  v<-data$VA

or (if you think you will need it later) create a variable VA
using

  VA<-data$VA

and, if you want a variable called 'v' as well, then:

  v<-VA

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-Jul-07   Time: 00:32:40
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dominant eigenvector displayed as third (Marco Visser)

2007-06-29 Thread Ted Harding
On 29-Jun-07 21:09:37, Peter Dalgaard wrote:
> Marco Visser wrote:
>> This is just a curiousity, I was wondering why the dominant
>> eigenvetor and eigenvalue of the following matrix is given
>> as the third. I guess this could complicate automatic selection 
>> procedures. 
>>
>> 0 0 0 0 0 5
>> 1 0 0 0 0 0
>> 0 1 0 0 0 0
>> 0 0 1 0 0 0
>> 0 0 0 1 0 0
>> 0 0 0 0 1 0
>>[...]
>> Comment: In Matlab the the dominant eigenvetor and eigenvalue 
>> of the described matrix are given as the sixth. Again no idea why.
>>   
> 
> 
> I get
> 
>  > eigen(mat)$values
> [1] -0.65383+1.132467i -0.65383-1.132467i  0.65383+1.132467i  0.65383-1.132467i
> [5] -1.30766+0.00i  1.30766+0.00i
>  > Mod(eigen(mat)$values)
> [1] 1.307660 1.307660 1.307660 1.307660 1.307660 1.307660
> 
> So all the eigenvalues are equal in modulus. What makes you think
> one of them is "dominant"?

When I run it I get eigenvectors 3 and 6 both purely real.
It may be that Marco has confused this with "dominant".

Also, the eigenvalues of these two are real, and have the largest
real parts (+/- 1.3076605).

All others have complex eigenvalues, of which the real parts are
+/- 0.6538302.

It may be that Marco has been misled by this, perceiving the real
part rather than both real and imaginary parts, and being led to
think that the largest real part corresponds to the largest eigenvalue.

As has been clearly pointed out, this is not the way to look at it!
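
One can verify directly; a small sketch, reconstructing the matrix
from the post above:

  mat <- rbind(c(0,0,0,0,0,5), cbind(diag(5), 0))
  Mod(eigen(mat)$values)   # all six moduli = 5^(1/6) = 1.30766...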

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Jun-07   Time: 22:47:01
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spectral Decomposition [OOPS]

2007-06-29 Thread Ted Harding
On 29-Jun-07 13:23:05, Ted Harding wrote:
> [Sorry -- a silly typo in my previous]:
> If A is not symmetric, you have "left" eigenvectors:
> 
>   x'*A = lambda*x'
> 
> and "right" eigenvectors:
> 
>   A*x = lambda*x
> 
> and the "left" eigenvectors are not the same as the "right"
> eigenvectors, though you have the same set of eigenvalues lambda
> in each case.
> 
> You then have
> 
>   A = L'*B*R

Should be:

  A = R*B*L'

in that L'*R = I (unit), so then

  A*R = R*B

so each column (right eigenvector) of R is multiplied by the
corresponding lambda;

  L'*A = B*L'

so each row (left eigenvector) of L' is multiplied by the
corresponding lambda.

Apologies for the slip!
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Jun-07   Time: 14:42:19
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spectral Decomposition

2007-06-29 Thread Ted Harding

On 29-Jun-07 12:29:31, Doran, Harold wrote:
> All of my resources for numerical analysis show that the spectral
> decomposition is
> 
> A = CBC'
> 
> Where C are the eigenvectors and B is a diagonal matrix of eigen
> values. Now, using the eigen function in R
> 
># Original matrix
> aa <- matrix(c(1,-1,-1,1), ncol=2)
> 
> ss <- eigen(aa)
> 
># This results yields back the original matrix according to the
> formula above
> ss$vectors %*% diag(ss$values) %*% t(ss$vectors)
> 
> However, for the following I do not get back the original matrix
> using the general formula for the spectral decomposition:
> 
> set.seed(123)
> 
> aa <- matrix(rnorm(16), ncol=4)
> 
> ss <- eigen(aa)
> 
> ss$vectors %*% diag(ss$values) %*% t(ss$vectors)
> 
> However, if I do the following
> 
> A = CBC^{-1}
> 
> I get back the original matrix A
> 
> ss$vectors %*% diag(ss$values) %*% solve(ss$vectors)

Harold, I think the key to the issue is whether your original
matrix is symmetric or not. For your formula

  A = C*B*C'

to work, where B is a diagonal matrix (therefore symmetric),
you have -- bearing in mind that transposing reverses the order
of the factors, and that B' = B --

  A' = (C*B*C')' = C*B'*C' = C*B*C' = A

so A would have to be symmetric. This was the case for your
first example matrix(c(1,-1,-1,1), ncol=2).

However, your second example will not be symmetric, so the
formula will not work, and you will need A = C*B*C^(-1).

If A is not symmetric, you have "left" eigenvectors:

  x'*A = lambda*x'

and "right" eigenvectors:

  A*x = lambda*x

and the "left" eigenvectors are not the same as the "right"
eigenvectors, though you have the same set of eigenvalues lambda
in each case.
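
(In R, a quick sketch: the left eigenvectors can be obtained as
the ordinary -- right -- eigenvectors of the transpose:

  A <- matrix(rnorm(9), 3, 3)    # any non-symmetric example
  R <- eigen(A)$vectors          # right: A %*% R[,i] proportional to R[,i]
  L <- eigen(t(A))$vectors       # left:  t(L[,i]) %*% A proportional to t(L[,i])

with eigen() normally returning the eigenvalues, and hence the
columns, in the same order -- decreasing modulus -- for both.)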

You then have

  A = L'*B*R

Of course the most frequent occurrence of this kind of question
in statistics is where A is a covariance or correlation matrix,
which is symmetric by definition.

Hoping this helps!
Ted.

> In my lit search I am not finding an explanation for why this works, so
> I am seeking R-help since there may be a computational rationale that
> can be explained by users (or authors) of the function. In my
> experimentation with some computations it seems that the latter
> approach is more general in that it yields back the matrix I began
> with, but deviates from the formula I commonly see in texts.
> 
> Thanks,
> Harold


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Jun-07   Time: 14:23:03
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unix-like permissions to allow a user to update recommen

2007-06-18 Thread Ted Harding
On 18-Jun-07 20:27:56, Patrick Connolly wrote:
> On Mon, 18-Jun-2007 at 11:53AM +0100, Ted Harding wrote:
> 
>|> On 18-Jun-07 10:11:43, Patrick Connolly wrote:
>|> > I installed R from the tar.gz file (as root) in a directory under
>|> > /usr/local.  The recommended packages are installed in a library in
>|> > that directory whereas additional packages I install in a directory
>|> > under the /home directory as a user.
>|> > 
>|> > Updating the additional packages is very easy with
>|> > update.packages()
>|> > as a non-root user, but the recommended packages cannot be done so
>|> > readily because of file permissions.
>|> > 
>|> > My question is how do I set the permissions or ownerships in the
>|> > /usr/local/R-2.5.0 directory so that everything necessary can be
>|> > writable by a user?  Should I make a group for R users (total of
>|> > one
>|> > member) or is it simpler than that?
>|> 
>|> Since you have root access, do you need to segregate the additional
>|> packages to a particular user?
> 
> It's handy to not have to reload packages that don't change between
> versions of the basic installation.
> 
>|> 
>|> Though I don't run R as root for general use, I always install/update
>|> by running R CMD as root. This makes all of R ("recommended" and also
>|> any extras) available system-wide, and no pemission problems arise.
>|> 
>|> This of course does not stop you from setting up a special .Rprofile
>|> for each user, since this by definition lives in the user's home
>|> directory.
>|> 
>|> Does this help? Or are there issues you haven't mentioned which make
>|> such an approach not feasible?
> 
> I don't exactly have issues.  It's not a huge problem I'm dealing
> with.  It's simple enough for me to use update.packages() as a user
> which will download the appropriate packages.  Though they won't be
> installed, they are all in the one place in the /tmp/ directory from
> where I can install them as root.  I just thought there must be a more
> elegant way to set permissions so that users could write to the
> subdirectories under /usr/local/R-2.xxx/.  So much of the installation
> process of R and its packages is so elegant, I'd like to retain some
> of that elegance.

On my Linux, all the places where components of R might normally
be installed (/usr/lib or /usr/local/lib) are user=root, group=root,
and when I look under them practically everything is writeable
only by user=root. So you'd have to change a lot of permissions
before any other user got rights to write to these directories.
Even adding a user to group=root wouldn't change things, since
group does not have write permissions (unless user=root too).
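
If you really did want to go that way, the sort of thing required
would be (as root; just a sketch, with the group name, user name
and path illustrative):

  groupadd rusers
  usermod -a -G rusers patrick
  chgrp -R rusers /usr/local/R-2.5.0
  chmod -R g+w /usr/local/R-2.5.0

but that is a lot of blunt permission-changing for little gain.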

I'm still wondering, though, why you don't just run the command
update.packages() as root. You have root access, and you said
(in the "adding user to group" context) that only one user is
involved (presumably yourself?). In that case, why not start R
as root and run update.packages()?

Or am I missing something?

Best wishes,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jun-07   Time: 22:25:18
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] source a specific function

2007-06-18 Thread Ted Harding
On 18-Jun-07 18:53:50, Duncan Murdoch wrote:
> 
> The exact syntax you list there won't work, but in any case, changing 
> the environment of a function in a package is a bad idea -- it may need
> to reference things from the namespace of the package.

Well, as I said before, "(assuming that you know about
interdependencies)"!

I tried Gabor's suggested syntax as follows, bearing in mind that
mvrnorm in MASS is pure R code calling only "base" functions:

library(MASS)
environment(mvrnorm) <- .GlobalEnv
mu=c(0,0)
V<-matrix(c(1.0,0.5,0.5,1.0),ncol=2)
detach()
ls()
[1] "%.+%""%+.%""mu"  "mvrnorm" "V"  

mvrnorm(10,mu,V)
 [,1]   [,2]
 [1,] -1.80466069 -1.8229928
 [2,]  0.05565147 -1.6279434
 [3,] -0.28505572 -0.8927696
 [4,] -0.48919795  0.0750501
 [5,] -0.08437832  0.1349296
 [6,]  2.17399713  1.2881640
 [7,]  1.59934824  1.3784665
 [8,]  0.30555420  0.3835743
 [9,]  0.11120527 -0.7287910
[10,] -0.77281783 -1.2265502

so that one seems to have worked.

However, I'm not sure about the space used by MASS "going away"
after detach():

Starting R from scratch:

Type 'q()' to quit R.

> gc()
         used (Mb) gc trigger (Mb)
Ncells 412589 11.1     597831   16
Vcells 102417  0.8     786432    6

> library(MASS)
> gc()
 used (Mb) gc trigger (Mb)
Ncells 459471 12.3 667722 17.9
Vcells 87  0.9 786432  6.0

> environment(mvrnorm) <- .GlobalEnv
> detach()
> gc()
 used (Mb) gc trigger (Mb)
Ncells 459297 12.3 741108 19.8
Vcells 05  0.9 786432  6.0
> gc()
 used (Mb) gc trigger (Mb)
Ncells 459304 12.3 818163 21.9
Vcells 18  0.9 786432  6.0

so only about 170KB of the 2.2MB used by MASS has been recovered
after detach(). Or am I looking at the wrong indicator of space used?

On the other hand, if I first extract the code of mvrnorm()
which is a few lines of pure R, I get:

1. Start R from scratch:
gc()
         used (Mb) gc trigger (Mb)
Ncells 412589 11.1     597831   16
Vcells 102417  0.8     786432    6

2. Paste in the code for mvrnorm, and then:
gc()
 used (Mb) gc trigger (Mb)
Ncells 412844 11.1 667722 17.9
Vcells 102591  0.8 786432  6.0

so mvrnorm on its own is only taking up about

 (412844-412589) + (102591-102417) = 429 cells

Hence I wonder whether the space first occupied by MASS (2.2MB)
apart from mvrnorm gets freed up at all?

Thanks for the comments.
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jun-07   Time: 22:00:33
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Optimization

2007-06-18 Thread Ted Harding
On 18-Jun-07 16:01:03, livia wrote:
> 
> Hi, I would like to minimize the value of x1-x2, x2 is a fixed
> value of 0.01,
> x1 is the quantile of normal distribution (0.0032,x) with
> probability of 0.7, and the changing value should be x.
> Initial value for x is 0.0207.

I'm a bit puzzled by the question. If I understand it right,
we can ignore x2 (since it is a fixed value) and simply consider
minimising x1 (instead of x1-x2).

Then, denoting by P(u) the cumulative normal distribution function
for mean=0 and variance=1 (i.e. in R: pnorm(u,0,1)), and by Q(p)
its inverse, corresponding to qnorm(p,0,1), we have (again if I
have understood right):

  P((x1 - 0.0032)/x) = 0.7

so

  x1 = 0.0032 + x*Q(0.7)

and therefore, since Q(0.7) > 0 and x must be positive, the value
of x1 can be made as close to 0.0032 as you please (but greater
than 0.0032) by taking x small enough.

Hence there is no strictly minimising value of x, but the greatest
lower bound of all possible values of x1 is 0.0032.

Then you can subtract x2.
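
To see it numerically (a quick sketch):

  x1 <- function(x) qnorm(0.7, mean=0.0032, sd=x)
  x1(c(0.0207, 0.001, 1e-6))
  # approx. 0.0140551 0.0037244 0.0032005 -- tending to 0.0032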

The fact that there is no positive value of x which gives this
bound as the value probably explains the failure of your optim()
attempt.

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jun-07   Time: 17:46:01
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] source a specific function

2007-06-18 Thread Ted Harding
On 18-Jun-07 14:28:35, Gabor Grothendieck wrote:
> This loads all the functions into an anonymous environment defined
> by local and then exports f to the global environment.
> 
> f <- local({
>   source("/a.R", local = TRUE)
>   environment(f) <- .GlobalEnv
>   f
> })

That looks neat! Two questions:

1. Would something similar work for extracting selected functions
   from a library (assuming that you know about interdependencies)?

   E.g. something like

  f <- local({
   library(f.etc.lib)
   environment(f) <- .GlobalEnv
   f
  })


2. Having done what you describe to extract just f from a source
   file, can one then "delete" the local environment used to load
   the source? I think what I'm basically asking is whether the
   exporting is done "by value" (local environment deletion OK)
   or "by reference" (deletion would destroy the exported object).

Apologies, but for instance "?local" is a bit too deep for me!

The underlying agenda behind these queries is the saving of
memory space.

With thanks,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jun-07   Time: 17:11:15
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unix-like permissions to allow a user to update recommen

2007-06-18 Thread Ted Harding
On 18-Jun-07 10:11:43, Patrick Connolly wrote:
> I installed R from the tar.gz file (as root) in a directory under
> /usr/local.  The recommended packages are installed in a library in
> that directory whereas additional packages I install in a directory
> under the /home directory as a user.
> 
> Updating the additional packages is very easy with update.packages()
> as a non-root user, but the recommended packages cannot be done so
> readily because of file permissions.
> 
> My question is how do I set the permissions or ownerships in the
> /usr/local/R-2.5.0 directory so that everything necessary can be
> writable by a user?  Should I make a group for R users (total of one
> member) or is it simpler than that?

Since you have root access, do you need to segregate the additional
packages to a particular user?

Though I don't run R as root for general use, I always install/update
by running R CMD as root. This makes all of R ("recommended" and also
any extras) available system-wide, and no permission problems arise.

This of course does not stop you from setting up a special .Rprofile
for each user, since this by definition lives in the user's home
directory.

Does this help? Or are there issues you haven't mentioned which make
such an approach not feasible?

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jun-07   Time: 11:53:19
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R vs. Splus in Pharma/Devices Industry

2007-06-16 Thread Ted Harding
On 16-Jun-07 01:12:18, Marc Schwartz wrote:
> On Sat, 2007-06-16 at 00:43 +0100, [EMAIL PROTECTED] wrote:
>> On 15-Jun-07 19:43:55, John Kane wrote:
>> > --- Douglas Bates <[EMAIL PROTECTED]> wrote:
>> >> On 6/15/07, Nicholas Lewin-Koh <[EMAIL PROTECTED]>
>> >> wrote:
>> >> > [...]
>> >> I think you mean A, not Z.  First there was S, then
>> >> there was R.
>> > 
>> > We're regressing 
>> 
>> But not to mediocrity! [1]
>> Ted.
>> 
>> [1] Practical Regression and Anova using R
>> Julian J. Faraway (July 2002), page 15.
>> 
>> http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
> 
> 
> Which in turn is of course paraphrasing Galton's "Regression Toward
> Mediocrity in Hereditary Stature”. Journal of the Anthropological
> Institute 15 : 246-63, 1886.
> 
> http://galton.org/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf
> 
> :-)
> 
> Regards,
> 
> Marc

Indeed! My reason for referencing it that way was to indicate
obliquely that we've moved on a lot since Galton, especially
in R :-) :-)  [ including one left over from last time :-) ]

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jun-07   Time: 09:28:11
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT] 'gv' and fractional points

2007-06-15 Thread Ted Harding
On 15-Jun-07 20:08:05, Gabor Grothendieck wrote:
> Check out the engauge digitizer:
> 
> http://digitizer.sourceforge.net/

Thanks, Gabor. This looks useful, as far as it goes.

However, it doesn't deal directly with PS, so one
would have to work to pixel resolution of screenshots
as displayed by say 'gv'.

In the case of an example such as the Fig 4 I mentioned
in a previous post, this would involve taking several
screenshots to "tile" over the entirety of the graph,
since the pixel accuracy of a screenshot of the entire
graph would be rather poor.

This would introduce the necessity of keeping track of
points from screenshot to screenshot, and also each
screenshot would have its own coordinate system so these
would have to be reconciled before the final dataset
could be compiled.

But thanks -- this could well be worth a try in a different
kind of context (e.g. I have some screenshots of old maps
which I want to digitise ... ).

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jun-07   Time: 01:13:53
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R vs. Splus in Pharma/Devices Industry

2007-06-15 Thread Ted Harding
On 15-Jun-07 19:43:55, John Kane wrote:
> --- Douglas Bates <[EMAIL PROTECTED]> wrote:
>> On 6/15/07, Nicholas Lewin-Koh <[EMAIL PROTECTED]>
>> wrote:
>> > [...]
>> I think you mean A, not Z.  First there was S, then
>> there was R.
> 
> We're regressing 

But not to mediocrity! [1]
Ted.

[1] Practical Regression and Anova using R
Julian J. Faraway (July 2002), page 15.

http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jun-07   Time: 00:43:39
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R vs. Splus in Pharma/Devices Industry

2007-06-15 Thread Ted Harding
On 15-Jun-07 19:30:45, Douglas Bates wrote:
> On 6/15/07, Nicholas Lewin-Koh <[EMAIL PROTECTED]> wrote:
>> Hi,
>> 
>> Probably when the statistical community is using Z big pharma will be
>> ready to use
>> R. %P
> 
> I think you mean A, not Z.  First there was S, then there was R.

And, furthermore, A is further from R than Z is, which seems fitting.
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jun-07   Time: 23:57:13
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT] 'gv' and fractional points

2007-06-15 Thread Ted Harding
On 15-Jun-07 15:56:58, hadley wickham wrote:
> This doesn't answer your original question, and isn't much help unless
> you're on a mac, but there's a nice looking program that makes this
> kind of graph scraping really easy:
> http://www.arizona-software.ch/applications/graphclick/en/
> 
> Hadley

Thanks, Hadley! But (as you implicitly surmise) I don't use a Mac
(just non-psychedelic Linux).

However, as a follow-up, I've since found that one can (somewhat
tediously) do what I was asking with the GIMP.

If you start the GIMP on a PostScript file, you initially get a
"Load PostScript" window which asks you to choose (amongst other
things" the "resolution", initially "100". If you wind this up
to say "1000", then you get 10 times the positional precision
in the numbers shown as below.

When the image is displayed, there is again a little window
which gives you the GIMP coordinates of the mouse position.
With increased "Resolution", the numerical values vary
correspondingly more rapidly with position.

The tedious aspect (compared with 'gv') is that to get better
visual resolution you need to zoom the image. This enlarges the
whole image, with the result that you have to pan around to
locate different parts, and you can get lost in a graph with
lots of widely-spread points.

With 'gv', on the other hand, you can select any small part
to view in zoom in a sub-window, which is much easier to cope
with; and you also get a "rubber-band" rectangle which enables
you to readily align points here with points there -- e.g. if
you want to read off a curve the y-value corresponding to say
x=5.0.

However, this is at least a partial answer to my own question!

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jun-07   Time: 17:29:49
---------- XFMail --


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jun-07   Time: 20:32:59
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [OT] 'gv' and fractional points

2007-06-15 Thread Ted Harding
On 15-Jun-07 16:29:53, Ted Harding wrote:
> [...]
> However, as a follow-up, I've since found that one can (somewhat
> tediously) do what I was asking with the GIMP.

As well as the awkwardness of doing it the GIMP way, I've
discovered another disadvantage.

I'd previously tried it on a rather small image (175x70 points,
= 2.43x0.97 inches).

I then tried it on a full A4 page. Even at a GIMP "Scale"
factor of 300, this leads to a 50MB temporary file being
created. At 1000, this would rise to some 550MB, as I found
out after this attempt filled up the limited spare space
I have on the disk drive in question ...

No doubt "Scale"=300 (as opposed to the default of 100) may be
ample for most purposes, but the overhead is still unpleasant!

Hence I'm once again hankering after something which will display
a PS file as efficiently as 'gv', but will report the cursor
position in fractions of a point!

Best wishes to all,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jun-07   Time: 19:18:48
-- XFMail --

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jun-07   Time: 20:33:19
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [OT] 'gv' and fractional points

2007-06-15 Thread Ted Harding
Hi Folks,

This is off-topic R-wise, but it may be close to
the heart of many R-users, so I think it may be
the best place to ask!

Users of 'gv' (the "front end" to ghostscript) will
be aware of the little window which gives you the
x-y coordinates (in points = 1/72 inch) of the position
of the "cross-hair" mouse cursor. These coordinates
are those of the corresponding position on the printed
page, relative to some origin.

I have often used this to extract numerical values
for data from graphs in Postscript files (also PDF
files, after you have converted them to PS). Then
(veering back on topic ... ) you can submit the
numerical data to R and try your own analyses on
these data, and compare with what the article does.

However, this little window only gives the numbers
in whole points. Say a smallish graphic may print
out 3 inches wide or high. Then you get precision
of 1/216 per 3 inches or 0.4% of full scale. This
can be adequate on many occasions, but can be on
the coarse side on other occasions.

Even for a 6-inch-wide/high graph, you only get down
to 0.2% of full scale.

If it were possible to induce 'gv' to display these
coordinates in tenths of a point, then much greater
precision (as adequate as one can expect to hope for
when, in effect, "measuring off the graph") could be
obtained.

Does anyone know:
a) Whether it is possible to persuade 'gv' to give
   this display in fractional points (my own search
   of the documentation has not revealed anything);
b) Of any alternative to 'gv' as PS viewer which would
   provide this capability?

With thanks, and best wishes to all,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jun-07   Time: 16:13:21
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-14 Thread Ted Harding
As a tangent to this thread, there is a very relevant
article in the latest issue of the RSS magazine "Significance",
which I have just received:

  Dr Fisher's Casebook
  The trouble with data

Significance, Vol 4 (2007) Issue 2.

Full current contents at

http://www.blackwell-synergy.com/toc/sign/4/2

but unfortunately you can only read any of it by paying
money to Blackwell (unless you're an RSS member).

Best wishes to all,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jun-07   Time: 12:24:46
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Responding to a posting in the digest

2007-06-14 Thread Ted Harding
On 14-Jun-07 07:26:26, Moshe Olshansky wrote:
> Is there a convenient way to respond to a particular
> posting which is a part of the digest?  
> I mean something that will automatically quote the
> original message, subject, etc.
> 
> Thank you!
> 
> Moshe Olshansky
> [EMAIL PROTECTED]

This will depend on two things.

1. Whether the mail software you use has the capability;
2. Whether the digest format would permit it anyway.

Regarding (2), if you are receiving R-help in "traditional"
digest format (all the messages, each with its principal
headers, as one single long message-body), then the only
way to respond to a particular message is to start to
compose a new message and copy what you need from the digest.

While I've never received R-help in digest format myself,
according to Martin Maechler:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/59429.html

  Please open the URL at the end of every message
 https://stat.ethz.ch/mailman/listinfo/r-help
  go to the bottom and "log in" -- clicking the
  [Unsubscribe or Edit Options] field. You need your
  mailing list password sooner or later. The one you
  get sent every 1st of the month; or you can have it
  sent to you again.

  Then you are in a page entitled
 "R-help Membership Configuration for @
Fax-to-email: +44 (0)870 094 0861
Date: 14-Jun-07   Time: 09:53:58
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Awk and Vilno

2007-06-13 Thread Ted Harding
On 13-Jun-07 01:24:41, Robert Wilkins wrote:
> In clinical trial data preparation and many other data situations,
> the statistical programmer needs to merge and re-merge multiple
> input files countless times. A syntax for merging files that is
> clear and concise is very important for the statistical programmer's
> productivity.
> 
> Here is how Vilno does it:
> 
> inlist dataset1 dataset2 dataset3 ;
> joinby variable1 variable2  where ( var3<=var4 ) ;
> [...]

Thanks to Robert for this more explicit illustration of what Vilno
does. Its potential usefulness is clear.

I broadly agree with the comments that have been made about the
various approaches (with/without awk/sed/R etc).

Is there any URL that leads to a fuller explicit exposition of Vilno?

As I said previously, a web-search on "vilno" leads to very little
that is relevant. What I did find didn't amount to much.

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jun-07   Time: 10:00:12
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating artificial datasets with a specific correlati

2007-06-12 Thread Ted Harding
On 12-Jun-07 20:54:05, Ken Knoblauch wrote:
> see mvrnorm in MASS and especially the empirical argument
> 
> James Milks  wright.edu> writes:
> 
> 
>> I need to create artificial datasets with specific correlation  
>> coefficients (i.e. a dataset that returns r = 0.30, etc.) as examples 
>> for a lab I am teaching this summer.  Is there a way to do that in R?
>> 
>> Thanks.
>> 
>> Jim Milks

Alternatively, if you would prefer your datasets to have non-normal
distributions, consider the fact that if X and Y are independent,
each with mean 0 and variance 1, then the correlation coefficient
between (X + a*Y)  and (X - a*Y) is

  (1 - a^2)/(1 + a^2)

so if you choose a = sqrt((1 - r)/(1 + r)) then these will have
correlation coefficient r.

So generate X and Y as you please, and then continue as above.
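
For instance (a minimal sketch of the construction):

  r <- 0.30
  a <- sqrt((1-r)/(1+r))
  X <- rexp(100000) - 1   # any mean-0, variance-1 variables will do
  Y <- rexp(100000) - 1
  cor(X + a*Y, X - a*Y)   # close to 0.30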

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jun-07   Time: 00:35:59
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Appropriate regression model for categorical variables

2007-06-12 Thread Ted Harding
On 12-Jun-07 17:45:44, Tirthadeep wrote:
> 
> Dear users,
> In my psychometric test i have applied logistic regression
> on my data. My data consists of 50 predictors (22 continuous
> and 28 categorical) plus a binary response. 
> 
> Using glm(), stepAIC() i didn't get satisfactory result as
> misclassification rate is too high. I think categorical
> variables are responsible for this debacle. Some of them have
> more than 6 level (one has 10 level).
> 
> Please suggest some better regression model for this situation.
> If possible you can suggest some article.

I hope you have a very large number of cases in your data!

The minimal complexity of the 28 categorical variables compatible
with your description is

  1 factor at 10 levels
  2 factors at 7 levels
 25 factors at 2 levels

which corresponds to (2^25)*(7^2)*10 = 16441671680 ~= 1.6e10
distinct possible combinations of levels of the factors. Your
true factors may have far more than this.

Unless you have more cases than this in your data, you are
likely to fall into what is called "linear separation", in which
the logistic regression will find a perfect predictor for your
binary outcome. This perfect predictor may well not be unique
(indeed if you have only a few hundred cases there will probably
be millions of them).

Therefore your logistic regression is likely to be meaningless.

I can only suggest that you consider very closely how to

a) reduce the numbers of levels in some of your factors,
   by coalescing levels together;
b) defining new factors in terms of the old so as to reduce
   the total number of factors (which may include ignoring
   some factors altogether)

so that you end up with new categorical variables whose total
number of possible combinations is much smaller than (say at
most 1/5 of) the number of cases in your data.

In summary: you have too many explanatory variables.

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jun-07   Time: 00:23:49
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Ted Harding
On 10-Jun-07 19:27:50, Stephen Tucker wrote:
> 
> Since R is supposed to be a complete programming language,
> I wonder why these tools couldn't be implemented in R
> (unless speed is the issue). Of course, it's a naive desire
> to have a single language that does everything, but it seems
> that R currently has most of the functions necessary to do
> the type of data cleaning described.

In principle that is certainly true. A couple of comments,
though.

1. R's rich data structures are likely to be superfluous.
   Mostly, at the sanitisation stage, one is working with
   "flat" files (row & column). This straightforward format
   is often easier to handle using simple programs for the
   kind of basic filtering needed, rather than getting into
   the heavier programming constructs of R.

2. As follow-on and contrast at the same time, very often
   what should be a nice flat file with no rough edges is not.
   If there are variable numbers of fields per line, R will
   not handle it straightforwardly (you can force it in,
   but it's more elaborate). There are related issues as well.

a) If someone entering data into an Excel table lets their
   cursor wander outside the row/col range of the table,
   this can cause invisible entities to be planted in the
   extraneous cells. When saved as a CSV, this file then
   has variable numbers of fields per line, and possibly
   also extra lines with arbitrary blank fields.

   cat datafile.csv | awk 'BEGIN{FS=","}{n=NF;print n}'

   will give you the numbers of fields in each line.

   If you further pipe it into | sort -nu you will get
   the distinct field-numbers. If you know (by now) how many
   fields there should be (e.g. 10), then

   cat datafile.csv | awk 'BEGIN{FS=","} (NF != 10){print NR ", " NF}'

   will tell you which lines have the wrong number of fields,
   and how many fields they have. You can similarly count how
   many lines there are (e.g. pipe into wc -l).

b) People sometimes randomly use a blank space or a "." in a
   cell to denote a missing value. Consistent use of either
   is OK: ",," in a CSV will be treated as "NA" by R. The use
   of "." can be more problematic. If for instance you try to
   read the following CSV into R as a dataframe:

   1,2,.,4
   2,.,4,5
   3,4,.,6

   the "." in cols 2 and 3 is treated as the character ".",
   with the result that something complicated happens to
   the typing of the items.

   typeeof(D[i,j]) is always integer. sum(D[1,1]=1, but
   sum(D[1,2]) gives a type-error, even though the entry
   is in fact 2. And so on , in various combinations.

   And (as.nmatrix(D)) is of course a matrix of characters.

   In fact, columns 2 and 3 of D are treated as factors!

   for(i in (1:3)){ for(j in (1:4)){ print( (D[i,j]))}}
   [1] 1
   [1] 2
   Levels: . 2 4
   [1] .
   Levels: . 4
   [1] 4
   [1] 2
   [1] .
   Levels: . 2 4
   [1] 4
   Levels: . 4
   [1] 5
   [1] 3
   [1] 4
   Levels: . 2 4
   [1] .
   Levels: . 4
   [1] 6

   This is getting altogether too complicated for the job
   one wants to do!

   And it gets worse when people mix ",," and ",.,"!
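
   (In fairness, this particular mess has an R-side fix, since
   read.csv() can be told at read time which strings to treat
   as missing:

     ## read ",," and ",.," both as NA
     D <- read.csv("datafile.csv", header = FALSE,
                   na.strings = c("NA", "", "."))

   but that only helps once you know which tokens to expect.)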

   On the other hand, a simple brush with awk (or sed in
   this case) can sort it once and for all, without waking
   the sleeping dogs in R.

I could go on. R undoubtedly has the power, but it can very
quickly get over-complicated for simple jobs.

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jun-07   Time: 22:14:35
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Ted Harding
On 10-Jun-07 14:04:44, Sarah Goslee wrote:
> On 6/10/07, Ted Harding <[EMAIL PROTECTED]> wrote:
> 
>> ... a lot of the problems with data
>> files arise at the data gathering and entry stages, where people
>> can behave as if stuffing unpaired socks and unattributed underwear
>> randomly into a drawer, and then banging it shut.
> 
> Not specifically R-related, but this would make a great fortune.
> 
> Sarah
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org

I'm not going to object to that!
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jun-07   Time: 21:18:45
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-10 Thread Ted Harding
On 10-Jun-07 02:16:46, Gabor Grothendieck wrote:
> That can be elegantly handled in R through R's object
> oriented programming by defining a class for the fancy input.
> See this post:
>   https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html
> for a simple example of that style.
> 
> On 6/9/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
>> Here are some examples of the type of data crunching you might
>> have to do.
>>
>> In response to the requests by Christophe Pallier and Martin Stevens.
>>
>> Before I started developing Vilno, some six years ago, I had
>> been working in the pharmaceuticals for eight years ( it's not
>> easy to show you actual data though, because it's all confidential
>> of course).

I hadn't heard of Vilno before (except as a variant of "Vilnius").
And it seems remarkably hard to find info about it from a Google
search. The best I've come up with, searching on

  vilno  data

is at
  http://www.xanga.com/datahelper

This is a blog site, apparently with postings by Robert Wilkins.

At the end of the Sunday, September 17, 2006 posting "Tedious
coding at the Pharmas" is a link:

  "I have created a new data crunching programming language."
   http://www.my.opera.com/datahelper

which appears to be totally empty. In another blog article:

  "go to the www.my.opera.com/datahelper site, go to the August 31
   blog article, and there you will find a tarball-file to download,
   called vilnoAUG2006package.tgz"

so again inaccessible; and a google on "vilnoAUG2006package.tgz"
gives a single hit which is simply the same article.

In the Xanga blog there are a few examples of tasks which are
no big deal in any programming language (and, relative to their
simplicity, appear a bit cumbersome in "Vilno"). 

I've not seen in the blog any instance of data transformation
which could not be quite easily done in any straightforward
language (even awk).

>> Lab data can be especially messy, especially if one clinical
>> trial allows the physicians to use different labs. So let's
>> consider lab data.
>> [...]

That's a fairly daunting description, though indeed not at all
extreme for the sort of data that can arise in practice (and
not just in pharmaceutical investigations). But the complexity
is in the situation, and, whatever language you use, the writing
of the program will involve the writer getting to grips with
the complexity, and the complexity will be present in the code
simply because of the need to accommodate all the special cases,
exceptions and faults that have to be anticipated in "feral" data.

Once these have been anticipated and incorporated in the code,
the actual transformations are again no big deal.

Frankly, I haven't yet seen anything in "Vilno" that couldn't be
accommodated in an 'awk' program. Not that I'm advocating awk for
universal use (I'm not that monolithic about it). But I'm using
it as my favourite example of a flexible, capable, transparent
and efficient data filtering language, as far as it goes.


SO: where can one find out more about Vilno, to see what it may
really be capable of that cannot be done so easily in other ways?


(As is implicit in many comments in Robert's blog, and indeed also
from many postings to this list over time and undoubtedly well
known to many of us in practice, a lot of the problems with data
files arise at the data gathering and entry stages, where people
can behave as if stuffing unpaired socks and unattributed underwear
randomly into a drawer, and then banging it shut).

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-Jun-07   Time: 09:28:10
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tools For Preparing Data For Analysis

2007-06-08 Thread Ted Harding
On 08-Jun-07 08:27:21, Christophe Pallier wrote:
> Hi,
> 
> Can you provide examples of data formats that are problematic
> to read and clean with R ?
> 
> The only problematic cases I have encountered were cases with
> multiline and/or  varying length records (optional information).
> Then, it is sometimes a good idea to preprocess the data to
> present in a tabular format (one record per line).
> 
> For this purpose, I use awk (e.g.
> http://www.vectorsite.net/tsawk.html),
> which is very adept at processing ascii data files  (awk is
> much simpler to learn than perl, spss, sas, ...).

I want to join in with an enthusiastic "Me too!!". For anything
which has to do with basic checking for the kind of messes that
people can get data into when they "put it on the computer",
I think awk is ideal. It is very flexible (far more so than
many, even long-time, awk users suspect), very transparent
in its programming language (as opposed to say perl), fast,
and with light impact on system resources (rare delight in
these days, when upgrading your software may require upgrading
your hardware).

Although it may seem on the surface that awk is "two-dimensional"
in its view of data (line by line, and per field in a line),
it has some flexible internal data structures and recursive
function capability, which allows a lot more to be done with
the data that have been read in.

For example, I've used awk to trace ancestry through a genealogy,
given a data file where each line includes the identifier of an
individual and the identifiers of its male and female parents
(where known). And that was for pedigree dogs, where what happens
in real life makes Oedipus look trivial.
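
(For comparison, the same recursive idea is only a few lines in
R -- a sketch, with invented column names, NA marking an unknown
parent:

  ancestors <- function(id, ped, depth = 0){
    ## ped: data frame with character columns id, sire, dam
    if(is.na(id) || !(id %in% ped$id)) return(invisible(NULL))
    cat(rep("  ", depth), id, "\n", sep = "")
    row <- ped[ped$id == id, ]
    ancestors(row$sire, ped, depth + 1)
    ancestors(row$dam,  ped, depth + 1)
  }

though awk gets there with rather less ceremony.)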

> I have never encountered a data file in ascii format that I
> could not reformat with Awk.  With binary formats, it is
> another story...

But then it is a good idea to process the binary file using an
instance of the creating software, to produce an ASCII file (say
in CSV format).

> But, again, this is my limited experience; I would like to
> know if there are situations where using SAS/SPSS is really
> a better approach.

The main thing often useful for data cleaning that awk does
not have is any associated graphics. It is -- by design -- a
line-by-line text-file processor. While, for instance, you
could use awk to accumulate numerical histogram counts, you
would have to use something else to display the histogram.
And for scatter-plots there's probably not much point in
bringing awk into the picture at all (unless a preliminary
filtration of mess is needed anyway).

That being said, though, there can still be a use to extract
data fields from a file for submission to other software.

Another kind of area where awk would not have much to offer
is where, as a part of your preliminary data inspection,
you want to inspect the results of some standard statistical
analyses.

As a final comment, utilities like awk can be used far more
fruitfully on operating systems (the unixoid family) which
incorporate at ground level the infrastructure for "plumbing"
together streams of data output from different programs.
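
For instance, R itself can sit at the end of such a pipeline,
via a pipe() connection (a sketch; the file name and field
count are illustrative):

  ## keep only the well-formed lines, read them straight into R
  D <- read.csv(pipe("awk 'BEGIN{FS=\",\"} NF == 10' datafile.csv"),
                header = FALSE)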

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Jun-07   Time: 10:43:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Spectral analysis

2007-06-06 Thread Ted Harding
On 06-Jun-07 20:55:09, David LEDU wrote:
> Hi all,
> 
> I am dealing with paleoceanographic data and I have a C14 time series
> and one other variable. I would like to perform a spectral analysis
> (fft or wavelet) and plot it. Unfortunately I don't know the exact
> script to do this. Does anybody could send me an example to perform my
> spectral analysis ?
> 
> I Thank you
> 
> David

There are a lot of possible ramifications to your query,
but for a basic spectral analysis of a series you can use
the function spectrum() in the "stats" package.
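
A minimal sketch (the series here is simulated, just to be
self-contained; for your data, x would be the measured variable):

  x <- sin(2*pi*(1:256)/16) + rnorm(256, sd = 0.5)
  sp <- spectrum(x)               # plots the periodogram
  sp$freq[which.max(sp$spec)]     # dominant frequency, ~ 1/16 here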

What is the role of the "other variable"?

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jun-07   Time: 22:34:07
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random numbers selection - simple example

2007-06-06 Thread Ted Harding
On 06-Jun-07 14:30:44, Jenny Barnes wrote:
> Dear R-help,
> 
> Which random number generator function would you recommend for
> simply picking 15 random numbers from the sequence 0-42? I want
> to use replacement (so that the same number could potentially be
> picked more than once).

R has the function sample() which samples a given number of items
from a given set, without replacement by default, but with replacement
if you specify this. Enter

  ?sample

for more information. In the above case

  sample((0:42), 15, replace=TRUE)

will do what you seem to describe above. Example:

> sample((0:42), 15, replace=TRUE)
 [1] 26 38  1 41 11 30 22 37 28  0  0 25 10 39 27

if you want them in random order (i.e. "as they come off the line"),
or

> sort(sample((0:42), 15, replace=TRUE))
 [1]  1  3  5  8  8 10 16 17 21 25 30 30 33 34 40

if you want them sorted.

Best wishes,
Ted.

> I have read the R-help archives and the statistics and computing
> book on modern Applied statistics with S but the advice seems to
> be for much form complicated examples, there must be a simpler way
> for what I am trying to do?
> 
> If anybody can help me I would greatly appreciate your advice and time,
> 
> Best Wishes,
> 
> Jenny

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jun-07   Time: 16:24:15
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] spatial simulation

2007-06-02 Thread Ted Harding
On 02-Jun-07 01:06:48, Kitty Lee wrote:
> Dear R-users,
> 
> I'm trying to do some spatial simulation. I have two covariates,
> Z and C. I want to examine their relationship under different
> spatial distribution. 
> 
> I have no problem simulating completely spatial random process
> but I'm totally stuck on poisson (cluster) pattern. I already
> have a dataset with Z and C (obs=575) and I know the relationship
> between them. Using these 575 cases, how can I simulate a clustered
> process and have a dataset that has four columns:
> 
> x-coor y-coor z c
> 
> I know I can use rpois or pcp.sim to generate points that clustered
> and then use cbind to attach Z and C values. But the problem is my
> observations will not be spatially clustered. How can I simulate so
> that now Z is spatially clustered?
> 
> Thanks!
> 
> K.

Have a look at the package spatstat:

Description:   Spatial Point Pattern data analysis, modelling and
   simulation including multitype/marked points and
   spatial covariates

It has a flexible repertoire of spatial point processes that you
can simulate from.
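
For instance (a sketch -- the parameter values are arbitrary):

  library(spatstat)
  ## Matern cluster process in the unit square: parent intensity 10,
  ## cluster radius 0.1, mean 5 offspring per parent
  X <- rMatClust(10, 0.1, 5)
  plot(X)

The coordinates are then available as X$x and X$y, for attaching
your Z and C values.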

Best wishes,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 02-Jun-07   Time: 08:33:17
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Make sign test show test statistics

2007-05-14 Thread Ted Harding
On 14-May-07 10:07:53, Johan A. Stenberg wrote:
> When I perform a two-tailed sign test with the following simple syntax,
> 
> binom.test(59,100)
> 
> R returns a P-value (0.088) but nothing else. As I want the result for
> a 
> one-tailed test I take P/2 = 0.044).

1: If you want a 1-sided P-value, use the appropriate option
to binom.test(). For example (assuming your Null Hypothesis
is that p = 0.5, which is the default, and your Alternative
Hypothesis is that p > 0.5):

  binom.test(x=59, n=100,alternative="greater")

Exact binomial test

  data:  59 and 100 
  number of successes = 59, number of trials = 100, p-value = 0.04431
  alternative hypothesis: true probability of success is greater
than 0.5 
  95 percent confidence interval:
0.5028918 1.000 
  sample estimates:
  probability of success 
0.59 

(which confirms that your "divide by two" strategy was about right
in this case, though that will not always be true; and certainly
will not be appropriate for the opposite Alternative Hypothesis
that p < 0.5).

> However, the journal to which I've
> submitted my results requests the test statistics, not just the 
> P-values. How can I make R return the test statistics?

2: Have a word (et borgerlig ord, as the Danes say -- sorry
I do not have the same familiarity with Swedish, but maybe you
say the same) with the journal editor.

The plain and simple fact is that the data (x = 59 out of n = 100)
constitute the test statistic. In the conditional context that
n=100, the test statistic is simply x = 59.

However, one should strictly speaking always include a statement
about n, no matter how one expresses the test statistic. For instance,
you might use t = x/n as test statistic (with expectation p=0.5
and SD sqrt( p*(1-p)/n ) in your case). While its expectation now
depends only on the NH (p = 0.5), it cannot be evaluated as a test
without knowledge of n, since that is needed to get the SD.

So R (see above) has indeed returned the "test statistic", and
the journal is not being reasonable!
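
For what it's worth, the pieces are all retrievable from the
object that binom.test() returns:

  bt <- binom.test(59, 100, alternative = "greater")
  bt$statistic    # number of successes (the test statistic): 59
  bt$parameter    # number of trials: 100
  bt$p.value      # 0.04431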

Hoping this helps,
Ted.

------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 14-May-07   Time: 11:57:19
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] power 2x3 exact test

2007-05-10 Thread Ted Harding
On 10-May-07 15:07:10, Thomas Lumley wrote:
> On Thu, 10 May 2007, [EMAIL PROTECTED] wrote:
>>
>> Given that you expect some cells to be small, it should not
>> be a severe task to draw up a list of (a1,b1) values which
>> correspond to rejection of the null hypothesis (that both
>> ORs equal 1), and then the simulation using different values
>> of the two odds-ratios will give you the power for each such
>> pair of odds-ratios.
>>
>> The main technical difficulty will be simulation of random
>> tables, conditional on the marginals, with the probabilities
>> as given above.
>>
>> I don't know of a good suggestion for this.
> 
> r2dtable().

Thanks for pointing this out! (And if I had astutely done
help.search("marginal") I would have found it).

> If this is a power calculation, though, you probably want to
> fix only one margin, which is a much simpler problem,

That is probably a fair point (real-life situations where both
margins are objectively fixed are probably sparse, to the point
where it may almost be worth collecting them as an "exhibition").

But then Bingshan Li is faced with the issue that the power
(given the odds-ratio) is then dependent on one of the cell
probabilities as a "nuisance parameter". This of course can
be eliminated by conditioning on the total number of successes
over the dimension whose marginals are fixed; but then we are
in effect back to the Fisher Exact Test, whose power is a function
of the odds-ratio alone.

I'm still with Duncan Murdoch, in that what's meant by "power"
in this case depends very much on what you mean by "alternative
hypothesis".

Best wishes to all,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-May-07   Time: 18:01:13
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Allocating shelf space

2007-05-09 Thread Ted Harding
Hi Folks,

This is not an R question as such, though it may well have
an R answer. (And, in any case, this community probably
knows more about most things than most others ... indeed,
has probably pondered this very question).

I: Given a "catalogue" of hundreds of books, where each
"entry" has author and title (or equivalent ID), and also

Ia) The dimensions (thickness, height, depth) of the book
Ib) A sort of classification of its subject/type/genre

II: Given also a specification of available and possibly
potential bookshelf space (numbers of book-cases, the width,
height and shelf-spacing of each, and the dimensions of any
free wall-space where further book-cases may be placed),
where some book-cases have fixed shelves and some have shelves
with (discretely) adjustable position, and additional book-cases
can be designed to measure (probably with adjustable shelves).

Question: Is there a resource to approach the solution of the
problem of optimising the placement of adjustable shelves,
the design of additional bookcases, and the placement of the
books in the resulting shelf-space so as to

A: Make efficient use of space
B: Minimise the spatial dislocation of related books
   (it is acceptable to separate large books from small books
   on the same subject, for the sake of efficient packing).

Awaiting comments and suggestions with interest!
With thanks,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 09-May-07   Time: 18:23:53
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] power 2x3 exact test

2007-05-09 Thread Ted Harding
On 09-May-07 22:00:27, Duncan Murdoch wrote:
> On 09/05/2007 5:11 PM, Bingshan Li wrote:
>  > Hi, all,
>  >
>  > I am wondering if there is an algorithm for calculating power of 2x3
>  > table using Fisher exact test. There is one (algorithm 280) for 2x2
>  > Fisher exact test but I couldn't find one for 2x3 table. If we are
>  > not lucky enough to have one, is there any other way to calculate
>  > exact power of 2x3 table? The reason why I want exact power is
>  > because some cells are assumed to be very small and chi square
>  > approximation is not valid.
> 
> I think there are lots of possible alternatives to the null in a 2x3 
> table, so you may have trouble finding a single answer to this
> question. 
>   But assuming you have one in mind, I'd suggest doing a Monte Carlo 
> power calculation:  simulate a few thousand tables from the alternative
> distribution, and see what the distribution of p-values looks like.
> 
> Duncan Murdoch

I'd back Duncan on that point!

More specifically, for the 2x2 table, the table, conditional on the
marginals, is a function of one element (say top left-hand corner),
and the probability of any table depends on the single parameter
which is the odds-ratio of the 4 cell probabilities.

So this case is relatively easy and straightforward and, indeed,
for the 2x2 table R's fisher.test() allows you to specify the
odds-ratio as a "null" parameter.
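
For example (counts invented):

  ## test a 2x2 table against the null odds-ratio 2
  fisher.test(matrix(c(12, 5, 7, 14), 2, 2), or = 2)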

This is not the case with fisher.test() for a larger (say 2x3)
table, so to investigate that case you cannot use fisher.test().

In all cases, however (according to the FORTRAN code on which
it is based -- see the reference in "?fisher.test"), the rejection
region for the exact fisher.test() consists of those tables with
the smallest probabilities.

For the 2x3 table, say (cell counts with margins, and probabilities):


    a1  b1  c1 | d1         p1  q1  r1
               |
    a2  b2  c2 | d2         p2  q2  r2
    -----------+---
    a   b   c  | n

so that

   a1+b1+c1 = d1, a2+b2+c2 = d2,
   a1+a2 = a, b1+b2 = b, c1+c2 = c

the table is a function of any two functionally independent cells
(say a1 and b1), and its probability is a function of some two
odds-ratios, say

   (p1*r2)/(r1*p2)

   (q1*r2)/(r1*q2)

which, for the standard null hypothesis, are both equal to 1.
However, as Duncan says, alternatives are 2-dimensional and
so there is not a unique natural form for an alternative (as
opposed  to the 2x2 case, where it boils down to (p1*q2)/(p2*q1)
being not equal to 1, therefore greater than 1, or less than 1,
or 2-sidedly either >1 or <1).

The probability of the 2x3 table is proportional to

  ((p1*r2)/(r1*p2))^a1 * ((q1*r2)/(r1*q2))^b1

(or equivalent), divided by the product of the factorials of
a1, b1, c1, a2, b2, c2, subject to summing to 1 over all
combinations of (a1,b1) giving rise to a table compatible
with the marginal constraints.

Given that you expect some cells to be small, it should not
be a severe task to draw up a list of (a1,b1) values which
correspond to rejection of the null hypothesis (that both
ORs equal 1), and then the simulation using different values
of the two odds-ratios will give you the power for each such
pair of odds-ratios.

The main technical difficulty will be simulation of random
tables, conditional on the marginals, with the probabilities
as given above.

I don't know of a good suggestion for this. The fisher.test()
function will not help (see above). In the case of the 2x2
table, is is a straightforward hypergeometric distribution,
but 2x3 is not straightforward. Maybe someone has written
a function for this kind of application, and can point it
out to us???

A quick R-site search did not help!

Best wishes,
Ted.



E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 10-May-07   Time: 00:12:29
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A function for raising a matrix to a power?

2007-05-06 Thread Ted Harding
[Apologies: This will probably break the thread, but at the moment
I cannot receive mail since my remote mail-server is down, and so
am responding to the message as found on the R-help archives;
hence this is not a "Reply"]

From:
Atte Tenkanen attenka at utu.fi
Sun May 6 11:07:07 CEST 2007

> Is there a function for raising a matrix to a power? For example if you
> like to compute A%*%A%*%A, is there an abbreviation similar to A^3?
> 
> Atte Tenkanen
> 
> > A=rbind(c(1,1),c(-1,-2))
> > A
>      [,1] [,2]
> [1,]    1    1
> [2,]   -1   -2
> > A^3
>      [,1] [,2]
> [1,]    1    1
> [2,]   -1   -8
> 
> But:
> 
> > A%*%A%*%A
>      [,1] [,2]
> [1,]    1    2
> [2,]   -2   -5


Of course, "^" alone acts element-wise on the matrix A, so the result
of A^3 is the matrix B where B[i,j] = A[i,j]^3.

However, you can write your own, and call it for instance "%^%":

"%^%"<-function(A,n){
  if(n==1) A else {B<-A; for(i in (2:n)){A<-A%*%B}}; A
  }

Then:

> A
     [,1] [,2]
[1,]    1    1
[2,]   -1   -2

> A%^%1
     [,1] [,2]
[1,]    1    1
[2,]   -1   -2

> A%^%2
     [,1] [,2]
[1,]    0   -1
[2,]    1    3

> A%^%3
     [,1] [,2]
[1,]    1    2
[2,]   -2   -5

> A%^%100
              [,1]          [,2]
[1,] -1.353019e+20 -3.542248e+20
[2,]  3.542248e+20  9.273727e+20
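
For large n, by the way, repeated squaring needs only about
log2(n) matrix multiplications instead of n-1. A sketch, under
a different name so as not to clobber the version above:

"%^^%" <- function(A, n){
  ## exponentiation by squaring; n a positive integer
  R <- diag(nrow(A))
  while(n > 0){
    if(n %% 2 == 1) R <- R %*% A
    A <- A %*% A
    n <- n %/% 2
  }
  R
}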


Hoping this helps!
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-May-07   Time: 11:35:31
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating contingency table from mixed data

2007-05-06 Thread Ted Harding
 
Var1: 1
Var3: 3
Var4: 1
$Count
[1] 1

$Mean
Var2 Var5 
  12  152 


but this format is not very convenient for incorporating into
a contingency table such as the one shown above (obtained by hand). 

Probably others can find a way to convert the above output from
CT into a contingency table.

However, unless you have a lot of these to do, it may be quicker
to do one, or a few, by hand!

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 06-May-07   Time: 02:57:12
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to reproduce the same sampled units?

2007-05-02 Thread Ted Harding
On 02-May-07 23:25:18, Santanu Pramanik wrote:
> Hi all,
>  
> Is it possible to generate the same sample a number of times in R?
> In SAS, using the option "seed" it is possible to reproduce exactly
> the same sample. Is there any such feature in R which I can use? 
>  
> For further clarity,
>  
> for (i in 1:2)
> {
> samp = sample(1:1000,100,replace = FALSE)
> print(samp)
> }

> For the above simulation, is it possible to generate the same sampled
> units twice?
>  
> Any suggestion would be gratefully acknowledged.
> Thanks,
> Santanu

Have a look at set.seed():

  ?set.seed

You will of course have to invoke it before you do your first run.
You cannot find out what the seed was afterwards!

For example (your example somewhat reduced):

  for (i in 1:2)
  {
  samp = sample(1:1000,10,replace = FALSE)
  print(samp)
  }
# [1] 984 587  26 920 304 247 434 392 650 279
# [1] 178 619 458 208 389 629 988 263 598 308

  for (i in 1:2)
  {
  set.seed(512534)
  samp = sample(1:1000,10,replace = FALSE)
  print(samp)
  }
# [1] 271 106 570 621 257 663 399 454 983 805
# [1] 271 106 570 621 257 663 399 454 983 805

Best wishes,
Ted.

--------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 03-May-07   Time: 00:56:43
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation using parts of density function

2007-05-02 Thread Ted Harding
On 02-May-07 07:45:48, Prof Brian Ripley wrote:
> Please do not send everything twice: you are using R-help in both the
> To: 
> and Cc: fields.
> 
> I disagree with Ted: it _is_ much easier to create a generator for this
> purpose.
> 
> Consider
> 
> rtgamma <- function(n, ..., tr = log(500))
> {
>  p <- pgamma(tr, ...)
>  qgamma(p*runif(n), ...)
> }
> 
> as inversion (especially at C level) is plenty fast enough.

Of course ... !!

Just to explain Brian's solution above:

Since pgamma(rgamma(...),...) is uniformly distributed on (0,1),
if rgamma is truncated to (0,tr) then pgamma(rgamma) will be
truncated to (0,pgamma(tr)), and hence uniformly distributed
on this range.
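
A quick sanity check, with the shape and scale from the thread:

  x <- rtgamma(10000, shape = 12, scale = 0.5, tr = log(500))
  max(x) < log(500)   # TRUE: every draw is below the truncation point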

Best wishes,
Ted.



------------
E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 02-May-07   Time: 09:16:08
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation using parts of density function

2007-05-01 Thread Ted Harding
On 01-May-07 17:03:46, Thür Brigitte wrote:
> 
> Hi
> 
> My simulation with the followin R code works perfectly:
> sim <- replicate(999, sum(exp(rgamma(rpois(1,2000),
> scale = 0.5, shape = 12))))
> 
> But now I do not want to have values in object "sim" exceeding
> 5'000'000, that means that I am just using the beginning of
> the gamma density function (x < 15.4). Is there a possibility to
> modify my code in an easy way?
> 
> Thanks for any help!
> 
> Regards, Brigitte

A somewhat extreme problem!

The easiest way to modify the code is as below -- certainly easier
than writing a special function to draw random samples from the
truncated gamma distribution.

A bit of experimentation shows that, from your code above, about
10% of the results are <= 5000000. So:

  sim <- NULL
  remain <- 999
  while(remain > 0){
    sim0 <- replicate(10*remain,
      sum(exp(rgamma(rpois(1,2000), scale = 0.5, shape = 12)))
    )
    sim <- c(sim, sim0[sim0 <= 5000000])
    remain <- (999 - length(sim))
  }
  sim <- sim[1:999]

Results of a run:

  sum(sim>5000000)
  [1] 0

  max(sim)
  [1] 4999696

  length(sim)
  [1] 999

It may be on the slow side (though not hugely -- on a quite slow
machine the above run was completed in 2min 5sec, while the
999-replicate in your original took 15sec; so about 8 times as
long). Most of this, of course, is taken up with the first round.

Hoping this helps,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-May-07   Time: 19:18:01
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to draw a smooth arc

2007-05-01 Thread Ted Harding
This thread prompts me to ask about something I've
been pondering for a while, as to whether there's an
implementation somewhere tucked away in the R resources.

So far, people have been responding to the original query
in terms of increasing the numbers of points, and joining
these by lines.

However, if you're using PostScript output, you can draw
really smooth curves by exploiting PS's "curveto" operator.
This draws a cubic-curve segment in the following way:

The two points you want to join with a curve will be denoted
by (X0,Y0) and (X3,Y3) in the following (for reasons which
will appear). The PS command is of the form

  x1 y1  x2 y2  X3 Y3  curveto

At (X0,Y0) the tangent to the curve (as it departs from (X0,Y0))
is in the direction of the directed line from (X0,Y0) to (x1,y1),
and at (X3,Y3) (as it arrives) the tangent to the curve is
in the direction of the directed line from (x2,y2) to (X3,Y3).

The location of (X0,Y0) is not part of the command, since
it is implicit in the PS "currentpoint" which is the starting
point of the curve.

The result is (in theory, and in practice to within the resolution
of the output device) a perfectly smooth curve, provided the
consecutive cubic segments have the same tangent at each of
the points being joined. This can be achieved by appropriate
choice of the "intermediate" points -- (x1,y1), (x2,y2) above.

So far, when I've done this myself (including when using the
output from R to give the points being joined), I've done the
computation of the "intermediate" points "by hand". This basically
involves deciding, at each of the points being joined, what the
tangent to the smooth curve should be.

Of course, there is an element of arbitrariness in this, unless
there is an analytic representation of the curve on which the
points lie (e.g. you're plotting sin(x)/x every pi/8, and
want to join them smoothly), when all you need is the derivatives
at the points.

Crudely, you might evaluate the direction at a point in terms
of a weighted average of the directions to its two immediate
neighbours (the nearer neighbour gets the greater weight); less
crudely, you might fit a quadratic through the point and its
2 neighbours and use the gradient at the middle point; and so on.

Once you've decided on the tangent at each point, it's then
straightforward to compute suitable "intermediate points"
to serve as (x1,y1) and (x2,y2).
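
To make that concrete, here is a rough R sketch (the names are
my own; the one-third rule is the standard conversion from
tangents to cubic Bezier control points):

  ps.curveto <- function(x, y){
    ## emit PostScript moveto/curveto commands for points (x,y);
    ## needs at least 3 points; tangents by central differences,
    ## one-sided at the two ends
    n  <- length(x)
    tx <- c(x[2]-x[1], (x[3:n]-x[1:(n-2)])/2, x[n]-x[n-1])
    ty <- c(y[2]-y[1], (y[3:n]-y[1:(n-2)])/2, y[n]-y[n-1])
    cat(sprintf("%.3f %.3f moveto\n", x[1], y[1]))
    for(i in 1:(n-1)){
      ## control points one third along the tangents at each end
      cat(sprintf("%.3f %.3f  %.3f %.3f  %.3f %.3f curveto\n",
                  x[i]   + tx[i]/3,   y[i]   + ty[i]/3,
                  x[i+1] - tx[i+1]/3, y[i+1] - ty[i+1]/3,
                  x[i+1], y[i+1]))
    }
  }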

(One application where this sort of approach is needed is in
joining computed points on iso-contours, where the individual
points have been determined by interpolation of spot-measurements
at nearby measuring stations).

Anyway. The Question: is there a general function for the
above kind of smooth curve-drawing?

With thanks,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-May-07   Time: 14:50:38
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

