[R] An entire data frame which is a time-series?

2004-08-17 Thread Ajay Shah
I have :

raw <- read.table("monthly.text", skip=3, sep="|",
                  col.names=c("junk", "junk2",
                    "wpi", "g.wpi", "wpi.primary", "g.wpi.primary",
                    "wpi.fuel", "g.wpi.fuel", "wpi.manuf", "g.wpi.manuf",
                    "cpi.iw", "g.cpi.iw", "cpi.unme", "g.cpi.unme",
                    "cpi.al", "g.cpi.al", "cpi.rl", "g.cpi.rl"))

Now I can do things like:

  g.wpi = ts(raw$g.wpi, frequency=12, start=c(1994,7))

and it works fine. One by one, I can make time-series objects.

Is there a way to tell R that the entire data frame is a set of
time-series, so that I don't have to go column by column and make a
new ts() out of each?

I tried:

  M = ts(raw, frequency=12, start=c(1994,7))
  ts.plot(M[,"wpi"], M[,"wpi.manuf"])

but this gives nonsense results. Also, syntax like M$wpi is a lot
nicer than M[,"wpi"]. Any ideas about what might work?
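For what it's worth, ts() applied to a whole data frame does already build a multivariate series (class "mts"); the usual trap is that its columns must then be selected with *quoted* names, and $ no longer works. A minimal sketch with invented numbers standing in for the WPI file:

```r
## stand-in for the data frame read from monthly.text (values invented)
raw <- data.frame(wpi = cumsum(rnorm(24)), wpi.manuf = cumsum(rnorm(24)))

## one call converts every column at once; M has class "mts"
M <- ts(raw, frequency = 12, start = c(1994, 7))

## columns of an mts are picked with quoted names; M$wpi does not work
ts.plot(M[, "wpi"], M[, "wpi.manuf"])
```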


An unrelated suggestion: I found the documentation of ts() to be quite
daunting. I have been around time-series and computer programming for
decades. But it took me a while to handle the basics : to read in a
file, to make time-series vectors, to run ARMA models. This stuff
ought to be easier to learn. I tried to write an ARMA example, and put
it up on the web, which would've been a godsend to me if I had found
it earlier (http://www.mayin.org/~ajayshah/KB/R/tsa.html).

I believe that the R documentation framework would do well to always
pair a 2000-word conceptual introduction and a little tutorial with each
package, instead of jumping straight into man pages for each function
(which is the only documentation that we have presently).

-- 
Ajay Shah   Consultant
[EMAIL PROTECTED]  Department of Economic Affairs
http://www.mayin.org/ajayshah   Ministry of Finance, New Delhi

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Cross-variograms

2004-08-17 Thread Edzer J. Pebesma
Jacques, provided that X and Y are colocated (i.e., have
exactly the same observation locations), you get the
cross variogram right; the definition of this cross
variogram is however:
gamma(h)= E[(X(s)-X(s+h))*(Y(s)-Y(s+h))]
also, where you select:
cv <- v$gamma[1:14]
you may be better off using the more general
v$gamma[v$id == "X.Y"]
Best regards,
--
Edzer


[R] strptime() bug? And additional problem in package tseries

2004-08-17 Thread javier garcia - CEBAS
Hi all, I've got some problems with irts objects, one of which could be a bug:

1) Read a table with several columns from Postgres and the first column is 
Timestamp with timezone (this is OK). An extract is:

raincida$ts:
 [2039] 25/03/2000 22:00:00 UTC 25/03/2000 23:00:00 UTC
 [2041] 26/03/2000 00:00:00 UTC 26/03/2000 01:00:00 UTC
 [2043] 26/03/2000 02:00:00 UTC 26/03/2000 03:00:00 UTC
 [2045] 26/03/2000 04:00:00 UTC 26/03/2000 05:00:00 UTC

2) Try to extract time from this column of the dataframe (bug?)

 lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")

# An extract is:

 [2038] 2000-03-25 21:00:00 2000-03-25 22:00:00 2000-03-25 23:00:00
 [2041] 2000-03-26 00:00:00 2000-03-26 01:00:00 2000-03-26 03:00:00
 [2044] 2000-03-26 03:00:00 2000-03-26 04:00:00 2000-03-26 05:00:00

# note that element [2043] is wrong. This happens several times in 
# the dataset. This will produce an eventual error because of omitted
# and duplicated values 

3) The additional problem is related with function time() for irts objects.
I try to make an irts from several columns of the table read:

 rain.irts <- irts(as.POSIXct(lluvia.strptime, tz="GMT"),
                   cbind(raincida[[8]], raincida[[9]], raincida[[10]],
                         raincida[[11]], raincida[[12]], raincida[[13]],
                         raincida[[14]]))
 

# this step doesn't seem to have any further problem. An extract is:

2000-03-25 22:00:00 GMT 0.275 0 0.07875 0.2 0 0.025 23.65
2000-03-25 23:00:00 GMT 0.275 0 0.07875 0.2 0 0.025 23.65
2000-03-26 00:00:00 GMT 0 0 0.001667 0.008333 0 0 0.5322
2000-03-26 01:00:00 GMT 0 0 0.001667 0.008333 0 0 0.5322
2000-03-26 03:00:00 GMT 0 0 0.001667 0.008333 0 0 0.5322
2000-03-26 03:00:00 GMT 0 0 0.001667 0.008333 0 0 0.5322
2000-03-26 04:00:00 GMT 0 0 0.001667 0.008333 0 0 0.5322

# But I try to extract the time part:

time(rain.irts, tz='GMT')

# An extract is:

 [2039] 2000-03-25 23:00:00 CET  2000-03-26 00:00:00 CET
 [2041] 2000-03-26 01:00:00 CET  2000-03-26 03:00:00 CEST
 [2043] 2000-03-26 05:00:00 CEST 2000-03-26 05:00:00 CEST

# Isn't there a way for this time to be shown as 'GMT'? I guess it is
# sometimes shown as 'CET' and other times as 'CEST' depending on the lag
# between the locale and GMT (UTC) times. But for me this is an additional
# problem, as the output shows one or two hours more than UTC time.

Thanks all, and best regards,

Javier G.



NO bug in Re: [R] strptime() bug? And additional problem in package tseries

2004-08-17 Thread Prof Brian Ripley
There is no bug in R here.  There was a change to DST in Spain at 2am on
2000-03-26, and they are *printed* as times in your locale, as documented.

Please read the posting guide and FAQ about what is a bug.
Also, please try not to confuse an object and its printed representation.

On Tue, 17 Aug 2004, javier garcia - CEBAS wrote:

 Hi all, I've got some problems with irts objects, one of which could be a bug:
 
 1) Read a table with several columns from Postgres and the first column is 
 Timestamp with timezone (this is OK). An extract is:
 
 raincida$ts:
  [2039] 25/03/2000 22:00:00 UTC 25/03/2000 23:00:00 UTC
  [2041] 26/03/2000 00:00:00 UTC 26/03/2000 01:00:00 UTC
  [2043] 26/03/2000 02:00:00 UTC 26/03/2000 03:00:00 UTC
  [2045] 26/03/2000 04:00:00 UTC 26/03/2000 05:00:00 UTC
 
 2) Try to extract time from this column of the dataframe (bug?)
 
  lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")
 
 # An extract is:

NO!  That is an extract of *printing* lluvia.strptime, which will give you
the times in your current time zone, as documented.

  [2038] 2000-03-25 21:00:00 2000-03-25 22:00:00 2000-03-25 23:00:00
  [2041] 2000-03-26 00:00:00 2000-03-26 01:00:00 2000-03-26 03:00:00
  [2044] 2000-03-26 03:00:00 2000-03-26 04:00:00 2000-03-26 05:00:00
 
 # note that element [2043] is wrong. This happens several times in 
 # the dataset. This will produce an eventual error because of omitted
 # and duplicated values 

I think you want to use as.POSIXct(lluvia.strptime, tz="GMT") to get
what you may have intended.
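The behaviour can be reproduced without the database. This sketch (times invented; Europe/Madrid assumed as the poster's locale) contrasts parsing in the local zone, where 02:00 on 2000-03-26 never existed because of the DST change, with parsing in GMT. Note the tz argument of strptime() is available in current R; the advice above applies the zone at the as.POSIXct() step instead:

```r
## assumed locale for illustration: Spain, which switched to DST at 02:00
Sys.setenv(TZ = "Europe/Madrid")
x <- c("26/03/2000 01:00:00", "26/03/2000 02:00:00", "26/03/2000 03:00:00")

## parsed in the local zone: 02:00 did not exist on this date,
## which is what produces the shifted/duplicated entries
print(strptime(x, format = "%d/%m/%Y %H:%M:%S"))

## parsed as GMT instead: every hour exists, and the values print in GMT
print(strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = "GMT"))
```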



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] table and getting rownames

2004-08-17 Thread merser
hi there
say that i have this table
x <- table(adoc, oarb)
x
    oarb
adoc    0   1
  ab    1   0
  am    5   1
  ba   14   1
  cc  271   3
  ch   87   2
  dz  362   6
  fl    7   0
  fs   84   2

is there an easy way to get the row names or row numbers of rows with
oarb==0
i.e. (ab, fl) or (1, 7)

regards soren



Re: [R] table and getting rownames

2004-08-17 Thread Prof Brian Ripley
On Tue, 17 Aug 2004 [EMAIL PROTECTED] wrote:

 say that i have this table
 x <- table(adoc, oarb)
 x
     oarb
 adoc    0   1
   ab    1   0
   am    5   1
   ba   14   1
   cc  271   3
   ch   87   2
   dz  362   6
   fl    7   0
   fs   84   2
 
 is there an easy way to get the row names or row numbers of rows with
 oarb==0

That seems to be with *entry* zero, not oarb = 0?

 i.e. (ab, fl) or (1, 7)

row(x)[x == 0]
rownames(x)[row(x)[x == 0]]

will do what I think you meant to ask.
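A sketch of that suggestion on a cut-down version of the table (entries copied from the extract; note it is row(), not rows(), in base R):

```r
## cut-down stand-in for the table, with the zero entries from the extract
x <- matrix(c(1, 5, 14, 7,   # column "0"
              0, 1,  1, 0),  # column "1"
            ncol = 2,
            dimnames = list(c("ab", "am", "ba", "fl"), c("0", "1")))

row(x)[x == 0]               # row numbers that hold a zero entry: 1 4
rownames(x)[row(x)[x == 0]]  # their names: "ab" "fl"
```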

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread david_foreman
First, many thanks to Frank Harrell for once again helping me out.  This actually 
relates to the next point, which is my contribution to the 'why don't social 
scientists use R' discussion.  I am a hybrid social scientist (child psychiatrist) who 
trained on SPSS.  Many of my difficulties in coming to terms with R have been to do 
with trying to apply the logic underlying SPSS, with dire results.  You do not want to 
know how long I spent looking for a 'recode' command in R, to change factor names and 
classes.
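For the record, the closest base-R counterpart to SPSS RECODE for factor labels is assignment to levels(); a minimal sketch with invented level names:

```r
f <- factor(c("lo", "hi", "lo", "mid"))
levels(f)                                # "hi" "lo" "mid" (alphabetical order)

## rename the levels, in that same order
levels(f) <- c("high", "low", "medium")
f
# [1] low    high   low    medium
# Levels: high low medium
```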

I think the solution is to combine a graphical interface that encourages command line 
use (such as Rcommander) with the analyse(this) paradigm suggested, but also 
explaining how one can a) display the code on a separate window ('page' is only an 
obvious command once you know it), and b) how one can then save one's modification, 
make it generally available, and not overwrite the unmodified version (again, thanks, 
Frank).  Finally, one would need to change the emphasis in basic statistical teaching 
from 'the right test' to 'the right model'.  That should get people used to R's logic.

If a rabbit starts to use R, s/he is likely to head for the help files associated with 
each function, which can assume that the reader can make sense of gnomic utterances 
like "Omit 'var' to impute all variables, creating new variables in 'search' position 
'where'".  I still don't know what that one means (as I don't understand search 
positions, or why they're important).  This can be very offputting, and could lead the 
rabbit to return to familiar SPSS territory.

Finally, friendlier error messages would also help. It took me 3 days, and opening 
every function I could, to work out that '...cannot find function xxx.data.frame...' 
meant that MICE was unable to make a polychotomous logistic imputation model converge 
for the variable immediately preceding it.


I am now off to the help files and FAQs to find out how to change graph parameters, as 
the plot.mids function in MICE a) doesn't allow one to select a subset of variables, 
and b) tells me that the graph it wants to produce on the whole of my 26 variable 
dataset is too big to fit on the (windows) plotting device.  Unless anyone wants to 
tell me how/where? (which of course is why, in the end, R is EASIER to use than SPSS)


Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread Roger D. Peng
I'm just curious, but how do social scientists, or anyone else for 
that matter, learn SPSS, besides taking a class?

-roger
[EMAIL PROTECTED] wrote:
First, many thanks to Frank Harrell for once again helping me out.
This actually relates to the next point, which is my contribution
to the 'why don't social scientists use R' discussion.  I am a
hybrid social scientist (child psychiatrist) who trained on SPSS.
Many of my difficulties in coming to terms with R have been to do
with trying to apply the logic underlying SPSS, with dire results.
You do not want to know how long I spent looking for a 'recode'
command in R, to change factor names and classes.
I think the solution is to combine a graphical interface that
encourages command line use (such as Rcommander) with the
analyse(this) paradigm suggested, but also explaining how one can
a) display the code on a separate window ('page' is only an obvious
command once you know it), and b) how one can then save one's
modification, make it generally available, and not overwrite the
unmodified version (again, thanks, Frank).  Finally, one would need
to change the emphasis in basic statistical teaching from 'the
right test' to 'the right model'.  That should get people used to
R's logic.
If a rabbit starts to use R, s/he is likely to head for the help
files associated with each function, which can assume that the
reader can make sense of gnomic utterances like Omit 'var' to
impute all variables, creating new variables in 'search' position
'where'.  I still don't know what that one means (as I don't
understand search positions, or why they're important).  This can
be very offputting, and could lead the rabbit to return to familiar
SPSS territory.
Finally, friendlier error messages would also help. It took me 3
days, and opening every function I could, to work out that
'...cannot find function xxx.data.frame...' meant that MICE was
unable to make a polychotomous logistic imputation model converge
for the variable immediately preceding it.
I am now off to the help files and FAQs to find out how to change
graph parameters, as the plot.mids function in MICE a) doesn't
allow one to select a subset of variables, and b) tells me that the
graph it wants to produce on the whole of my 26 variable dataset is
too big to fit on the (windows) plotting device.  Unless anyone
wants to tell me how/where? (which of course is why, in the end, R
is EASIER to use than SPSS)


Re: [R] using nls to fit a four parameter logistic model

2004-08-17 Thread Rogers, James A [PGRD Groton]
Shalini,

I think your "hill equation" is meant to just be an alternative
parameterization of the four-parameter logistic (BTW, the Hill
*coefficient* is a function of the slope parameter of the FPL, but I don't
believe "hill equation" is standard terminology). Note that conc is the input
in this parameterization, not log(conc).

> nls(log(il10)~A+(B-A)/(1+(conc/xmid )^scal),data=test,
+ start = list(A=3.5, B=15,
+   xmid=600,scal=1/2.5))
Nonlinear regression model
  model:  log(il10) ~ A + (B - A)/(1 + (conc/xmid)^scal) 
   data:  test 
  A   Bxmidscal 
 14.7051665   3.7964534 607.9822962   0.3987786 
 residual sum-of-squares:  0.1667462 

To see the equivalence to the other parametrization that you used, note

> 1/2.507653
[1] 0.3987793
> log(607.9822962)
[1] 6.410146

--Jim

 Message: 17
 Date: Mon, 16 Aug 2004 11:25:57 -0500
 From: [EMAIL PROTECTED]
 Subject: [R] using nls to fit a four parameter logistic model
 To: [EMAIL PROTECTED]
 Message-ID:
   [EMAIL PROTECTED]
 Content-Type: text/plain; charset=US-ASCII
 
 I am working on what appears to be a fairly simple problem for the
 following data
 
  test=data.frame(cbind(conc=c(25000, 12500, 6250, 3125, 1513, 781, 391,
 195, 97.7, 48.4, 24, 12, 6, 3, 1.5, 0.001),
  il10=c(330269, 216875, 104613, 51372, 26842, 13256, 7255, 3049, 1849,
743,
 480, 255, 241, 128, 103, 50)))
 I am able to fit the above data to the equation
 
  nls(log(il10)~A+(B-A)/(1+exp((xmid-log(conc))/scal)),data=test,
 +  start = list(A=log(0.001), B=log(10),
 + xmid=log(6000),scal=0.8))
 Nonlinear regression model
   model:  log(il10) ~ A + (B - A)/(1 + exp((xmid - log(conc))/scal))
data:  test
 A B  xmid  scal
  3.796457 14.705159  6.410144  2.507653
  residual sum-of-squares:  0.1667462
 
 
 But in attempting to achieve a fit to what is commonly known as the hill
 equation, which is a four parameter fit that is used widely in biological
 data analysis
 
 nls(log(il10)~A+(B-A)/(1+(log(conc)/xmid )^scal),data=test,
 + start = list(A=log(0.001), B=log(10),  xmid=log(6000),scal=0.8))
 
 Nonlinear regression model
   model:  log(il10) ~ A + (B - A)/(1 + (log(conc)/xmid )^scal)
 
 Error in numericDeriv(form[[3]], names(ind), env) :
 Missing value or an Infinity produced when evaluating the model
 
 
 
 Please would someone offer a suggestion
 
 Shalini

James A. Rogers 
Manager, Nonclinical Statistics
PGRD Groton Labs
Eastern Point Road (MS 260-1331)
Groton, CT 06340
office: (860) 686-0786
fax: (860) 715-5445
 


LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}



[R] Bug in colnames of data.frames?

2004-08-17 Thread Arne Henningsen
Hi,

I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0.

I have a data.frame, e.g.:

 myData <- data.frame( var1 = c( 1:4 ), var2 = c( 5:8 ) )

If I add a new column by

 myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]

everything is fine, but if I omit the commas:

 myData$var4 <- myData[ "var1" ] + myData[ "var2" ]

the name shown above the 4th column is not var4:

 myData
  var1 var2 var3 var1
1    1    5    6    6
2    2    6    8    8
3    3    7   10   10
4    4    8   12   12

but names() and colnames() return the expected name:

 names( myData )
[1] "var1" "var2" "var3" "var4"
 colnames( myData )
[1] "var1" "var2" "var3" "var4"

And it is even worse: I am not able to change the name shown above the 4th 
column:
 names( myData )[ 4 ] <- "var5"
 myData
  var1 var2 var3 var1
1    1    5    6    6
2    2    6    8    8
3    3    7   10   10
4    4    8   12   12

I guess that this is a bug, isn't it?

Arne



Re: [R] Bug in colnames of data.frames?

2004-08-17 Thread Uwe Ligges
Arne Henningsen wrote:
Hi,
I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0.
I have a data.frame, e.g.:

myData <- data.frame( var1 = c( 1:4 ), var2 = c( 5:8 ) )

If I add a new column by

myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]

everything is fine, but if I omit the commas:

myData$var4 <- myData[ "var1" ] + myData[ "var2" ]
This bug is the user ... ;-)
Type:  str(myData)
`data.frame':   4 obs. of  3 variables:
 $ var1: int  1 2 3 4
 $ var2: int  5 6 7 8
 $ var4:`data.frame':   4 obs. of  1 variable:
  ..$ var1: int  6 8 10 12
Aha! You have created a data.frame consisting of one column! What you 
really mean is
 myData$var5 <- myData[[ "var1" ]] + myData[[ "var2" ]]

Uwe Ligges


the name shown above the 4th column is not var4:

myData
  var1 var2 var3 var1
1    1    5    6    6
2    2    6    8    8
3    3    7   10   10
4    4    8   12   12
but names() and colnames() return the expected name:

names( myData )
[1] "var1" "var2" "var3" "var4"
colnames( myData )
[1] "var1" "var2" "var3" "var4"
And it is even worse: I am not able to change the name shown above the 4th 
column:

names( myData )[ 4 ] <- "var5"
myData
  var1 var2 var3 var1
1    1    5    6    6
2    2    6    8    8
3    3    7   10   10
4    4    8   12   12
I guess that this is a bug, isn't it?
Arne


[R] Fwd: strptime() problem?

2004-08-17 Thread javier garcia - CEBAS
Hi all;
I've already send a similar e-mail to the list and Prof. Brian Ripley 
answered me but my doubts remain unresolved. Thanks for the clarification, 
but perhaps I wasn't clear enough in posting my questions.

I've got a postgres database which I read into R. The first column is
Timestamp with timezone, and my data are already in UTC format. A 'printed' 
extract of the R character column resulting from the timestamptz field is:

raincida$ts:

 [2039] 25/03/2000 22:00:00 UTC 25/03/2000 23:00:00 UTC
 [2041] 26/03/2000 00:00:00 UTC 26/03/2000 01:00:00 UTC
 [2043] 26/03/2000 02:00:00 UTC 26/03/2000 03:00:00 UTC
 [2045] 26/03/2000 04:00:00 UTC 26/03/2000 05:00:00 UTC

#And I need to convert this character column into POSIXct for eventual work. 
#As I can see in the documentation, the process is to use strptime(), which 
#creates a POSIXlt object and doesn't allow one to specify that the time zone 
#of the data is already UTC; followed by as.POSIXct()

 lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")
 lluvia.strptime.POSIXct <- as.POSIXct(lluvia.strptime, tz="GMT")

A printed extract is:

 [2039] 2000-03-25 22:00:00 GMT 2000-03-25 23:00:00 GMT
 [2041] 2000-03-26 00:00:00 GMT 2000-03-26 01:00:00 GMT
 [2043] 2000-03-26 03:00:00 GMT 2000-03-26 03:00:00 GMT
 [2045] 2000-03-26 04:00:00 GMT 2000-03-26 05:00:00 GMT

As we can see, the elements at [2043] differ from the original data. Shouldn't 
they follow the pattern of the rest of the elements shown? I thought this was 
a bug, but it seems that I've got a conceptual error (?). This happens several 
times in my data, and produces eventual errors.

Please, how could I resolve this?

Thanks all, and best regards,

Javier G.



Re: [R] Bug in colnames of data.frames?

2004-08-17 Thread Peter Dalgaard
Arne Henningsen [EMAIL PROTECTED] writes:

 Hi,
 
 I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0.
 
 I have a data.frame, e.g.:
 
  myData <- data.frame( var1 = c( 1:4 ), var2 = c( 5:8 ) )
 
 If I add a new column by
 
  myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]
 
 everything is fine, but if I omit the commas:
 
  myData$var4 <- myData[ "var1" ] + myData[ "var2" ]
 
 the name shown above the 4th column is not var4:
 
  myData
   var1 var2 var3 var1
 1    1    5    6    6
 2    2    6    8    8
 3    3    7   10   10
 4    4    8   12   12
 
 but names() and colnames() return the expected name:
 
  names( myData )
 [1] "var1" "var2" "var3" "var4"
  colnames( myData )
 [1] "var1" "var2" "var3" "var4"
 
 And it is even worse: I am not able to change the name shown above the 4th 
 column:
  names( myData )[ 4 ] <- "var5"
  myData
   var1 var2 var3 var1
 1    1    5    6    6
 2    2    6    8    8
 3    3    7   10   10
 4    4    8   12   12
 
 I guess that this is a bug, isn't it?
 
 I guess that this is a bug, isn't it?

Nope:

 str(myData)
`data.frame':   4 obs. of  4 variables:
 $ var1: int  1 2 3 4
 $ var2: int  5 6 7 8
 $ var3: int  6 8 10 12
 $ var4:`data.frame':   4 obs. of  1 variable:
  ..$ var1: int  6 8 10 12

It's slightly peculiar, but if a column of a data frame is itself a
rectangular structure (data frame or matrix), then the innermost names
are used. Cf.

 myData[,"var4"] <- cbind(xyzzy=5:2)
 myData
  var1 var2 var3 xyzzy
1    1    5    6     5
2    2    6    8     4
3    3    7   10     3
4    4    8   12     2


Arguably, one might prefer

  var1 var2 var3  var4
                 xyzzy
1    1    5    6     5
2    2    6    8     4
3    3    7   10     3
4    4    8   12     2

or something like that, but it's hardly a bug.


-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] Bug in colnames of data.frames? -- NOT

2004-08-17 Thread Prof Brian Ripley
This is not a bug, and BTW data frames have names not colnames.
As I have said already today, don't confuse the printed repesentation of 
an object with the object itself.

On Tue, 17 Aug 2004, Arne Henningsen wrote:

 I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0.
 
 I have a data.frame, e.g.:
 
  myData <- data.frame( var1 = c( 1:4 ), var2 = c( 5:8 ) )
 
 If I add a new column by
 
  myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]
 
 everything is fine, but if I omit the commas:
 
  myData$var4 <- myData[ "var1" ] + myData[ "var2" ]
 
 the name shown above the 4th column is not var4:
 
  myData
   var1 var2 var3 var1
 1    1    5    6    6
 2    2    6    8    8
 3    3    7   10   10
 4    4    8   12   12
 
 but names() and colnames() return the expected name:
 
  names( myData )
 [1] "var1" "var2" "var3" "var4"
  colnames( myData )
 [1] "var1" "var2" "var3" "var4"
 
 And it is even worse: I am not able to change the name shown above the 4th 
 column:
  names( myData )[ 4 ] <- "var5"
  myData
   var1 var2 var3 var1
 1    1    5    6    6
 2    2    6    8    8
 3    3    7   10   10
 4    4    8   12   12
 
 I guess that this is a bug, isn't it?

No.  Take a look at the fourth column more carefully.

 myData[4]
  var1
1    6
2    8
3   10
4   12

 class(myData[4])
[1] "data.frame"

You included a single-column data frame in your data frame.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] levels of factor

2004-08-17 Thread Luis Rideau Cruz
R-help,

I have a data frame which I subset like:

a <- subset(df, df$column2 %in% c("factor1","factor2") & df$column2==1)

But when I type levels(a$column2) I still get the same levels as in df (my original 
data frame)

Why is that?
Is it right?

Luis

Luis Ridao Cruz
Fiskirannsóknarstovan
Nóatún 1
P.O. Box 3051
FR-110 Tórshavn
Faroe Islands
Phone: +298 353900
Phone(direct): +298 353912
Mobile: +298 580800
Fax: +298 353901
E-mail:  [EMAIL PROTECTED]
Web:www.frs.fo



Re: [R] using nls to fit a four parameter logistic model

2004-08-17 Thread Spencer Graves
 In your second model, log(conc) is negative for conc = 0.001.  
This observation will generate NA for (log(conc)/xmid)^scal unless scal 
is an integer or xmid is also negative.  In the latter case, 
(log(conc)/xmid)^scal will be NA for all but that last value unless scal 
is an integer. 

 What do your biological references do with this model for 
concentrations less than 1? 

     If you delete that observation, the algorithm can still die 
testing a value for xmid = 0.  To avoid these cases, I routinely 
parameterize problems like this in terms of ln.xmid, something like the 
following: 

   log(il10)~A+(B-A)/(1+(log(conc)/exp(ln.xmid))^scal). 

 hope this helps.  spencer graves
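A hedged sketch of that reparameterization, with the problem observation dropped. The start values below are rough guesses (not from the thread) and may need adjusting for convergence:

```r
## drop conc = 0.001, for which log(conc) < 0, and keep xmid positive
## by fitting ln.xmid and using exp(ln.xmid); start values are guesses
fit <- nls(log(il10) ~ A + (B - A)/(1 + (log(conc)/exp(ln.xmid))^scal),
           data = subset(test, conc > 1),
           start = list(A = 15, B = 4, ln.xmid = log(6.4), scal = 8))
```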
[EMAIL PROTECTED] wrote:
Shalini Raghavan
3M Pharmaceuticals Research
Building 270-03-A-10, 3M Center
St. Paul, MN  55144
E-mail: [EMAIL PROTECTED]
Tel:  651-736-2575
Fax:  651-733-5096
I am working on what appears to be a fairly simple problem for the
following data
test=data.frame(cbind(conc=c(25000, 12500, 6250, 3125, 1513, 781, 391,
195, 97.7, 48.4, 24, 12, 6, 3, 1.5, 0.001),
il10=c(330269, 216875, 104613, 51372, 26842, 13256, 7255, 3049, 1849, 743,
480, 255, 241, 128, 103, 50)))
 

test

        conc   il10
1  25000.000 330269
2  12500.000 216875
3   6250.000 104613
4   3125.000  51372
5   1513.000  26842
6    781.000  13256
7    391.000   7255
8    195.000   3049
9     97.700   1849
10    48.400    743
11    24.000    480
12    12.000    255
13     6.000    241
14     3.000    128
15     1.500    103
16     0.001     50
I am able to fit the above data to the equation
 

nls(log(il10)~A+(B-A)/(1+exp((xmid-log(conc))/scal)),data=test,
+  start = list(A=log(0.001), B=log(10),
+ xmid=log(6000),scal=0.8))
Nonlinear regression model
 model:  log(il10) ~ A + (B - A)/(1 + exp((xmid - log(conc))/scal))
  data:  test
        A         B      xmid      scal
 3.796457 14.705159  6.410144  2.507653
residual sum-of-squares:  0.1667462
But in attempting to achieve a fit to what is commonly known as the hill
equation, which is a four parameter fit that is used widely in biological
data analysis
nls(log(il10)~A+(B-A)/(1+(log(conc)/xmid )^scal),data=test,
+ start = list(A=log(0.001), B=log(10),  xmid=log(6000),scal=0.8))
Nonlinear regression model
 model:  log(il10) ~ A + (B - A)/(1 + (log(conc)/xmid )^scal)
Error in numericDeriv(form[[3]], names(ind), env) :
   Missing value or an Infinity produced when evaluating the model

Please would someone offer a suggestion
Shalini


Re: [R] Bug in colnames of data.frames?

2004-08-17 Thread Marc Schwartz
On Tue, 2004-08-17 at 09:01, Arne Henningsen wrote:
 Hi,
 
 I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0.
 
 I have a data.frame, e.g.:
 
  myData <- data.frame( var1 = c( 1:4 ), var2 = c( 5:8 ) )
 
 If I add a new column by
 
  myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]
 
 everything is fine, but if I omit the commas:
 
  myData$var4 <- myData[ "var1" ] + myData[ "var2" ]
 
 the name shown above the 4th column is not var4:
 
  myData
   var1 var2 var3 var1
 1    1    5    6    6
 2    2    6    8    8
 3    3    7   10   10
 4    4    8   12   12
 
 but names() and colnames() return the expected name:
 
  names( myData )
 [1] "var1" "var2" "var3" "var4"
  colnames( myData )
 [1] "var1" "var2" "var3" "var4"
 
 And it is even worse: I am not able to change the name shown above the 4th 
 column:
  names( myData )[ 4 ] <- "var5"
  myData
   var1 var2 var3 var1
 1    1    5    6    6
 2    2    6    8    8
 3    3    7   10   10
 4    4    8   12   12
 
 I guess that this is a bug, isn't it?
 
 Arne


Here is a hint:

# This returns an integer vector
 str(myData[ , var1 ] + myData[ , var2 ])
 int [1:4] 6 8 10 12


# This returns a data.frame
 str(myData[ "var1" ] + myData[ "var2" ])
`data.frame':   4 obs. of  1 variable:
 $ var1: int  6 8 10 12


 str(myData)
`data.frame':   4 obs. of  5 variables:
 $ var1: int  1 2 3 4
 $ var2: int  5 6 7 8
 $ var3: int  6 8 10 12
 $ var4:`data.frame':   4 obs. of  1 variable:
  ..$ var1: int  6 8 10 12


Take a look at the details, value and coercion sections of ?.data.frame

HTH,

Marc Schwartz



Re: [R] Bug in colnames of data.frames?

2004-08-17 Thread Marc Schwartz
On Tue, 2004-08-17 at 09:34, Marc Schwartz wrote:

 Take a look at the details, value and coercion sections of
 ?.data.frame

This must be my week for typos. That should be:

?'[.data.frame' (in ESS)

or

?"[.data.frame" (otherwise)

Marc



Re: [R] levels of factor

2004-08-17 Thread Marc Schwartz
On Tue, 2004-08-17 at 09:30, Luis Rideau Cruz wrote:
 R-help,
 
 I have a data frame wich I subset like :
 
 a <- subset(df, df$column2 %in% c("factor1","factor2") & df$column2==1)
 
 But when I type levels(a$column2) I still get the same levels as in df (my 
 original data frame)
 
 Why is that?

The default for [.factor is:

x[i, drop = FALSE]

Hence, unused factor levels are retained.

 Is it right?

Yes.

If you want to explicitly recode the factor based upon only those levels
that are actually in use, you can do something like the following:

a <- factor(a)

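A minimal sketch of the effect, with made-up data (the vector and values here are purely illustrative):

```r
# Subsetting a factor keeps the full level set by default
f <- factor(c("a", "b", "c"))
g <- f[f != "c"]
levels(g)      # still "a" "b" "c"

# Re-coding with factor() keeps only the levels actually present
g <- factor(g)
levels(g)      # "a" "b"
```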

However, I am a bit unclear as to the logic of the subset statement that
you are using, perhaps b/c I don't know what your data is.

You seem to be subsetting 'column2' on both the factor levels and a
presumed numeric code. Is that really what you want to do?

You might want to review the Warning section in ?factor

BTW, when using subset(), the evaluation takes place within the data
frame, so you do not need to use df$column2 in the function call. You
can just use column2, for example:

subset(df, column2 %in% c("factor1", "factor2"))

See ?factor and ?[.factor for more information.

HTH,

Marc Schwartz



Re: [R] Fwd: strptime() problem?

2004-08-17 Thread Gabor Grothendieck
javier garcia - CEBAS rn001 at cebas.csic.es writes:

: 
: Hi all;
: I've already sent a similar e-mail to the list and Prof. Brian Ripley 
: answered me but my doubts remain unresolved. Thanks for the clarification, 
: but perhaps I wasn't clear enough in posting my questions.
: 
: I've got a postgres database which I read into R. The first column is
: Timestamp with timezone, and my data are already in UTC format. A 'printed' 
: extract of the R character column, resulting from the timestamptz field, is:
: 
: raincida$ts:
: 
:  [2039] "25/03/2000 22:00:00 UTC" "25/03/2000 23:00:00 UTC"
:  [2041] "26/03/2000 00:00:00 UTC" "26/03/2000 01:00:00 UTC"
:  [2043] "26/03/2000 02:00:00 UTC" "26/03/2000 03:00:00 UTC"
:  [2045] "26/03/2000 04:00:00 UTC" "26/03/2000 05:00:00 UTC"
: 
: #And I need to convert this character column into POSIXct, for eventual work.
: #As I can see in the documentation, the process is to use strptime(), which
: #creates an object POSIXlt and doesn't allow to specify that the time zone of
: #the data is already UTC; followed by as.POSIXct()
: 
:  lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")
:  lluvia.strptime.POSIXct <- as.POSIXct(lluvia.strptime, tz="GMT")
: 
: A printed extract is:
: 
:  [2039] "2000-03-25 22:00:00 GMT" "2000-03-25 23:00:00 GMT"
:  [2041] "2000-03-26 00:00:00 GMT" "2000-03-26 01:00:00 GMT"
:  [2043] "2000-03-26 03:00:00 GMT" "2000-03-26 03:00:00 GMT"
:  [2045] "2000-03-26 04:00:00 GMT" "2000-03-26 05:00:00 GMT"
: 
: As we can see, the elements at [2043] differ: shouldn't they follow the same
: pattern as the rest of the shown elements? I thought this was a bug, but it
: seems that I've got a conceptual error (?). This happens several times in my
: data, and produces eventual errors.
: 
: Please, how could I resolve this?

[Sorry if this gets posted twice.  I had a problem posting and
not sure if the first one ever got sent.]

I am in a different time zone, EDT, on Windows XP and can't
replicate this but you might try reading the latest R News
article on dates and times for some ideas, viz. page 32 of:

   http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

In particular, try converting the datetimes to chron and then doing 
your manipulations in chron or else converting them from chron to 
POSIXct rather than going through POSIXlt:

   require(chron)
   r.asc <- raincida$ts
   r.chron <- chron(substring(r.asc, 1, 10), 
 substring(r.asc, 12, 19), format = c("d/m/y", "h:m:s"))

   r.ct <- as.POSIXct(r.chron)
   format(r.ct, tz="GMT") # display in GMT



RE: [R] Fwd: strptime() problem?

2004-08-17 Thread Whit Armstrong
Javier,

I recently had a problem with dates.  This example might shed some light on
your problem.

 x <- ISOdate(rep(2000,2), rep(3,2), rep(26,2), hour=0)
 x

[1] "2000-03-26 GMT" "2000-03-26 GMT"

 unclass(x)

[1] 954028800 954028800
attr(,"tzone")
[1] "GMT"

When one creates a date with ISOdate, the resulting object is of class
POSIXct and is given the attribute "tzone", which is set to "GMT".

When one prints an object of class POSIXct the function print.POSIXct is
called:
 print.POSIXct
function (x, ...) 
{
    print(format(x, usetz = TRUE, ...), ...)
    invisible(x)
}
<environment: namespace:base>
 

So, that function is just calling format which gets dispatched to
format.POSIXct:

 format.POSIXct
function (x, format = "", tz = "", usetz = FALSE, ...) 
{
    if (!inherits(x, "POSIXct")) 
        stop("wrong class")
    if (missing(tz) && !is.null(tzone <- attr(x, "tzone"))) 
        tz <- tzone
    structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, 
        ...), names = names(x))
}
<environment: namespace:base>
 

Now, if one looks carefully at this code, you will see that it tests for the
attribute "tzone" on the object that is passed in.  If it finds that
attribute, then it is passed on to format.POSIXlt (which is the function
that ultimately does the printing).  If there is no "tzone" attribute, then
"" is passed to format.POSIXlt as the tzone, which causes the object to be
printed in your locale specific format.

See:

 attr(x, "tzone") <- ""
 x
[1] "2000-03-25 19:00:00 Eastern Standard Time" "2000-03-25 19:00:00 Eastern
Standard Time"
 attr(x, "tzone") <- "GMT"
 x
[1] "2000-03-26 GMT" "2000-03-26 GMT"
 

Now this is the part that really got me confused:

 x
[1] "2000-03-26 GMT" "2000-03-26 GMT"
 x[1]
[1] "2000-03-25 19:00:00 Eastern Standard Time"
 

What happens in the above case is that the code for "[.POSIXct" looks like
this:

 get("[.POSIXct")
function (x, ..., drop = TRUE) 
{
    cl <- oldClass(x)
    class(x) <- NULL
    val <- NextMethod("[")
    class(val) <- cl
    val
}
<environment: namespace:base>
 

The attribute "tzone" is not preserved!!  When val is created from the
call to NextMethod, its class is restored, but not its "tzone" attribute.
So any dates of class POSIXct that are printed after they have been
subscripted ("[") will have their "tzone" attribute stripped, and will print
in the locale specific format.

For your specific case, I would convert all my dates to POSIXct, then set
the attribute "tzone" to "GMT".  After that, be very careful when
subscripting them, or you will find them printing in locale specific formats
again.
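A minimal sketch of that workaround (object names made up): re-attach the attribute after subscripting so printing stays in GMT:

```r
# Build a POSIXct time with an explicit GMT tzone attribute
x <- as.POSIXct("2000-03-26 02:00:00", tz = "GMT")

# "[" drops the "tzone" attribute, so restore it by hand
x2 <- x[1]
attr(x2, "tzone") <- "GMT"
```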

for you:
 y <- strptime("4/3/2000", format="%m/%d/%Y")
 y
[1] "2000-04-03"
 y <- as.POSIXct(y, "GMT")
 y
[1] "2000-04-03 GMT"
 unclass(y)
[1] 954720000
attr(,"tzone")
[1] "GMT"
 

I think that should straighten out your problem.

Hope that helps,
Whit



Re: [R] Re: Thanks Frank, setting graph parameters,and why social scientists don't use R

2004-08-17 Thread Berton Gunter
A few comments:

First, your remarks are interesting and, I would say, mainly well founded. However, I 
think they are in many respects irrelevant, although they do point to the much bigger 
underlying issue, which Roger Peng also hinted at in his reply.

I think they are sensible because R IS difficult; the documentation is often 
challenging, which is not surprising given (a) the inherent complexity of R; (b) the 
difficulty in writing good documentation, especially when many of the functions being 
documented are inherently technical, so subject matter knowledge (CS, statistics, 
numerical analysis ,...) must be assumed; (c) the documentation has been written by a 
variety of mostly statistical types as a sidelight of their main professional 
activities -- none of these writers are ** professional documenters ** (whatever that 
may mean)
and some of them even speak English as a second or third language. My own take is that 
the documentation for Core R and many of the packages is remarkably well done given 
these realities, and my hat is off to those who have produced it. Nevertheless, I 
agree, it is challenging -- it MUST be.

But they are irrelevant because the fundamental issue **is** that there is an inherent 
tension between ease of use and power/flexibility. Writing good GUI's for anything is 
hard, very hard. For a project such as R, it doesn't make sense, although it may to 
write GUI's for small subsets of R targeted at specific audiences (as in BioConductor, 
RCommander, etc.). But even this is hard to do well and takes a lot of time and 
effort. So, IMHO, there never will be nor ever should/could be an overall GUI for R: 
it is too complex and needs to be too extensible and flexible to constrain it in
that way.

However, I believe the larger question that both you and Roger Peng hint at is more 
important: not How does a social scientist learn to use R, but how does any 
scientist/technologist for whom experimental design and data analysis forms a large 
component of their work gain the necessary technical background in statistics and 
related disciplines (linear algebra, numerical analysis, ...) to ** know how to use 
the statistical tools they need that R provides.**  Software like SPSS must assume a 
limited collection of methods to present to their customers in an effective GUI. Their 
strategy
**must** be (this is NOT a criticism) to dumb it down so that they can provide 
coherent albeit limited data analysis strategies. As you have explicitly stated, users 
who wish to venture outside those narrow paradigms are simply out of luck. R was 
designed from the outset not to be so constrained, but the cost is that you must know 
a good deal to use it effectively. It is obvious from the questions posted to this 
list that even something as simple as lm() often demands from users technical 
statistical understanding far beyond what they have. So we see fairly frequently 
indications
of misunderstanding and confusion in using R. But the problem isn't R -- it's that 
users don't know enough statistics.

I wish I could say I had an answer for this, but I don't have a clue. I do not think 
it's fair to expect a mechanical engineer or psychologist or biologist to have the 
numerous math and statistical courses and experience in their training that would 
provide the base they need. For one thing, they don't have the time in their studies 
for this; for another, they may not have the background or interest -- they are, after 
all, mechanical engineers or biologists, not statisticians. Unfortunately, they could 
do their jobs as engineers and scientists a lot better if they did know more
statistics.  To me, it's a fundamental conundrum, and no one is to blame. It's just 
the reality, but it is the source for all kinds of frustrations on both sides of the 
statistical divide, which both you and Roger expressed in your own ways.

Obviously, all of this is just personal ranting, so I would love to hear alternative 
views. And thanks again for your clear and interesting comments.

Cheers,
Bert

[EMAIL PROTECTED] wrote:

 First, many thanks to Frank Harrell for once again helping me out.  This actually 
 relates to the next point, which is my contribution to the 'why don't social 
 scientists use R' discussion.  I am a hybrid social scientist(child psychiatrist) 
 who trained on SPSS.  Many of my difficulties in coming to terms with R have been to 
 do with trying to apply the logic underlying SPSS, with dire results.  You do not 
 want to know how long I spent looking for a 'recode' command in R, to change factor 
 names and classes.

 I think the solution is to combine a graphical interface that encourages command 
 line use (such as Rcommander) with the analyse(this) paradigm suggested, but also 
 explaining how one can a) display the code on a separate window ('page' is only an 
 obvious command once you know it), and b) how one can then save one's modification, 
 make it generally available, and 

Re: [R] An entire data frame which is a time-series?

2004-08-17 Thread Gabor Grothendieck
Ajay Shah ajayshah at mayin.org writes:

: 
: I have :
: 
: raw <- read.table("monthly.text", skip=3, sep="|",
: col.names=c("junk", "junk2",
:   "wpi", "g.wpi", "wpi.primary", "g.wpi.primary",
:   "wpi.fuel", "g.wpi.fuel", "wpi.manuf", "g.wpi.manuf",
:   "cpi.iw", "g.cpi.iw", "cpi.unme", "g.cpi.unme",
:   "cpi.al", "g.cpi.al", "cpi.rl", "g.cpi.rl"))
: 
: Now I can do things like:
: 
:   g.wpi = ts(raw$g.wpi, frequency=12, start=c(1994,7))
: 
: and it works fine. One by one, I can make time-series objects.
: 
: Is there a way to tell R that the entire data frame is a set of
: time-series, so that I don't have to go column by column and make a
: new ts() out of each?
: 
: I tried:
: 
:   M = ts(raw, frequency=12, start=c(1994,7))
:   ts.plot(M[,"wpi"], M[,"wpi.manuf"])
: 
: but this gives nonsense results. 

Converting a data frame to a ts object seems to work for me:

R my.df <- data.frame(a = 1:4, b = 5:8)
R my.ts <- ts(my.df, start=c(2000,4), freq=12)
R my.ts.a <- my.ts[,"a"]
R my.ts.a
 Apr May Jun Jul
2000   1   2   3   4

Suggest you provide a small reproducible example that illustrates
the problem.


: Also, syntax like M$wpi is a lot
: nicer than M[,"wpi"]. Any ideas about what might work?

R "$.ts" <- function(x, i) x[,i]
R my.ts$a
 Apr May Jun Jul
2000   1   2   3   4



[R] aov summary to matrix

2004-08-17 Thread Moises Hassan
Is there an easy way of converting an aov.summary into a matrix in which
the rows are the factor names and the columns are Df, Sum Sq, Mean Sq, F
value and Pr. 

For example, convert 

            Df Sum Sq Mean Sq F value   Pr(>F)   
block        5 343.29   68.66  4.4467 0.015939 * 
N            1 189.28  189.28 12.2587 0.004372 **
P            1   8.40    8.40  0.5441 0.474904   
K            1  95.20   95.20  6.1657 0.028795 * 
N:P          1  21.28   21.28  1.3783 0.263165   
N:K          1  33.14   33.14  2.1460 0.168648   
P:K          1   0.48    0.48  0.0312 0.862752   
Residuals   12 185.29   15.44
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1


To

Factor      Df Sum Sq Mean Sq F value   Pr
block        5 343.29   68.66  4.4467 0.015939
N            1 189.28  189.28 12.2587 0.004372
P            1   8.40    8.40  0.5441 0.474904
K            1  95.20   95.20  6.1657 0.028795
N:P          1  21.28   21.28  1.3783 0.263165
N:K          1  33.14   33.14  2.1460 0.168648
P:K          1   0.48    0.48  0.0312 0.862752
Residuals   12 185.29   15.44      NA       NA


Thanks,
   - Moises



Re: [R] aov summary to matrix

2004-08-17 Thread Prof Brian Ripley
On Tue, 17 Aug 2004, Moises Hassan wrote:

 Is there an easy way of converting an aov.summary into a matrix in which
 the rows are the factor names and the columns are Df, Sum Sq, Mean Sq, F
 value and Pr. 

You are confusing the printed representation with the object (which seems 
today's favourite misconception).

as.matrix(summary(npk.aov)[[1]])

is a matrix (to full precision) as you seek, although I would prefer to 
work with the data frame which is returned.

(Note: your output is from MASS4 & example(aov), unattributed.)

 For example, convert 
 
             Df Sum Sq Mean Sq F value   Pr(>F)   
 block        5 343.29   68.66  4.4467 0.015939 * 
 N            1 189.28  189.28 12.2587 0.004372 **
 P            1   8.40    8.40  0.5441 0.474904   
 K            1  95.20   95.20  6.1657 0.028795 * 
 N:P          1  21.28   21.28  1.3783 0.263165   
 N:K          1  33.14   33.14  2.1460 0.168648   
 P:K          1   0.48    0.48  0.0312 0.862752   
 Residuals   12 185.29   15.44
 ---
 Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
 
 
 To
 
 Factor      Df Sum Sq Mean Sq F value   Pr
 block        5 343.29   68.66  4.4467 0.015939
 N            1 189.28  189.28 12.2587 0.004372
 P            1   8.40    8.40  0.5441 0.474904
 K            1  95.20   95.20  6.1657 0.028795
 N:P          1  21.28   21.28  1.3783 0.263165
 N:K          1  33.14   33.14  2.1460 0.168648
 P:K          1   0.48    0.48  0.0312 0.862752
 Residuals   12 185.29   15.44      NA       NA

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] survdiff

2004-08-17 Thread Peter Dalgaard
Krista Haanstra [EMAIL PROTECTED] writes:

 As I am quite an ignorant user of R, excuse me for any wrongful usage of
 all the terms.
 My question relates to the statistics behind the survdiff function in the
 package survival.
 My textbook knowledge of the logrank test tells me that if I want to compare
 two survival curves, I have to take the sum of the factors: (O-E)^2/E of
 both groups, which will give me the Chisq.
 If I calculate this by hand, I get a different value than the one R is
 giving me.
 Actually, the (O-E)^2/E that R gives me, those I agree with, but if I then
 take the sum, this is not the chisq R gives.
 Two questions:
 - How is Chisq calculated in R?
 - What does the column (O-E)^2/V mean? What is V, and how does this possibly
 relate to the calculated Chisq?

You really need to read a theory book for this, but here's the basic idea:

V is the theoretical variance of O-E for the first group. If O-E is
approximately normally distributed, as it will be in large samples,
then (O-E)^2/V will be approximately chi-squared distributed on 1 DF.

In *other* models, notably those for contingency tables, the same idea
works out as the familiar sum((O-E)^2/E) formula. That formula has
historically been used for the logrank test too, and it still appears
in some textbooks, but as it turns out, it is not actually correct
(although often quite close).

[To fix ideas, consider testing for a given p in the binomial
distribution, you can either say O=x E=np V=npq and get

chisq = (x-np)^2/npq 

or have O = (x, n-x), E = (np, nq) and get

chisq =  (x-np)^2/np + ((n-x) - nq)^2/nq

and a little calculus shows that the latter expression is

 = (x-np)^2*(1/np + 1/nq) = (x-np)^2 * (p+q)/npq

so the two formulas are one and the same. In this case!]
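A quick numeric check of the equivalence above, with made-up numbers:

```r
# Binomial test statistic computed both ways
n <- 100; p <- 0.4; q <- 1 - p; x <- 33
chisq1 <- (x - n*p)^2 / (n*p*q)                       # (O-E)^2 / V form
chisq2 <- (x - n*p)^2/(n*p) + ((n-x) - n*q)^2/(n*q)   # sum((O-E)^2/E) form
all.equal(chisq1, chisq2)   # TRUE
```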
-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [R] aov summary to matrix

2004-08-17 Thread Sundar Dorai-Raj

Moises Hassan wrote:
Is there an easy way of converting an aov.summary into a matrix in which
the rows are the factor names and the columns are Df, Sum Sq, Mean Sq, F
value and Pr. 

For example, convert 

            Df Sum Sq Mean Sq F value   Pr(>F)   
block        5 343.29   68.66  4.4467 0.015939 * 
N            1 189.28  189.28 12.2587 0.004372 **
P            1   8.40    8.40  0.5441 0.474904   
K            1  95.20   95.20  6.1657 0.028795 * 
N:P          1  21.28   21.28  1.3783 0.263165   
N:K          1  33.14   33.14  2.1460 0.168648   
P:K          1   0.48    0.48  0.0312 0.862752   
Residuals   12 185.29   15.44
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

To
Factor      Df Sum Sq Mean Sq F value   Pr
block        5 343.29   68.66  4.4467 0.015939
N            1 189.28  189.28 12.2587 0.004372
P            1   8.40    8.40  0.5441 0.474904
K            1  95.20   95.20  6.1657 0.028795
N:P          1  21.28   21.28  1.3783 0.263165
N:K          1  33.14   33.14  2.1460 0.168648
P:K          1   0.48    0.48  0.0312 0.862752
Residuals   12 185.29   15.44      NA       NA


Try this:
example(aov)
as.data.frame.summary.aovlist <- function(x) {
  if(length(x) == 1) {
as.data.frame(x[[1]])
  } else {
lapply(unlist(x, FALSE), as.data.frame)
  }
}
x1 <- summary(npk.aov)
x2 <- summary(npk.aovE)
as.data.frame(x1)
as.data.frame(x2)
--sundar


Re: [R] Bug in colnames of data.frames?

2004-08-17 Thread David Forrest
On Tue, 17 Aug 2004, Arne Henningsen wrote:

 Thank you for all your answers!

 I agree with you that it is not a bug. My mistake was that I thought that a
 data frame is similar to a matrix, but as ?data.frame says they ... share
 many of the properties of matrices and of lists.
...

 I think the current presentation
  myData
    var1 var2 var3 xyzzy
  1    1    5    6     5
  2    2    6    8     4
  3    3    7   10     3
  4    4    8   12     2

 is confusing because it is not directly (without another command like str())
 apparent why myData[[ "var1" ]] works fine while myData[[ "xyzzy" ]] does
 not.

In some ways it is a bug -- in the documentation, print.data.frame, or
format.data.frame

Consider assigning a wider dataframe to var4:

myData <- data.frame(matrix(1:12,4), var4=I(data.frame(xyzzy=5:2, plugh=1:4)))
myData  # error
class(myData[["var4"]]) <- "data.frame"
myData  # gives indications of sub-variables by var4.xyzzy, var4.plugh
myData[["var4.plugh"]]  # NULL
myData[["var4"]][["plugh"]]

str(myData)

By the way, is there a way of making such an assignment in one step
without the I() class() hack?

dave
-- 
 Dave Forrest
 [EMAIL PROTECTED](804)684-7900w
 [EMAIL PROTECTED] (804)642-0662h
   http://maplepark.com/~drf5n/



Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread John
On Tuesday 17 August 2004 06:14, Roger D. Peng wrote:
 I'm just curious, but how do social scientists, or anyone else for
 that matter, learn SPSS, besides taking a class?

They sit down with a book, a computer, and data they desperately need to 
analyze and start working.  SPSS documentation and some of the third party 
works are fairly thorough, and pretty gentle, and the writing fits the 
expectations of someone who has had only the initiatory stats courses.  Your 
teacher emphasizes checking the normality of the data, so you look for the 
means of measuring it and the tests that tell you whether it is significant 
or not, after very carefully considering the nature of your data in the light 
of the assumptions made in the SPSS tests make.  You are far less concerned 
with the real mathematical mechanics than you are about meeting the 
expectations of the professor.  SPSS, SYSTAT, NCSS and similar programs all 
support this kind work.  Many social science professors don't really know 
enough to judge your work beyond similar expectations THEY learned from their 
own professors.  It's sad, but the way it works in many schools.

J



Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R

2004-08-17 Thread John
On Tuesday 17 August 2004 09:20, Berton Gunter wrote:
 A few comments:

It has been decades since I used SPSS.  At that time, to really work with it 
you edited a text file program that identified the data file and variable 
columns you wanted to work with.  You assembled the flow of work commands 
after carefully going through the SPSS documentation.  After you were ready, 
you ran the program and crossed your fingers.  R IS complex, enough so that 
the usability at a basic level is readily achievable.  What it lacks is 
simply the Stat 1 and Stat 101 packages that lead users from the very basics 
covered in introductory statistics texts into more profound analyses that 
some many R users are interested in.  There are some texts, such as Peter 
Dalgaard's Introductory Statistics with R, which is a very useful book.  
However, from a student's view point Chapter 1 focuses on R, everything from 
the R Language to R programming.  The statistics chapters that follow almost 
seem to be used as an adjunct to teaching R rather than vice versa.  For some 
social science students, a package that leads more gradually into R would 
probably be a big help in learning the language while getting their 
feet wet in statistics.

John



[R] logistic -normal model

2004-08-17 Thread syl dan
I am working with a logistic-normal model (i.e., a GLMM with a random
intercept) by Bayesian methods, but I have met some difficulties programming
it in R. Does anyone have experience with this model, or R code I can refer
to as an example?

Thanks for your help.
Syl


[R] creating a plot

2004-08-17 Thread Konrad Banachewicz
Hi,
I have a time series plot to produce, but I want the x-axis to be
labelled with dates (stored in another array) rather than with
observation numbers. Can anyone suggest how?
Thanks.


  Konrad



Re: [R] creating a plot

2004-08-17 Thread Tomas Aragon
--- Konrad Banachewicz [EMAIL PROTECTED] wrote:

 Hi,
 I have a time series plot to produce, yet I want the x-axis to be 
 labelled with dates
 (stored on another array) and not with observation numbers. Can
 anyone 
 suggest me how?
 Thanks.
 
  

Konrad
 
Try checking out http://www.medepi.net/data/wnv/index.html at bottom of
page. 
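For reference, a minimal sketch of the usual axis() approach (the data and date range here are made up):

```r
# Plot against observation numbers, suppressing the default x-axis,
# then label the ticks with the dates instead
y <- cumsum(rnorm(12))
dates <- seq(as.Date("2004-01-01"), by = "month", length = 12)
plot(1:12, y, type = "l", xaxt = "n", xlab = "", ylab = "value")
axis(1, at = 1:12, labels = format(dates, "%b %Y"))
```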
Tomas

=
Tomas Aragon, MD, DrPH, Director
Center for Infectious Disease Preparedness
UC Berkeley School of Public Health
1918 University Ave., 4th Fl., MC-7350
Berkeley, CA 94720-7350
http://www.idready.org
