Re: [R] Extracting values from Surv function in survival package

2024-05-16 Thread Göran Broström

Hi Dennis,

Look at the help page for summary.survfit, in particular the 'n.event' component listed under Value.
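For instance, with your fit (a sketch; both lines assume FIT.1 is the
survfit object from your post):

summary(FIT.1)$table[, "events"]   # the printed table as a matrix
# or straight from the object, summing n.event within each stratum:
tapply(FIT.1$n.event, rep(seq_along(FIT.1$strata), FIT.1$strata), sum)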

Göran

On 2024-05-15 22:41, Dennis Fisher wrote:

OS X
R 4.3.3

Colleagues

I have created objects using the Surv function in the survival package:

FIT.1

Call: survfit(formula = FORMULA1)

                                        n events median 0.95LCL 0.95UCL
SUBDATA$ARM=1, SUBDATA[, EXP.STRAT]=0  18     13    345     156      NA
SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=1  13      5     NA     186      NA
SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=2   5      5    168      81      NA
SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=3   1      1     22      NA      NA

I am interested in extracting the “n” and “events” values.
“n” is easy:

FIT.1[[1]]

[1] 18 13  5  1

or

FIT.1$n

[1] 18 13  5  1

But I can’t figure out how to access “events”.

str(FIT.1) provides no insights:
List of 17
 $ n        : int [1:4] 18 13 5 1
 $ time     : num [1:37] 45 106 107 124 152 156 170 176 319 371 ...
 $ n.risk   : num [1:37] 18 17 16 15 14 13 12 11 10 9 ...
 $ n.event  : num [1:37] 1 1 1 1 1 1 1 1 1 1 ...
 $ n.censor : num [1:37] 0 0 0 0 0 0 0 0 0 0 ...
 $ surv     : num [1:37] 0.944 0.889 0.833 0.778 0.722 ...
 $ std.err  : num [1:37] 0.0572 0.0833 0.1054 0.126 0.1462 ...
 $ cumhaz   : num [1:37] 0.0556 0.1144 0.1769 0.2435 0.315 ...
 $ std.chaz : num [1:37] 0.0556 0.0809 0.1022 0.1221 0.1414 ...
 $ strata   : Named int [1:4] 18 13 5 1
  ..- attr(*, "names")= chr [1:4] "SUBDATA$ARM=1, SUBDATA[, EXP.STRAT]=0" "SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=1"
     "SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=2" "SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=3"
 $ type     : chr "right"
 $ logse    : logi TRUE
 $ conf.int : num 0.95
 $ conf.type: chr "log"
 $ lower    : num [1:37] 0.844 0.755 0.678 0.608 0.542 ...
 $ upper    : num [1:37] 1 1 1 0.996 0.962 ...
 $ call     : language survfit(formula = FORMULA1)
 - attr(*, "class")= chr "survfit"

If I could access:
                                        n events median 0.95LCL 0.95UCL
SUBDATA$ARM=1, SUBDATA[, EXP.STRAT]=0  18     13    345     156      NA
SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=1  13      5     NA     186      NA
SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=2   5      5    168      81      NA
SUBDATA$ARM=2, SUBDATA[, EXP.STRAT]=3   1      1     22      NA      NA
it should be easy to get “events”.

Any thoughts?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Creating methods

2023-04-24 Thread Göran Broström

Den 2023-04-24 kl. 15:14, skrev Ivan Krylov:

В Mon, 24 Apr 2023 15:07:50 +0200
Göran Broström  пишет:


First, you should only ever write a method if you own the generic
or the class.



I was stunned when I read it. I write methods all over the place for
generics like print, summary, plot, etc.


You most likely do that for your own classes. It's sufficient to own
either the generic or the class, though owning both wouldn't hurt.


Thanks, I see. I define a class when I write a method, so no problem. 
What Hadley meant was that I shouldn't rewrite print.lm, for instance.
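For the record, a minimal sketch of the sanctioned case (a print method
for a class I define myself; 'myfit' is made up):

myfit <- structure(list(coef = c(a = 1, b = 2)), class = "myfit")
print.myfit <- function(x, ...) {
    cat("A myfit object with coefficients:\n")
    print(x$coef)
    invisible(x)
}
myfit   # autoprinting now dispatches to print.myfit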


Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating methods

2023-04-24 Thread Göran Broström
I am reading Hadley's "Advanced R", Section 13.4.3 
(https://adv-r.hadley.nz/s3.html). It starts



There are two wrinkles to be aware of when you create a new method:

First, you should only ever write a method if you own the generic 
or the class. R will allow you to define a method even if you don’t, but 
it is exceedingly bad manners. Instead, work with the author of either 
the generic or the class to add the method in their code.



I was stunned when I read it. I write methods all over the place for 
generics like print, summary, plot, etc.


So my question is: Do I "own" those generics, or am I just showing bad 
manners?


Thanks, Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] web address for R-project

2023-03-09 Thread Göran Broström

Thanks, Göran

Den 2023-03-09 kl. 15:01, skrev Ivan Krylov:

В Thu, 9 Mar 2023 14:43:49 +0100
Göran Broström  пишет:


Is it someone's mirror (authorized) of R-project?


It's not currently in the official list of mirrors [*], but anyone can
create a CRAN mirror for their own needs as long as they don't overload
the CRAN master site: https://cran.r-project.org/mirror-howto.html

NexR call themselves a "total data service company" and have published
an R interface for Apache Hive [**], so it shouldn't be too surprising
that they have a local CRAN mirror.

The home of R project is at https://r-project.org/. If you don't want
to trust a J. Random Hacker from the Internet to give you the right
address, run citation() in an R prompt.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] web address for R-project

2023-03-09 Thread Göran Broström
I got the link cran.nexr.com from a person who claimed that it is the 
home of R-project, and indeed, it looks right.


But it doesn't feel right, so I wonder what is going on. Is it someone's 
mirror (authorized) of R-project?


Thanks, Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Integer division

2022-12-19 Thread Göran Broström

Thanks Richard,

the "rounding claim" was my mistake (as I replied to Martin), I should 
said "truncates toward zero" as you explain.


However, my point was that these two mathematical functions should be 
defined in the documentation, as you also say. And I was surprised that 
there is no consensus regarding the definition of such elementary functions.


Göran

On 2022-12-20 03:01, Richard O'Keefe wrote:

The Fortran '08 standard says <<
One operand of type integer may be divided by another operand of type 
integer. Although the mathematical
quotient of two integers is not necessarily an integer, Table 7.2 
specifies that an expression involving the division
operator with two operands of type integer is interpreted as an 
expression of type integer. The result of such an
operation is the integer closest to the mathematical quotient and 
between zero and the mathematical quotient inclusively. >>
Another way to say this is that integer division in
Fortran TRUNCATES towards zero.  It does not round and
never has.

C carefully left the behaviour of integer division (/)
unspecified, but introduced the div(,) function with the
same effect as Fortran (/).  Later versions of the C
standard tightened this up, and the C17 standard reads <<
The result of the / operator is the quotient from the division of the 
first operand by the second; the
result of the % operator is the remainder. In both operations, if the 
value of the second operand is zero, the behavior is undefined.
When integers are divided, the result of the / operator is the algebraic 
quotient with any fractional
part discarded. 107) If the quotient a/b is representable, the 
expression (a/b)*b + a%b shall equal a ;

otherwise, the behavior of both a/b and a%b is undefined.>>

That is, C17 TRUNCATES the result of division towards
zero.  I don't know of any C compiler that rounds,
certainly gcc does not.


The Java 15 Language Specification says
<< Integer division rounds toward 0. >>
which also specified truncating division.


The help for ?"%/%" does not say what the result is.
Or if it does, I can't find it.  Either way, this is
a defect in the documentation.  It needs to be spelled
out very clearly.
R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
 > c(-8,8) %/% 3
[1] -3  2
so we deduce that R *neither* rounds *nor* truncates,
but returns the floor of the quotient.
It is widely argued that flooring division is more
generally useful than rounding or truncating division,
but it is admittedly surprising.
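A quick illustration of the three conventions side by side:

c(-8, 8) %/% 3        # -3  2  floor, what R does
trunc(c(-8, 8) / 3)   # -2  2  truncation toward zero, what C17 and Fortran do
round(c(-8, 8) / 3)   # -3  3  rounding, which none of them does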

On Tue, 20 Dec 2022 at 02:51, Göran Broström <g...@ehar.se> wrote:


I have a long vector x with five-digit codes where the first digit of
each is of special interest, so I extracted them through

  > y <- x %/% 10000

but to my surprise y contained the value -1 in some places. It turned
out that x contains -1 as a symbol for 'missing value' so in effect I
found that

  > -1 %/% 10000 == -1

Had to check the help page for "%/%", and the first relevant comment I
found was:

"Users are sometimes surprised by the value returned".

No surprise there. Further down:

‘%%’ indicates ‘x mod y’ (“x modulo y”) and ‘%/%’ indicates
       integer division.  It is guaranteed that

       ‘ x == (x %% y) + y * (x %/% y) ’ (up to rounding error)

I did expect  (a %/% b) to return round(a / b), like gfortran and gcc,
but instead I get floor(a / b) in R. What is the reason for these
different definitions? And shouldn't R's definition be documented?

Thanks, Göran




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Integer division

2022-12-19 Thread Göran Broström




Den 2022-12-19 kl. 15:41, skrev Martin Maechler:

Göran Broström
 on Mon, 19 Dec 2022 14:22:00 +0100 writes:


 > I have a long vector x with five-digit codes where the
 > first digit of each is of special interest, so I extracted
 > them through

 >> y <- x %/% 10000

 > but to my surprise y contained the value -1 in some
 > places. It turned out that x contains -1 as a symbol for
 > 'missing value' so in effect I found that

 >> -1 %/% 10000 == -1

 > Had to check the help page for "%/%", and the first
 > relevant comment I found was:

 > "Users are sometimes surprised by the value returned".

 > No surprise there. Further down:

 > ‘%%’ indicates ‘x mod y’ (“x modulo y”) and ‘%/%’
 > indicates integer division.  It is guaranteed that

 >   ‘ x == (x %% y) + y * (x %/% y) ’ (up to rounding
 > error)

 > I did expect (a %/% b) to return round(a / b), like
 > gfortran and gcc,

What???  I cannot believe you.


Well, you shouldn't, I generalized too far.


No time for checking now, but I bet that
8 / 3  gives 2 and not 3  in C and Fortran
(and hence gcc, etc)


But compare -8 %/% 3 in R and -8 / 3 in C/Fortran.

G,




 > but instead I get floor(a / b) in
 > R. What is the reason for these different definitions? And
 > shouldn't R's definition be documented?



 > Thanks, Göran

 > __
 > R-help@r-project.org mailing list -- To UNSUBSCRIBE and
 > more, see https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide
 > http://www.R-project.org/posting-guide.html and provide
 > commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Integer division

2022-12-19 Thread Göran Broström
I have a long vector x with five-digit codes where the first digit of 
each is of special interest, so I extracted them through


> y <- x %/% 10000

but to my surprise y contained the value -1 in some places. It turned 
out that x contains -1 as a symbol for 'missing value' so in effect I 
found that


> -1 %/% 10000 == -1

Had to check the help page for "%/%", and the first relevant comment I 
found was:


"Users are sometimes surprised by the value returned".

No surprise there. Further down:

‘%%’ indicates ‘x mod y’ (“x modulo y”) and ‘%/%’ indicates
 integer division.  It is guaranteed that

 ‘ x == (x %% y) + y * (x %/% y) ’ (up to rounding error)

I did expect  (a %/% b) to return round(a / b), like gfortran and gcc, 
but instead I get floor(a / b) in R. What is the reason for these 
different definitions? And shouldn't R's definition be documented?


Thanks, Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] is.na()<- on a character vector

2022-12-17 Thread Göran Broström

Thanks to all. I was confused and forgot that in is.na(x) <- value,
value is an index vector. Clearly stated on the help page. So Bert's
suggestion is the right one.



Den 2022-12-16 kl. 19:44, skrev Jeff Newmiller:

I don't find _either_ of these acceptable.

On the other hand,

 x[ is.na( x ) ] <- 1

should have no effect on x.

On December 16, 2022 10:28:52 AM PST, "Göran Broström"  
wrote:

I'm confused:


x <- 1:2
is.na(x) <- 1
x

[1] NA  2

OK, but


x <- c("A", "B")
is.na(x) <- "A"
x

   A
"A" "B"  NA

What happens?

Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






Re: [R] is.na()<- on a character vector

2022-12-16 Thread Göran Broström
Yes, I was confused: In is.na(x) <- value, value is supposed to be an index
vector, clearly stated on the help page. So Bert's suggestion is the way to go.

Thanks to all, Göran

-----Original Message-----
From: Bill Dunlap 
Date: Friday, 16 December 2022 20:15
To: Göran Broström 
Cc: r-help@r-project.org 
Subject: Re: [R] is.na()<- on a character vector
I think that is.na (x) <- i
generally does the same to x as does
x[i] <- NA


I say 'generally' because some classes (e.g., numeric_version) do not allow 
x[i]<-NA but do allow is.na (x)<-i. It is 
possible that some classes mess up this equivalence, but I think that would be 
considered a bug.
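A minimal illustration of the equivalence, and of what happened in the
character example (a character index means indexing by *name*, and
assigning to a name that does not exist appends a new element):

x <- c(a = 1, b = 2)
is.na(x) <- "b"    # same as x["b"] <- NA
x
#  a  b
#  1 NA
# an unnamed vector has no element named "A", so is.na(x) <- "A"
# appends a new NA element whose name is "A"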


-Bill


On Fri, Dec 16, 2022 at 10:29 AM Göran Broström <goran.brost...@umu.se> wrote:

I'm confused:

> x <- 1:2
> is.na (x) <- 1
> x
[1] NA 2

OK, but

> x <- c("A", "B")
> is.na (x) <- "A"
> x
A
"A" "B" NA

What happens?

Göran




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] is.na()<- on a character vector

2022-12-16 Thread Göran Broström
I'm confused:

> x <- 1:2
> is.na(x) <- 1
> x
[1] NA  2

OK, but

> x <- c("A", "B")
> is.na(x) <- "A"
> x
   A 
"A" "B"  NA  

What happens?

Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] zero weights in weighted.mean

2021-07-14 Thread Göran Broström




On 2021-07-14 19:43, Sorkin, John wrote:

Gentlemen,

At the risk of beating a dead horse, but in the spirit of learning
more about R, aren't the two expressions functionally the same? One
drops values where the weight is zero. The other (in the case where we
have infinity * 0, something one would not expect to see in data) also
drops data, since in R infinity * 0 = NaN. In either case the observation
would be dropped. I am certain I am missing something, but I don't
know what I am missing.


Try this:

> my <- function(x, w) sum(x * w) / sum(w)

> x <- c(1, 1 / 0)

> w <- 1:0
> my(x, w)

[1] NaN

> weighted.mean(x, w)

[1] 1


See?

Best, Göran



John

John David Sorkin M.D., Ph.D. Professor of Medicine University of
Maryland School of Medicine Associate Director for Biostatistics and
Informatics Baltimore VA Medical Center Geriatrics, Research,
Education, and Clinical Center Chief, Biostatistics and Informatics 
University of Maryland School of Medicine Division of Gerontology,

Geriatrics and Palliative Care Senior Statistician University of
Maryland Center for Vascular Research 10 North Greene Street GRECC
(BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax)
410-605-7913 (Please call phone number above prior to faxing)



From: R-help on behalf of Göran Broström
Sent: Wednesday, July 14, 2021 10:46 AM
To: Duncan Murdoch; r-help@r-project.org
Subject: Re: [R] zero weights in weighted.mean



Den 2021-07-14 kl. 13:16, skrev Duncan Murdoch:

On 14/07/2021 6:00 a.m., Göran Broström wrote:
I wonder about the last sentence in the Details section of the 
documentation of 'weighted.mean':


"However, zero weights _are_ handled specially and the
corresponding ‘x’ values are omitted from the sum."

The return value of weighted.mean.default is

sum((x * w)[w != 0])/sum(w)

and indeed, it looks as if zero weights are getting special
treatment, but what is wrong with the alternative (equivalent?)
expression

sum(x * w) / sum(w)?

Is it a good idea to remove zeros from a vector before applying
'sum' to it? I don't think so. Anyway, the sentence in the
documentation seems to be uncalled for.


Inf*0 is not zero.  Setting weights to zero on infinite
observations (or NA, or NaN) will give different results in your
two expressions.


Thanks, agreed.

G,




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] zero weights in weighted.mean

2021-07-14 Thread Göran Broström




Den 2021-07-14 kl. 13:16, skrev Duncan Murdoch:

On 14/07/2021 6:00 a.m., Göran Broström wrote:

I wonder about the last sentence in the Details section of the
documentation of 'weighted.mean':

"However, zero weights _are_ handled specially and the corresponding ‘x’
values are omitted from the sum."

The return value of weighted.mean.default is

sum((x * w)[w != 0])/sum(w)

and indeed, it looks as if zero weights are getting special treatment,
but what is wrong with the alternative (equivalent?) expression

sum(x * w) / sum(w)?

Is it a good idea to remove zeros from a vector before applying 'sum' to
it? I don't think so. Anyway, the sentence in the documentation seems to
be uncalled for.


Inf*0 is not zero.  Setting weights to zero on infinite observations (or 
NA, or NaN) will give different results in your two expressions.
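For concreteness, a one-line illustration of the difference:

x <- c(1, Inf); w <- c(1, 0)
sum(x * w) / sum(w)             # NaN, since Inf * 0 is NaN
sum((x * w)[w != 0]) / sum(w)   # 1, the zero weight drops the Inf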


Thanks, agreed.

G,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] zero weights in weighted.mean

2021-07-14 Thread Göran Broström
I wonder about the last sentence in the Details section of the 
documentation of 'weighted.mean':


"However, zero weights _are_ handled specially and the corresponding ‘x’ 
values are omitted from the sum."


The return value of weighted.mean.default is

sum((x * w)[w != 0])/sum(w)

and indeed, it looks as if zero weights are getting special treatment, 
but what is wrong with the alternative (equivalent?) expression


sum(x * w) / sum(w)?

Is it a good idea to remove zeros from a vector before applying 'sum' to 
it? I don't think so. Anyway, the sentence in the documentation seems to 
be uncalled for.


G,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] density with weights missing values

2021-07-13 Thread Göran Broström




Den 2021-07-12 kl. 15:09, skrev Matthias Gondan:

Weighted mean behaves differently:


One difference is that density has a named argument 'weights' not 
present in weighted.mean, which instead has 'w' for weights.

Annoying.

So, in your examples, the argument 'weights = ' is always ignored, at 
least for weighted.mean.default:


> stats:::weighted.mean.default
function (x, w, ..., na.rm = FALSE)
{
    if (missing(w)) {
        if (na.rm)
            x <- x[!is.na(x)]
        return(sum(x)/length(x))
    }
    if (length(w) != length(x))
        stop("'x' and 'w' must have the same length")
    if (na.rm) {
        i <- !is.na(x)
        w <- w[i]
        x <- x[i]
    }
    sum((x * w)[w != 0])/sum(w)
}

But, using 'w' for weights, missing values in weights will work only if 
na.rm = TRUE and they match missing values in x. As documented.


[...]

• no warning for sum(weights) != 1


and no warning for sum(w) != 1

That's because the weights w are normalized (after removing weights 
corresponding to missing values in x).
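A small check of the normalization (note the weights need not sum to one):

weighted.mean(c(1, 2, 3, NA), w = c(1, 1, 1, 1), na.rm = TRUE)   # 2
sum(c(1, 2, 3) * c(1, 1, 1)) / sum(c(1, 1, 1))                   # the same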


G,



weighted.mean(c(1, 2, 3, 4), weights=c(1, 1, 1, 1))

[1] 2.5

weighted.mean(c(1, 2, 3, NA), weights=c(1, 1, 1, 1))

[1] NA

weighted.mean(c(1, 2, 3, NA), weights=c(1, 1, 1, 1), na.rm=TRUE)

[1] 2




Von: Richard O'Keefe
Gesendet: Montag, 12. Juli 2021 13:18
An: Matthias Gondan
Betreff: Re: [R] density with weights missing values

Does your copy of R say that the weights must add up to 1?
?density doesn't say that in mine.   But it does check.

On Mon, 12 Jul 2021 at 22:42, Matthias Gondan  wrote:


Dear R users,

This works as expected:

• plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE))

This raises an error

• plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, 1)))
• plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1, NA)))

This seems to work (it triggers a warning that the weights don’t add up to 1, 
which makes sense*):

• plot(density(c(1,2, 3, 4, 5, NA), na.rm=TRUE, weights=c(1, 1, 1, 1, 1)))

Questions

• But shouldn’t the na.rm filter also filter the corresponding weights?
• Extra question: In case the na.rm filter is changed to filter the weights, 
the check for sum(weights) == 1 might trigger false positive warnings since the 
weights might not add up to 1 anymore

Best wishes,

Matthias


 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Plotting adjusted KM curve

2021-04-09 Thread Göran Broström




On 2021-04-05 03:34, Sorkin, John wrote:

Colleagues,
I am using the coxph to model survival time. How do I plot an adjusted Kaplan 
Meir plot resulting from coxph? The code I would like to run would start with:

# run cox model
fit1Cox <- coxph(surv_object ~age+sex,data=mydata)

I have no idea what would follow.


You should look at

?survfit.coxph

in the survival package, especially the 'newdata' argument.


I would like to plot adjusted KM curves for men vs. women at age 65.


Then I guess that you should stratify on sex:

fit <- coxph(surv_object ~ age + strata(sex), data = mydata)

sfit <- survfit(fit, newdata = data.frame(age = 65))
plot(sfit)

HTH, Göran



Thank you,
John


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Contrasts in coxph

2021-04-06 Thread Göran Broström

Dear John,

I googled "contrasts Cox regression" and found

https://stats.stackexchange.com/questions/15603/performing-contrasts-among-treatment-levels-in-survival-analysis

where Frank Harrell reports how to do what you want with his rms 
package. At least I think so, haven't tried it myself.


Best, Göran

On 2021-04-06 05:28, Sorkin, John wrote:

I would like to define contrasts on the output of a coxph function.
It appears that the contrast function from the contrast library does
not have a method defined that will allow computation of contrasts on
a coxph object.

How does one define and evaluate contrasts for a cox model?

Thank you, John

John David Sorkin M.D., Ph.D. Professor of Medicine Chief,
Biostatistics and Informatics University of Maryland School of
Medicine Division of Gerontology and Geriatric Medicine Baltimore VA
Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD
21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone
number above prior to faxing)





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Survival analysis

2020-04-21 Thread Göran Broström

Dear dr Medic,

Den 2020-04-17 kl. 23:03, skrev Medic:

On 2020-04-17 20:06, Medic wrote:

I can't understand how to do a survival analysis (?Surv ()) when some
event occurred before the start of observation (left censored). If I
understand correctly, there are two methods. I chose a method with: 1)
time from the start of treatment to the event and 2) the indicator of
the event. I did (in my data) the event indicator so:
1 - event, 2 - event before the start of observation, 0 - no event


I have no experience of left censoring beyond the text book.  Is your
left censored data the SAME event or a different event?

YES, THE SAME!


---
library(survival)
left_censor_data <- read.table("left.csv", header = TRUE, sep = ";")
#sep = ";" it's right!
dput(left_censor_data, file="left_censor_data") #file attached
left_censor_data
'data.frame':   11 obs. of  2 variables:
$ timee : int  5 151 33 37 75 14 7 9 1 45 ...
$ eventt: int  2 0 0 0 0 0 0 2 0 1 ...
# 1—event, 2 – event before the start of observation , 0 – no event


So if I read this data correctly the first observation is left censored.
What does the time "5" refer to?  Is that 5 days BEFORE observation the
event happened?

YES, EXACTLY!

My text book understanding of left censored data was that your
censored points would
have time 0.

I TRIED TO SET TIME 0 NOW (for censored points), AND RECEIVED THE SAME
WARNING (AND THE CURVE TURNED OUT WRONG)


sur <- Surv(time = left_censor_data$timee,  event =
left_censor_data$eventt, type = "left")
   WARNING message:
   In Surv(time = left_censor_data$timee, event =
left_censor_data$eventt,  :
   Invalid status value, converted to NA

#Why such a WARNING message?


Because the choice "type = 'left'" is incompatible with an event 
variable taking three values, see the help page for survival::Surv. My 
guess from your (vague) description is that you want "type = 
'counting'", as indicated below.
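For the record, type = 'left' wants a two-valued status; a minimal sketch
(my coding assumption: 1 = observed event, 0 = left censored):

library(survival)
Surv(time = c(5, 151), event = c(0, 1), type = "left")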



#Then everything turns out wrong


Is the censoring type you want LEFT TRUNCATION rather than LEFT?
If they are also right censored, I think R's Surv calls these "counting".

I SAY ABOUT LEFT CENSORING (NOT ABOUT LEFT TRUNCATION)!
(COUNTING? I DO NOT UNDERSTAND THIS.)


Then you should read an elementary text on survival analysis or consult 
a local statistician.


G,



THANKS! I HOPE SOMEONE EXPLAIN TO ME
1) HOW TO COMPILE THE DATA and
2) WRITE A CODE

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Calling a LAPACK subroutine from R

2019-09-11 Thread Göran Broström

Giovanni,

you are trying to open a can of worms. See recent discussions on
R-pkg-devel, Writing R Extensions, and the help page for .Fortran.


Best, Göran

On 2019-09-10 23:44, Giovanni Petris wrote:


Hello R-helpers!

I am trying to call a LAPACK subroutine directly from my R code using 
.Fortran(), but R cannot find the symbol name. How can I register/load the 
appropriate library?


### AR(1) Precision matrix
n <- 4L
phi <- 0.64
AB <- matrix(0, 2, n)
AB[1, ] <- c(1, rep(1 + phi^2, n-2), 1)
AB[2, -n] <- -phi
round(AB, 3)

      [,1]  [,2]  [,3] [,4]
[1,]  1.00  1.41  1.41    1
[2,] -0.64 -0.64 -0.64    0


### Cholesky factor
AB.ch <- .Fortran("dpbtrf", UPLO = 'L', N = as.integer(n),
+                  KD = 1L, AB = AB, LDAB = 2L, INFO = as.integer(0))$AB
Error in .Fortran("dpbtrf", UPLO = "L", N = as.integer(n), KD = 1L, AB = AB,  :
   Fortran symbol name "dpbtrf" not in load table

sessionInfo()

R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin18.5.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.6_1/lib/libopenblasp-r0.3.6.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.6.0 tools_3.6.0

Thank you in advance for your help!

Best,
Giovanni Petris



--
Giovanni Petris, PhD
Professor
Director of Statistics
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] results of a survival analysis change when converting the data to counting process format

2019-08-23 Thread Göran Broström




Den 2019-08-22 kl. 21:48, skrev Göran Broström:



On 2019-08-18 19:10, Ferenci Tamas wrote:

Dear All,

Consider the following simple example:

library( survival )
data( veteran )

coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
  trt    prior    karno
  0.180197194 -0.005550919 -0.033771018

Note that we have neither time-dependent covariates, nor time-varying
coefficients, so the results should be the same if we change to
counting process format, no matter where we cut the times.

That's true if we cut at event times:

veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,
    data = veteran, cut = unique( veteran$time ) )

coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = 
veteran2 ) )

  trt    prior    karno
  0.180197194 -0.005550919 -0.033771018

But quite interestingly not true, if we cut at every day:

veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
    data = veteran, cut = 1:max(veteran$time) )

coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = 
veteran3 ) )

  trt    prior    karno
  0.180197215 -0.005550913 -0.033771016

The difference is not large, but definitely more than just a rounding
error, or something like that.

What's going on? How can the results get wrong, especially by
including more cutpoints?


All results are wrong, but they are useful (paraphrasing George EP Box).


That said, it is a little surprising: The generated risk sets are 
(should be) identical in all cases, and one would expect rounding errors 
to be the same. But data get stored differently, and ... who knows?


I tried your examples on my computer and got exactly the same results as 
you. Which surprised me.


G,



Göran



Thank you in advance,
Tamas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] results of a survival analysis change when converting the data to counting process format

2019-08-22 Thread Göran Broström




On 2019-08-18 19:10, Ferenci Tamas wrote:

Dear All,

Consider the following simple example:

library( survival )
data( veteran )

coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
           trt        prior        karno
   0.180197194 -0.005550919 -0.033771018

Note that we have neither time-dependent covariates, nor time-varying
coefficients, so the results should be the same if we change to
counting process format, no matter where we cut the times.

That's true if we cut at event times:

veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,
data = veteran, cut = unique( veteran$time ) )

coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )
           trt        prior        karno
   0.180197194 -0.005550919 -0.033771018

But quite interestingly not true, if we cut at every day:

veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
data = veteran, cut = 1:max(veteran$time) )

coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )
           trt        prior        karno
   0.180197215 -0.005550913 -0.033771016

The difference is not large, but definitely more than just a rounding
error, or something like that.

What's going on? How can the results get wrong, especially by
including more cutpoints?


All results are wrong, but they are useful (paraphrasing George EP Box).

Göran



Thank you in advance,
Tamas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Object and file sizes

2019-06-28 Thread Göran Broström




On 2019-06-28 15:26, Duncan Murdoch wrote:

On 28/06/2019 7:35 a.m., Göran Broström wrote:

Hello,

I have two large data frames, 'liss' (170 million obs, 8 variables) and
'fobb' (52 million obs, 8 variables, same as for 'liss'), and checking
their sizes I get

  > object.size(liss)
7477492552 bytes
  > object.size(fobb)
2494591736 bytes

Fair enough, but when I save them to disk (saveRDS), the size relation
is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB!

I was puzzled by this and thought that I had made a mistake in creating
them, but the only explanation I can find for this is that 'liss'
contains a lot more missing values.


saveRDS() uses compression by default.  Compression works best if there
are a lot of repetitive values; every NA is the same, so that would help
compression.  Other values may also be repeated.


If you use saveRDS(compress=FALSE), you'll get much larger results, 
probably roughly proportional to the object.size() results.
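For example (file name made up):

saveRDS(liss, "liss_raw.rds", compress = FALSE)
file.size("liss_raw.rds")   # should be close to object.size(liss)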


Almost equal to the object.size results: The differences are 2167 bytes 
and 2171 bytes, respectively (smaller on disk). Thanks for the explanation!


Göran



Duncan Murdoch


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Object and file sizes

2019-06-28 Thread Göran Broström

Hello,

I have two large data frames, 'liss' (170 million obs, 8 variables) and 
'fobb' (52 million obs, 8 variables, same as for 'liss'), and checking 
their sizes I get


> object.size(liss)
7477492552 bytes
> object.size(fobb)
2494591736 bytes

Fair enough, but when I save them to disk (saveRDS), the size relation 
is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB!


I was puzzled by this and thought that I had made a mistake in creating 
them, but the only explanation I can find for this is that 'liss' 
contains a lot more missing values.


Suggestions?

Thanks, Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] saving an R script

2019-05-10 Thread Göran Broström

?dump
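i.e., something like this (a sketch; the function and file name are made up):

f <- function(x) x^2
dump("f", file = "f.R")   # writes the source code of f to f.R
source("f.R")             # recreates f, e.g. in a later session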

Den 2019-05-10 kl. 04:54, skrev David Winters:

Greetings,

This is a super embarrassing question. But, how do you save an R
script with code? that is, not doing it the cheat way? where you just
hit the "save" button? what function would it be?

I know how to save the environment in code:

save.image(file='myEnvironment.RData')
quit(save='no')
load('myEnvironment.RData')

But how would I do the same thing with a script? rather than the environment?

David

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Select

2019-02-11 Thread Göran Broström




On 2019-02-11 23:35, Val wrote:

Hi all,

I have a data frame  with tow variables  group and its size.
mydat<- read.table( text='group  count
G1 25
G2 15
G3 12
G4 31
G5 10' , header = TRUE, as.is = TRUE )



How about

x <- sample(1:5)

total <- mydat$count[x[1]]
i <- 1
while (total < 40){
    i <- i + 1
    total <- total + mydat$count[x[i]]
}

print(mydat$group[x[1:i]])
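or, a vectorized variant of the same idea (a sketch using cumsum):

x <- sample(nrow(mydat))
i <- which(cumsum(mydat$count[x]) >= 40)[1]
mydat[x[1:i], ]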

Göran



I want to select   group ID randomly (without replacement)  until  the
sum of count reaches 40.
So, in  the first case, the data frame could be
G4 31
G5 10

In other case, it could be
   G5 10
   G2 15
   G3 12

How do I impose the restriction that the sum of the count variable
is at least 40?

Thank you in advance







__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Question on Binom.Confint

2018-09-14 Thread Göran Broström




On 2018-09-14 17:04, Guo, Fang (Associate) wrote:

I did use library(binom). However, I was able to use the method "lrt" which is 
short for likelihood ratio test.


You are right, there is a method "lrt", but it is not mentioned in the 
documentation. (Look at the code.)
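So the zero-successes case can be inspected directly, e.g.:

library(binom)
binom.confint(x = 0, n = 20, methods = "lrt")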


Göran


-Original Message-
From: Jim Lemon [mailto:drjimle...@gmail.com]
Sent: Thursday, September 13, 2018 11:50 PM
To: Guo, Fang (Associate) ; r-help mailing list 

Subject: Re: [R] Question on Binom.Confint

Hi Fang,
Let's assume that you are using the "binom.confint" function in the "binom" package and 
you have made a spelling mistake or two. This function employs nine methods for estimating the binomial 
confidence interval. Sadly, none of these is "lrt". The zero condition is discussed in the help 
page for four of these methods. Assuming you want to use another method, you will have to look up the method. 
A good start is:

https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

Jim

On Fri, Sep 14, 2018 at 11:38 AM Guo, Fang (Associate)  
wrote:


Hi,

I have a question with the function Binom.Confint(x,n,"method"=lrt). For 
likelihood ratio test, I'd like to ask how you define the upper limit when the frequency 
of successes is zero. Thanks!


Fang Guo
Associate

CORNERSTONE RESEARCH
699 Boylston Street, 5th Floor
Boston, MA 02116-2836
617.927.3042 direct
fa...@cornerstone.com

www.cornerstone.com


***
Warning: This email may contain confidential or privileged information
intended only for the use of the individual or entity to whom it is
addressed. If you are not the intended recipient, please understand
that any disclosure, copying, distribution, or use of the contents of
this email is strictly prohibited.
***

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mysterious seg fault.

2018-08-11 Thread Göran Broström

Rolf,

have you tried to run R with a debugger, as in

$ R -d gdb

the gnu debugger?
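i.e., roughly:

$ R -d gdb
(gdb) run
## ... reproduce the seg fault from the R prompt ...
(gdb) backtrace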

G,

On 2018-08-11 23:12, Rolf Turner wrote:


I am getting a seg fault from a package that I am working on, and I am 
totally flummoxed by it.  The fault presumably arises from dynamically

loaded Fortran code, but I'm damned if I can see where the error lies.

In an effort to diagnose the problem I created a "non-package" version 
of the code.  That is, I copied all the *.R files and *.f file into a

new directory.  In that directory I created a *.so file using
R CMD SHLIB.

In the R code I removed all the "PACKAGE=" lines from the calls to
.Fortran() and put in appropriate dyn.load() calls.

I then started R in this new "clean" directory and sourced all of the
*.R files.

I then issued the command that produces the seg fault when run under the 
aegis of the package.  The command ran without a murmur of complaint.

WTF?

Can anyone suggest a reason why a seg fault might arise when the code is 
run in the context of a package, but not when it is run in "standalone 
mode"?


I have checked and rechecked my init.c file --- which is the only thing 
that I can think of that might create a difference --- and cannot find 
any discrepancy between the declarations in the init.c file and the 
Fortran code.


The package is a bit complicated, so giving more detail would be 
cumbersome.  Also I have no idea what aspects of detail would be 
relevant.  If anyone would like more info, feel free to ask.


I would really appreciate it if someone could give me some suggestions
before I go *completely* mad!

cheers,

Rolf Turner



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generate N random numbers with a given probability and condition

2018-07-12 Thread Göran Broström




On 2018-07-05 00:21, Nelly Reduan wrote:

Dear all,

I would like to generate N random numbers with a given probability and 
condition but I'm not sure how to do this.
For example, I have N = 20 and the vector from which to choose is seq(0, 10, 
1). I have tested:

x <- sample(seq(0, 10, 1), 20, replace=TRUE, prob=rep(0.28, times=length(seq(0, 10, 1))))

But I don't know how to put the condition sum(x) <= max(seq(0, 10, 1)).
Many thanks for your time
Nell


Maybe the paper "Acceptance–Rejection Sampling from the Conditional 
Distribution of Independent Discrete Random Variables, given their Sum", 
Statistics 34, pages 247-257, by Leif Nilsson and myself (2000) is relevant?
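The brute-force version of that idea is plain rejection (a sketch, under
my reading that the target is 20 iid draws from 0:10 conditioned on their
sum being at most 10):

repeat {
    x <- sample(0:10, 20, replace = TRUE)
    if (sum(x) <= 10) break
}

Note the unconditional expected sum is 100, so the acceptance probability
here is astronomically small; that is exactly the situation where a
smarter conditional sampler is needed.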


Göran




[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Generate random Bernoulli draws

2018-07-06 Thread Göran Broström




On 2018-07-06 19:18, Berry, Charles wrote:

A liitle math goes along way. See below.


On Jul 5, 2018, at 10:35 PM, Marino David
 wrote:

Dear Bert,

I know it is a simple question. But for me, at current, I fail to
implement it. So, I ask for help here.

It is not homework.

Best,

David

2018-07-06 13:32 GMT+08:00 Bert Gunter :


Is this homework?

(There is an informal no-homework policy on this list).

Cheers, Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming
along and sticking things into it." -- Opus (aka Berkeley
Breathed in his "Bloom County" comic strip )

On Thu, Jul 5, 2018 at 10:23 PM, Marino David
 wrote:


Dear All,

I would like to generate N random Bernoulli draws given a
probability function F(x)=1-exp(-2.5*x) in which x follows
uniform distribution, say x~U(0,2).


If each Bernoulli draw is based on its own draw of x, then

rbinom( N, 1, 0.8013476 )

is what you want.


Maybe it is what he wants, but note that F(x) as defined is a random 
variable, not a (cumulative) probability function.


G,



It is left as an exercise for the reader to verify that the constant
0.8013476 is correct up to approximation error, and to prove that
such a Bernoulli mixture is also Bernoulli. Perhaps,

?integrate

will help.

But if the x's are shared you need to use runif, expm1, and
(possibly) rep to produce a vector to be used in place of  the prob
argument.
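A numerical check of the constant, plus a sketch of the shared-x case
(N assumed given):

# P(Y = 1) = E[F(X)] with X ~ U(0, 2):
integrate(function(x) (1 - exp(-2.5 * x)) / 2, 0, 2)$value   # 0.8013476
1 + expm1(-5) / 5                                            # closed form, same value
# one x shared by all N draws:
p <- -expm1(-2.5 * runif(1, 0, 2))
draws <- rbinom(N, 1, p)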

HTH,

Chuck




Can some one leave me some code lines for implementing this?

Thanks in advance.

David

[[alternative HTML version deleted]]







[[alternative HTML version deleted]]






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question with integrate function

2018-02-06 Thread Göran Broström

Hi Hanna,

your function is essentially zero outside a short interval around 9. And 
the help page states: "If the function is approximately constant (in 
particular, zero) over nearly all its range it is possible that the 
result and error estimate may be seriously wrong."


You could try to integrate over a finite interval, say (7, 12).
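i.e., something like this (a sketch; the cutoffs 7 and 12 are ad hoc,
chosen so that all of the integrand's mass around 9 lies inside the
interval):

g <- function(x) pnorm(x - 8.8, mean = 0.4, sd = 0.3) * dnorm(x, mean = 9, sd = 0.15)
h2 <- function(c) integrate(g, lower = max(c, 7), upper = 12)$value
# for c < 7 clamping the lower limit changes (almost) nothing, since g is
# essentially zero there, and the spike near 9 is no longer missed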

Göran Broström

On 2018-02-06 19:40, li li wrote:

Sorry. I meant in the previous email that the function h() is a monotone
decreasing function. Thanks very much.

2018-02-06 13:32 GMT-05:00 li li <hannah@gmail.com>:


Hi all,
   The function h below is a function of c and it should be a monotone
increasing function since the integrand is nonnegative and integral is
taken from c to infinity. However, as we can see from the plot, it is not
shown to be monotone. Something wrong with the usage of integrate function?
Thanks so much for your help.
 Hanna



h <- function(c){
    g <- function(x){pnorm(x-8.8, mean=0.4, sd=0.3,
                           lower.tail=TRUE)*dnorm(x, mean=9, sd=0.15)}
    integrate(g, lower=c, upper=Inf)$value}

xx <- seq(-20,20,by=0.001)
y <- xx
for (i in 1:length(xx)){y[i] <- h(xx[i])}
plot(xx, y)




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] R 3.4.3 is released

2017-12-02 Thread Göran Broström

Removing libopenblas-dev and libopenblas-base solved my problem.

Göran

On 2017-12-01 18:51, Göran Broström wrote:

Thanks: I installed from source and got an error when loading a package:


 > library(eha)
Loading required package: survival
Error: package or namespace load failed for ‘eha’ in dyn.load(file, 
DLLpath = DLLpath, ...):
  unable to load shared object 
'/usr/local/lib/R/site-library/eha/libs/eha.so':

   /usr/lib/x86_64-linux-gnu/libblas.so.3: undefined symbol: sgemv_thread_n


Never seen this one before...

I'm on Ubuntu artful, and upgraded with 'apt'. Then


goran@M6800:~/src/R-3.4.3$ /usr/bin/R
/usr/lib/R/bin/exec/R: symbol lookup error: 
/usr/lib/x86_64-linux-gnu/libblas.so.3: undefined symbol: sgemv_thread_n



What can I do?

Göran Broström

On 2017-11-30 10:56, Peter Dalgaard wrote:
The build system rolled up R-3.4.3.tar.gz (codename "Kite-Eating 
Tree") this morning.


The list below details the changes in this release.

You can get the source code from

http://cran.r-project.org/src/base/R-3/R-3.4.3.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.


For the R Core Team,

Peter Dalgaard



These are the checksums (md5 and SHA-256) for the freshly created 
files, in case you wish

to check that they are uncorrupted:

MD5 (AUTHORS) = f12a9c3881197b20b08dd3d1f9d005e6
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 32a94aba902b293cf8b8dbbf4113f2ab
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = cd01b91d7a7acd614ad72bd571bf26b3
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 71562183d75dd2080d86c42bbf733bb7
MD5 (R-latest.tar.gz) = bc55db54f992fda9049201ca62d2a584
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = f60d286bb7294cef00cb0eed4052a66f
MD5 (VERSION-INFO.dcf) = 2f9cc25704594a615a8c8248d0dcd804
MD5 (R-3/R-3.4.3.tar.gz) = bc55db54f992fda9049201ca62d2a584

6474d9791fff6a74936296bde3fcb569477f5958e4326189bd6e5ab878e0cd4f  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  
COPYING.LIB

7936facb07e752869342808b9c8879d0e270b1a9ec92b67ef4dd87496abfef0a  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
97a082deb0a67a341f2c88a881c56a409a81c1af697732c3b15d8226d3970231  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
a10f84be31f897456a31d31690df2fdc3f21a197f28b4d04332cc85005dcd0d2  NEWS.2
7a3cb831de5b4151e1f890113ed207527b7d4b16df9ec6b35e0964170007f426  
R-latest.tar.gz

2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  
RESOURCES

52f934a4e8581945cbc1ba234932749066b5744cbd3b1cb467ba6ef164975163  THANKS
598f9c6b562c7106f741b9cdc2cc7d0bae364645f103e6ecb49e57625e28308b  
VERSION-INFO.dcf
7a3cb831de5b4151e1f890113ed207527b7d4b16df9ec6b35e0964170007f426  
R-3/R-3.4.3.tar.gz


This is the relevant part of the NEWS file

CHANGES IN R 3.4.3:

   INSTALLATION on a UNIX-ALIKE:

 * A workaround has been added for the changes in location of
   time-zone files in macOS 10.13 'High Sierra' and again in
   10.13.1, so the default time zone is deduced correctly from the
   system setting when R is configured with --with-internal-tzcode
   (the default on macOS).

 * R CMD javareconf has been updated to recognize the use of a Java
   9 SDK on macOS.

   BUG FIXES:

 * raw(0) & raw(0) and raw(0) | raw(0) again return raw(0) (rather
   than logical(0)).

 * intToUtf8() converts integers corresponding to surrogate code
   points to NA rather than invalid UTF-8, as well as values larger
   than the current Unicode maximum of 0x10FFFF.  (This aligns with
   the current RFC3629.)

 * Fix calling of methods on S4 generics that dispatch on ... when
   the call contains ...

 * Following Unicode 'Corrigendum 9', the UTF-8 representations of
   U+FFFE and U+FFFF are now regarded as valid by utf8ToInt().

 * range(c(TRUE, NA), finite = TRUE) and similar no longer return
   NA. (Reported by Lukas Stadler.)

 * The self starting function attr(SSlogis, "initial") now also
   works when the y values have exact minimum zero and is slightly
   changed in general, behaving symmetrically in the y range.

 * The printing of named raw vectors is now formatted nicely as for
   other such atomic vectors, thanks to Lukas Stadler.

Re: [R] R 3.4.3 is released

2017-12-01 Thread Göran Broström

Thanks: I installed from source and got an error when loading a package:


> library(eha)
Loading required package: survival
Error: package or namespace load failed for ‘eha’ in dyn.load(file, 
DLLpath = DLLpath, ...):
 unable to load shared object 
'/usr/local/lib/R/site-library/eha/libs/eha.so':

  /usr/lib/x86_64-linux-gnu/libblas.so.3: undefined symbol: sgemv_thread_n


Never seen this one before...

I'm on Ubuntu artful, and upgraded with 'apt'. Then


goran@M6800:~/src/R-3.4.3$ /usr/bin/R
/usr/lib/R/bin/exec/R: symbol lookup error: 
/usr/lib/x86_64-linux-gnu/libblas.so.3: undefined symbol: sgemv_thread_n



What can I do?

Göran Broström

On 2017-11-30 10:56, Peter Dalgaard wrote:

The build system rolled up R-3.4.3.tar.gz (codename "Kite-Eating Tree") this 
morning.

The list below details the changes in this release.

You can get the source code from

http://cran.r-project.org/src/base/R-3/R-3.4.3.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.


For the R Core Team,

Peter Dalgaard



These are the checksums (md5 and SHA-256) for the freshly created files, in 
case you wish
to check that they are uncorrupted:

MD5 (AUTHORS) = f12a9c3881197b20b08dd3d1f9d005e6
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 32a94aba902b293cf8b8dbbf4113f2ab
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = cd01b91d7a7acd614ad72bd571bf26b3
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 71562183d75dd2080d86c42bbf733bb7
MD5 (R-latest.tar.gz) = bc55db54f992fda9049201ca62d2a584
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = f60d286bb7294cef00cb0eed4052a66f
MD5 (VERSION-INFO.dcf) = 2f9cc25704594a615a8c8248d0dcd804
MD5 (R-3/R-3.4.3.tar.gz) = bc55db54f992fda9049201ca62d2a584

6474d9791fff6a74936296bde3fcb569477f5958e4326189bd6e5ab878e0cd4f  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
7936facb07e752869342808b9c8879d0e270b1a9ec92b67ef4dd87496abfef0a  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
97a082deb0a67a341f2c88a881c56a409a81c1af697732c3b15d8226d3970231  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
a10f84be31f897456a31d31690df2fdc3f21a197f28b4d04332cc85005dcd0d2  NEWS.2
7a3cb831de5b4151e1f890113ed207527b7d4b16df9ec6b35e0964170007f426  
R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  RESOURCES
52f934a4e8581945cbc1ba234932749066b5744cbd3b1cb467ba6ef164975163  THANKS
598f9c6b562c7106f741b9cdc2cc7d0bae364645f103e6ecb49e57625e28308b  
VERSION-INFO.dcf
7a3cb831de5b4151e1f890113ed207527b7d4b16df9ec6b35e0964170007f426  
R-3/R-3.4.3.tar.gz

This is the relevant part of the NEWS file

CHANGES IN R 3.4.3:

   INSTALLATION on a UNIX-ALIKE:

 * A workaround has been added for the changes in location of
   time-zone files in macOS 10.13 'High Sierra' and again in
   10.13.1, so the default time zone is deduced correctly from the
   system setting when R is configured with --with-internal-tzcode
   (the default on macOS).

 * R CMD javareconf has been updated to recognize the use of a Java
   9 SDK on macOS.

   BUG FIXES:

 * raw(0) & raw(0) and raw(0) | raw(0) again return raw(0) (rather
   than logical(0)).

 * intToUtf8() converts integers corresponding to surrogate code
   points to NA rather than invalid UTF-8, as well as values larger
   than the current Unicode maximum of 0x10FFFF.  (This aligns with
   the current RFC3629.)

 * Fix calling of methods on S4 generics that dispatch on ... when
   the call contains ...

 * Following Unicode 'Corrigendum 9', the UTF-8 representations of
   U+FFFE and U+FFFF are now regarded as valid by utf8ToInt().

 * range(c(TRUE, NA), finite = TRUE) and similar no longer return
   NA. (Reported by Lukas Stadler.)

 * The self starting function attr(SSlogis, "initial") now also
   works when the y values have exact minimum zero and is slightly
   changed in general, behaving symmetrically in the y range.

 * The printing of named raw vectors is now formatted nicely as for
   other such atomic vectors, thanks to Lukas Stadler.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question about correlation

2017-07-06 Thread Göran Broström
Please keep the conversation on the list: Others may be able to help you 
better than I can.


On 2017-07-06 10:38, SEB140004 Student wrote:

Ya. I had successfully got the result. Thank you very much. :)

Isn't it possible for me to obtain the network from the correlation matrix?


I know nothing about correlation networks, but by googling I found two R 
packages, ggraph and corrr, that may be of interest to you.


Göran

On Thu, Jul 6, 2017 at 1:38 AM, Göran Broström <goran.brost...@umu.se 
<mailto:goran.brost...@umu.se>> wrote:


On 2017-07-05 11:56, Jim Lemon wrote:

Hi Chin Yi,
If you are trying to correlate "Health" with "Disease", i.e.

cydf<-read.table(text="OTU ID Health Disease
Bacterial 1 0.29 0.34
Bacterial 2 0.25 0.07
Bacterial 3 0.06 0.06
Bacterial 4 0.07 0.09
Bacterial 5 0.02 0.05",
header=TRUE)
print(cor(cydf$Health,cydf$Disease))
[1] 0.7103517

If you are getting that error, it probably means that either
"Health"
or "Disease" or perhaps both have been read in as a factor. To test
this:

is.factor(cydf$Health)
[1] FALSE

is.factor(cydf$Disease)

[1] FALSE

If either of these returns TRUE, that is almost certainly the
problem.


Or maybe Chin Yi tried (as it seems)

 > cor(cydf)
Error in cor(cydf) : 'x' must be numeric

(with cydf == data): 'OTU' is not numeric.
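
A minimal fix sketch, restricting cor() to the numeric columns of the
data read in above:

num <- sapply(cydf, is.numeric)   # drop the non-numeric OTU column
cor(cydf[, num])                  # correlation matrix of numeric columns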

Follow Jim's advice.

Göran



Jim


On Wed, Jul 5, 2017 at 11:27 AM, SEB140004 Student
<chi...@siswa.um.edu.my <mailto:chi...@siswa.um.edu.my>> wrote:

Greeting.

Dear Mr/Mrs/Miss,

OTU ID Health Disease
Bacterial 1 0.29 0.34
Bacterial 2 0.25 0.07
Bacterial 3 0.06 0.06
Bacterial 4 0.07 0.09
Bacterial 5 0.02 0.05
The above shows the first few rows of the data. May I ask why R shows
the error "Error in cor(data) : 'x' must be numeric"? And how do I
solve it? Besides, can a correlation matrix be computed from this data?

Moreover, can these data be plotted as a network? If so, which
package should I use?

Thank you.

Best regards,
Kang Chin Yi

  [[alternative HTML version deleted]]

__
R-help@r-project.org <mailto:R-help@r-project.org> mailing
list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible
code.


__
R-help@r-project.org <mailto:R-help@r-project.org> mailing list
-- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
<https://stat.ethz.ch/mailman/listinfo/r-help>
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
<http://www.R-project.org/posting-guide.html>
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Question about correlation

2017-07-05 Thread Göran Broström

On 2017-07-05 11:56, Jim Lemon wrote:

Hi Chin Yi,
If you are trying to correlate "Health" with "Disease", i.e.

cydf<-read.table(text="OTU ID Health Disease
   Bacterial 1 0.29 0.34
   Bacterial 2 0.25 0.07
   Bacterial 3 0.06 0.06
   Bacterial 4 0.07 0.09
   Bacterial 5 0.02 0.05",
   header=TRUE)
print(cor(cydf$Health,cydf$Disease))
[1] 0.7103517

If you are getting that error, it probably means that either "Health"
or "Disease" or perhaps both have been read in as a factor. To test
this:

is.factor(cydf$Health)
[1] FALSE

is.factor(cydf$Disease)

[1] FALSE

If either of these returns TRUE, that is almost certainly the problem.


Or maybe Chin Yi tried (as it seems)

> cor(cydf)
Error in cor(cydf) : 'x' must be numeric

(with cydf == data): 'OTU' is not numeric.

Follow Jim's advice.

Göran



Jim


On Wed, Jul 5, 2017 at 11:27 AM, SEB140004 Student
 wrote:

Greeting.

Dear Mr/Mrs/Miss,

OTU ID Health Disease
Bacterial 1 0.29 0.34
Bacterial 2 0.25 0.07
Bacterial 3 0.06 0.06
Bacterial 4 0.07 0.09
Bacterial 5 0.02 0.05
The above shows the first few rows of the data. May I ask why R shows the
error "Error in cor(data) : 'x' must be numeric"? And how do I solve
it? Besides, can a correlation matrix be computed from this data?

Moreover, can these data be plotted as a network? If so, which
package should I use?

Thank you.

Best regards,
Kang Chin Yi

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fill in empty cell in data.frame from previous value

2017-06-25 Thread Göran Broström

> library(tidyr)
> ?fill

Göran
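
For the record, a minimal sketch of what ?fill describes (assuming the
empty cells were read in as NA):

library(tidyr)
frame <- data.frame(names = c("Mandy", NA, "John", NA), val = c(1, 2, 2, 2))
frame <- fill(frame, names)   # carries "Mandy" and "John" downwards
frame[2, ]                    # now shows Mandy, 2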

On 2017-06-24 19:49, Christophe Elek wrote:

Hello. Total newbie here... I hope I read the guide properly.

I have the following data.frame (I read it from a CSV file I cannot change)

  names val
1 Mandy   1
2         2
3  John   2
4         2

I want to read the row number 2, but I want the first column to be “Mandy” and 
not null

print (frame[2,])
2 Mandy   2

I can manipulate the data.frame once loaded
How can I fill all cells in column “names” with the previous value?
Or is there a function that will get me the row and fill the “names” column ?

NOTA BENE: I do not want the answer, I want to find it myself but I need 
guidance
If there is a function, tell me the library and I will search
If this is an algorithm, tell me generally how you would do and let me scratch 
my head first 

Thanks Chris


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [FORGED] Re: How create columns for squared values from previous columns?

2017-04-29 Thread Göran Broström



On 2017-04-29 06:45, Mike C wrote:

Thanks Rolf. I was just a bit frustrated that R wouldn't generate
dummy variable names on the fly.

Also, another question, if I want to put column 5 at column 3,

dat[, 3:5] <- dat[, c(5,3,4)]

It does not work, why?


It "works", but you need to shuffle the names in the same way:

names(dat)[3:5] <- names(dat)[c(5,3,4)]

Better(?):

perm <- c(1,2,5,3,4)
dat <- dat[perm]

dat is a list.
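
A small demo of the permutation approach (a sketch; list-style indexing
reorders the columns and their names together, so nothing gets out of
sync):

dat <- data.frame(matrix(rnorm(30), ncol = 5))   # columns X1 ... X5
perm <- c(1, 2, 5, 3, 4)
dat <- dat[perm]
names(dat)   # "X1" "X2" "X5" "X3" "X4"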

Göran



From: Rolf Turner
Sent: Friday, April 28, 2017 10:48:42 PM
To: C W
Cc: r-help
Subject: Re: [FORGED] Re: [R] How create columns for squared values from previous columns?

On 29/04/17 13:21, C W wrote:

I came up with this solution,


cbind(dat, dat[, 1:3]^2)

           X1         X2         X3         X4          X5          X1         X2        X3
1  0.72776481 -1.1332612 -1.9857503 0.46189400 -0.09016379 0.529641625 1.28428102 3.9432044
2  0.05126592  0.2858707  0.9075806 1.27582713 -0.49438507 0.002628194 0.08172203 0.8237026
3 -0.40430146  0.5457195 -1.1924042 0.15025594  1.99710475 0.163459669 0.29780978 1.4218277
4  1.40746971 -1.2279416  0.3296075 0.84411774 -0.52371619 1.980970990 1.50784058 0.1086411
5 -0.53841150  0.4750082 -0.4705148 0.05591914 -0.31503500 0.289886944 0.22563275 0.2213842
6  0.90691210  0.7247171  0.8244184 0.73328097 -1.05284737 0.822489552 0.52521494 0.6796657

But, you would NOT ONLY get undesired variable names, BUT ALSO
duplicated names. I suppose I can use paste() to solve that?

Any better ideas?


Well, if the names bizzo is your only worry, you could hit the
result with data.frame() *after* cbinding on the squared terms:

dat <- matrix(rnorm(30), ncol = 5)
dat <- cbind(dat, dat[, 1:3]^2)
dat <- data.frame(dat)
names(dat)

And as you indicate, the names of a data frame are easily adjusted.

I wouldn't lose sleep over it.

cheers,

Rolf Turner

P.S. You could also do

names(dat) <- make.unique(names(dat))

to your original idea, to get rid of the lack of uniqueness.  The
result is probably "undesirable" but.

R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-3.4.0 and survival_2.41-3 ..

2017-04-25 Thread Göran Broström



On 2017-04-25 10:34, Martin Maechler wrote:

Göran Broström <goran.brost...@umu.se>
on Tue, 25 Apr 2017 10:22:48 +0200 writes:


> I installed R-3.4.0 and got problems with the survival package, for 
instance
> 
>> library(survival)
>> mort <- data.frame(exit = 1:4, event = rep(1, 4), x = c(0, 1, 0, 1))
>> fit <- coxph(Surv(exit, event) ~ x, data = mort)
> Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
> object 'Ccoxmart' not found
> -

> No problems with R-3.3.3 and the same (latest) survival version, which
> makes me think that something is going on in my R installation rather
> than in the survival package.

> Thanks for any hint,

> Göran

>> sessionInfo()
> R version 3.4.0 (2017-04-21)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS

> Matrix products: default
> BLAS: /usr/lib/openblas-base/libblas.so.3
> LAPACK: /usr/lib/libopenblasp-r0.2.18.so

> locale:
> [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> [3] LC_TIME=sv_SE.UTF-8LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=sv_SE.UTF-8LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=sv_SE.UTF-8   LC_NAME=C
> [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> other attached packages:
> [1] survival_2.41-3

> loaded via a namespace (and not attached):
> [1] compiler_3.4.0  Matrix_1.2-8splines_3.4.0   grid_3.4.0
> [5] lattice_0.20-35

I'm 99.5% sure that you are using more than just the default
library of packages (a very good thing - we have been doing the
same for years).

We have in NEWS for R 3.4.0

  > PACKAGE INSTALLATION:

  >   [...]

  >   [...]

  >   • Packages which register native routines for .C or .Fortran need
  > to be re-installed for this version (unless installed with
  > R-devel SVN revision r72375 or later).

and Prof Brian Ripley did announce that nicely and early on R-devel.
==> https://hypatia.math.ethz.ch/pipermail/r-devel/2017-March/073940.html

==> You have to re-install quite a few packages for R 3.4.0,
 __if__ they use .C() or .Fortran()

When we've e-talked about the issue within R-core,
Uwe Ligges noted we should really ask everyone to run

  update.packages(checkBuilt=TRUE)


A small nuisance with this is that I end up with two versions of the 
survival package (and other recommended packages), since I cannot write 
to /usr/lib/R/library. However, this is obviously a problem suitable to 
present on R-SIG-Debian. See you there.


Göran
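
A sketch of the recommended procedure; old.packages() with the same flag
lists what would be rebuilt, without touching anything yet:

old.packages(checkBuilt = TRUE)                  # what needs rebuilding?
update.packages(checkBuilt = TRUE, ask = FALSE)  # rebuild it all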



after an update to a new major (meaning "R-x.y.0") release of R
and **not** re-use packages {inside R-x.y.z}
that were installed with R-x.(y-1).z'  ..

and of course Uwe is right:
We should ask others to do it _and_ do it ourselves.

Anyway it _is_ considerably more important for the 3.4.0
release.

Martin Maechler
ETH Zurich (and R Core team)




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-3.4.0 and survival_2.41-3 ..

2017-04-25 Thread Göran Broström
Right, normally this is how it works for me when I install R from 
source. In this case, on this computer, I use the debian/ubuntu 
packaging, and then it is necessary to 'rebuild' packages, obviously.


Thanks, Göran

On 2017-04-25 12:20, Viechtbauer Wolfgang (SP) wrote:

Sort of an obvious approach, but after every upgrade (regardless if
it is major/minor), I just delete my entire personal library and
reinstall everything from scratch. For this, I have a script that
includes just a bunch of install.packages() calls. Like:

install.packages(c("lme4", "glmmML", "MCMCglmm"))
install.packages(c("psych", "GPArotation", "sem", "lavaan")) [...]

I split things up a bit, based on the purpose/topic (along the lines
of http://www.wvbauer.com/doku.php/rpackages) to keep things
organized.

This may not be the most efficient method if you use hundreds of
packages, but works for me.

Best, Wolfgang
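
A variant of the same idea that rebuilds whatever is currently in the
personal library, rather than a fixed list (a sketch, assuming the
default personal library location):

lib <- Sys.getenv("R_LIBS_USER")
pkgs <- rownames(installed.packages(lib.loc = lib))
install.packages(pkgs, lib = lib)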



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-3.4.0 and survival_2.41-3 ..

2017-04-25 Thread Göran Broström

Thanks Martin,

that helped!

Göran

On 2017-04-25 10:34, Martin Maechler wrote:

Göran Broström <goran.brost...@umu.se>
on Tue, 25 Apr 2017 10:22:48 +0200 writes:


> I installed R-3.4.0 and got problems with the survival package, for 
instance
> 
>> library(survival)
>> mort <- data.frame(exit = 1:4, event = rep(1, 4), x = c(0, 1, 0, 1))
>> fit <- coxph(Surv(exit, event) ~ x, data = mort)
> Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
> object 'Ccoxmart' not found
> -

> No problems with R-3.3.3 and the same (latest) survival version, which
> makes me think that something is going on in my R installation rather
> than in the survival package.

> Thanks for any hint,

> Göran

>> sessionInfo()
> R version 3.4.0 (2017-04-21)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.2 LTS

> Matrix products: default
> BLAS: /usr/lib/openblas-base/libblas.so.3
> LAPACK: /usr/lib/libopenblasp-r0.2.18.so

> locale:
> [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
> [3] LC_TIME=sv_SE.UTF-8LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=sv_SE.UTF-8LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=sv_SE.UTF-8   LC_NAME=C
> [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> other attached packages:
> [1] survival_2.41-3

> loaded via a namespace (and not attached):
> [1] compiler_3.4.0  Matrix_1.2-8splines_3.4.0   grid_3.4.0
> [5] lattice_0.20-35

I'm 99.5% sure that you are using more than just the default
library of packages (a very good thing - we have been doing the
same for years).

We have in NEWS for R 3.4.0

  > PACKAGE INSTALLATION:

  >   [...]

  >   [...]

  >   • Packages which register native routines for .C or .Fortran need
  > to be re-installed for this version (unless installed with
  > R-devel SVN revision r72375 or later).

and Prof Brian Ripley did announce that nicely and early on R-devel.
==> https://hypatia.math.ethz.ch/pipermail/r-devel/2017-March/073940.html

==> You have to re-install quite a few packages for R 3.4.0,
 __if__ they use .C() or .Fortran()

When we've e-talked about the issue within R-core,
Uwe Ligges noted we should really ask everyone to run

  update.packages(checkBuilt=TRUE)

after an update to a new major (meaning "R-x.y.0") release of R
and **not** re-use packages {inside R-x.y.z}
that were installed with R-x.(y-1).z'  ..

and of course Uwe is right:
We should ask others to do it _and_ do it ourselves.

Anyway it _is_ considerably more important for the 3.4.0
release.

Martin Maechler
ETH Zurich (and R Core team)




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] R-3.4.0 and survival_2.41-3

2017-04-25 Thread Göran Broström

I installed R-3.4.0 and got problems with the survival package, for instance


> library(survival)
> mort <- data.frame(exit = 1:4, event = rep(1, 4), x = c(0, 1, 0, 1))
> fit <- coxph(Surv(exit, event) ~ x, data = mort)
Error in fitter(X, Y, strats, offset, init, control, weights = weights,  :
  object 'Ccoxmart' not found
-

No problems with R-3.3.3 and the same (latest) survival version, which 
makes me think that something is going on in my R installation rather 
than in the survival package.


Thanks for any hint,

Göran

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=sv_SE.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=sv_SE.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=sv_SE.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] survival_2.41-3

loaded via a namespace (and not attached):
[1] compiler_3.4.0  Matrix_1.2-8splines_3.4.0   grid_3.4.0
[5] lattice_0.20-35

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem installing R 2.5

2016-11-07 Thread Göran Broström

libX11-dev?

Göran

On 2016-11-07 19:27, Lentes, Bernd via R-help wrote:

Hi,

i'd like to install R 2.5 on an Ubuntu 14.04. I have a special
software requiring this old version.

While ./configure, i get the following error:

checking for mbstate_t... yes checking for X... no configure: error:
--with-x=yes (default) and X11 headers/libs are not available

The problem is that i don't know which headers or libraries are
missing, and when i search with aptitude in the Ubuntu repositories i
find a ton of them and don't know which one to install.

Can anyone help me ?

Thanks.


Bernd



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] conditional gap time frailty cox model for recurrent events

2016-09-05 Thread Göran Broström

Dear Elisabetta,

I have no direct answer to your question, but a suggestion: Use the 
'coxme' function (in the package with the same name). In the help page 
for 'frailty' (survival) you will find: "The coxme package has 
superseded this method. It is faster, more stable, and more flexible."


Hth, Göran
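
A sketch of the corresponding coxme call, with the gamma frailty replaced
by a (log-normal) random intercept per id; variable names are taken from
the post, and the code is not run:

library(coxme)
fit <- coxme(Surv(gaptstart, gaptstop, status) ~ treat +
             strata(nrecord01) + (1 | id),
             data = dataNOTDrr, weights = weight)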

On 2016-09-05 11:42, Elisabetta Petracci wrote:

Dear users,

I am fitting a conditional gap time frailty cox model weighting
observations by means of inverse probability time dependent weigths.
Attached find the self-explaining dataset.

I have used the following sintax:

coxph(Surv(gaptstart,gaptstop,status)~treat+strata(nrecord01)+frailty(id,distribution="gamma",method="em"),
data=dataNOTDrr,weights=dataNOTDrr$weight)


And I get the following warning:

Warning message:
In coxpenal.fit(X, Y, strats, offset, init = init, control, weights =
weights,  :
  Inner loop failed to coverge for iterations 3 4


I have tried to:
- leave out the weights but I get the error anyway
- to randomly select a subset of patients and I don't get the error. This
seems to suggest that the problem is with some observations.

Any suggestion?

Many thanks,

Elisabetta
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can I increase the size of an asterisk in plotmath()?

2016-06-30 Thread Göran Broström

Rolf,

text(7.5,2.5,expression(paste(bolditalic(p),"*", sep = "")))

looks quite nice. On my Mac at least.

Göran
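
A full reproducible snippet; cex scales the whole expression, in case a
larger asterisk alone is not achievable (a sketch):

plot(1:10)
text(7.5, 2.5, expression(paste(bolditalic(p), "*")), cex = 1.2)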

On 30/06/16 02:38, Rolf Turner wrote:


I am trying to plot an expression of the form "p^*" --- a bold letter p
with the asterisk as a superscript.

I can get *something* with code of the form

plot(1:10)
text(7.5,2.5,expression(paste(bolditalic(p)^"*")))

but the asterisk that appears is *tiny*.

Is there any way to increase its size?  (I expect not, but I just
thought I'd ask!)

cheers,

Rolf Turner



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Improper configuration

2016-04-29 Thread Göran Broström



On 2016-04-29 23:59, Bert Gunter wrote:

Something on your end. I clicked on your link and it took me to CRAN
with no problems.


Right, the 'problem' is gone now. I am on a fresh install of ubuntu 
16.04LTS with Firefox. Maybe some core-team-guy just fixed the 
certificate?;) Or the new install needs some warming-up.


Thanks Bert,

Göran



Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Apr 29, 2016 at 2:26 PM, Göran Broström <goran.brost...@umu.se> wrote:

Trying https://cran.r-project.org got me

"The owner of cran.r-project.org has configured their website improperly"

If I try http (instead of https) I'm redirected to

www.project.org ("Project America")

Any ideas?

Göran Broström

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Improper configuration

2016-04-29 Thread Göran Broström

Trying https://cran.r-project.org got me

"The owner of cran.r-project.org has configured their website improperly"

If I try http (instead of https) I'm redirected to

www.project.org ("Project America")

Any ideas?

Göran Broström

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convergence issues when using ns splines (pkg: spline) in Cox model (coxph) even when changing coxph.control

2016-03-30 Thread Göran Broström



On 2016-03-30 23:06, David Winsemius wrote:



On Mar 29, 2016, at 1:47 PM, Jennifer Wu, Miss
 wrote:

Hi,

I am currently using R v3.2.3 and on Windows 10 OS 64Bit.

I am having convergence issues when I use coxph with a interaction
term (glarg*bca_py) and interaction term with the restricted cubic
spline (glarg*bca_time_ns). I use survival and spline package to
create the Cox model and cubic splines respectively. Without the
interaction term and/or spline, I have no convergence problem. I
read some forums about changing the iterations and I have but it
did not work. I was just wondering if I am using the inter.max and
outer.max appropriately. I read the survival manual, other R-help
and stackoverflow pages and it suggested changing the iterations
but it doesn't specify what is the max I can go. I ran something
similar in SAS and did not run into a convergence problem.

This is my code:

bca_time_ns <- ns(ins_ca$bca_py, knots=3, Boundary.knots=range(2,5,10))
test <- ins_ca$glarg * ins_ca$bca_py
test1 <- ins_ca$glarg * bca_time_ns


In your `coxph` call the variable 'bca_py' is the survival time and


Right David: I didn't notice that the 'missing main effect' in fact was 
part of the survival object! And as you say: Time to rethink the whole 
model.


Göran


yet here you are constructing not just one but two interactions (one
of which is a vector but the other one a matrix) between 'glarg' and
your survival times. Is this some sort of effort to identify a
violation of proportionality over the course of a study?

Broström sagely points out that these interactions are not in the
data-object and subsequent efforts to refer to them may be confounded
by the multiple environments from which data would be coming into the
model. Better to have everything come in from the data-object.

The fact that SAS did not have a problem with this rather
self-referential or circular model may be a poor reflection on SAS
rather than on the survival package. Unlike Therneau or Broström who
asked for data, I suggest the problem lies with the model
construction and you should be reading what Therneau has written
about identification of non-proportionality and identification of
time dependence of effects. See Chapter 6 of his "Modeling Survival
Data".



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Convergence issues when using ns splines (pkg: spline) in Cox model (coxph) even when changing coxph.control

2016-03-30 Thread Göran Broström

Hi Jennifer,

see below.

On 2016-03-29 22:47, Jennifer Wu, Miss wrote:

Hi,

I am currently using R v3.2.3 and on Windows 10 OS 64Bit.

I am having convergence issues when I use coxph with a interaction
term (glarg*bca_py) and interaction term with the restricted cubic
spline (glarg*bca_time_ns).


Comment on interactions: (i) You should not create interaction terms 
'manually' but rather in the formula in the call to coxph ('*' means 
different things in a formula and on the command line). That assures 
that you get the main effects included. In your formula, you are missing 
the main effects 'bca_py' and 'bca_time_ns'. Is that intentional? In a 
few exceptional cases it may make sense to drop a main effect, but 
generally not. (ii) Convergence problems may also occur with 
interactions of variables that are not centered, so try to center 
involved covariates.
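
A sketch of both points, with placeholder outcome names rather than the
poster's full model: '*' inside a formula expands to main effects plus
the interaction, and centering is a one-liner:

ins_ca$age_c <- ins_ca$age - mean(ins_ca$age, na.rm = TRUE)
fit <- coxph(Surv(time, status) ~ glarg * age_c, data = ins_ca)
## the formula expands to: glarg + age_c + glarg:age_c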


Your 'coxph.control' values have nothing to do with your convergence 
problem.


And try to provide reproducible code, in your case, the data set. If it 
is not possible, maybe you can scale it down to include only the 
problematic variables (with some fake values, if necessary) and just a 
few cases, but enough to show your problem.


Göran Broström

> I use survival and spline package to

create the Cox model and cubic splines respectively. Without the
interaction term and/or spline, I have no convergence problem. I read
some forums about changing the iterations and I have but it did not
work. I was just wondering if I am using the inter.max and outer.max
appropriately. I read the survival manual, other R-help and
stackoverflow pages and it suggested changing the iterations but it
doesn't specify what is the max I can go. I ran something similar in
SAS and did not run into a convergence problem.

This is my code:

bca_time_ns <- ns(ins_ca$bca_py, knots=3, Boundary.knots=range(2,5,10))
test <- ins_ca$glarg * ins_ca$bca_py
test1 <- ins_ca$glarg * bca_time_ns

coxit <- coxph.control(iter.max=1, outer.max=1)

bca<-coxph(Surv(bca_py,bca) ~ glarg + test + test1 + age + calyr +
diab_dur + hba1c + adm_met + adm_sulfo + adm_tzd + adm_oth +
med_statin + med_aspirin + med_nsaids + bmi_cat + ccscat + alc + smk,
data=ins_ca, control=coxit, ties=c("breslow"))


This is the error message I get:

Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help for survival curves for the survfit object‏

2016-02-08 Thread Göran Broström



On 2016-02-08 14:11, 于慧琳 wrote:

Hi guys,

I have some problems when plotting survival curves for the survfit
object. My survfit model is as follows:

fit6 <- survfit(Surv(Survival_time_year_, event) ~ Race_1_Desc, data = data6)

Race_1_Desc is an indicator variable which has 0 and 1 values. However,
in my dataset all of the Race_1_Desc are of value 0, which means I have
only 1 group. When I plot my survival curve using the following command,

plot(fit6, col = c("blue", "red"), xlab = "Survival Time",
     ylab = "Survival Probability", main = "Survival Plot for Leaf 6")

it is correct that I only get one curve for Race_1_Desc coded as 0,
which is the blue solid line. But there are also two dashed lines, one
red and one blue. I just want to know what the two dashed lines mean
and why they show up in the plot.


Maybe they are confidence limits? Read the help page for plot.survfit, 
especially 'conf.int'.


Göran
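
A sketch with the call from the post; the dashed lines disappear if the
confidence bands are switched off:

plot(fit6, conf.int = FALSE, col = "blue", xlab = "Survival Time",
     ylab = "Survival Probability", main = "Survival Plot for Leaf 6")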

  [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Dates and missing values

2016-02-08 Thread Göran Broström

I have a data frame with dates as integers:

> summary(persons[, c("foddat", "doddat")])
 foddat doddat
 Min.   :1679   Min.   :1800
 1st Qu.:18760904   1st Qu.:18810924
 Median :19030426   Median :19091227
 Mean   :18946659   Mean   :19027233
 3rd Qu.:19220911   3rd Qu.:19310526
 Max.   :19660124   Max.   :19691228
 NA's   :624NA's   :207570

After converting the dates to Date format ('as.Date') I get:

> summary(per[, c("foddat", "doddat")])
foddat   doddat
 Min.   :1679-07-01   Min.   :1800-01-26
 1st Qu.:1876-09-04   1st Qu.:1881-09-24
 Median :1903-04-26   Median :1909-12-27
 Mean   :1895-02-04   Mean   :1903-02-22
 3rd Qu.:1922-09-10   3rd Qu.:1931-05-26
 Max.   :1966-01-24   Max.   :1969-12-28

My question is: Why are the numbers of missing values not printed in the 
second case? 'is.na' gives the correct (same) numbers.


Can I somehow force 'summary' to print NA's? I found no clues in the 
documentation.


> sessionInfo()
R version 3.2.3 Patched (2016-01-19 r69960)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

Göran Broström

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Dates and missing values

2016-02-08 Thread Göran Broström

Thanks Marc, but see below!

On 2016-02-08 19:26, Marc Schwartz wrote:



On Feb 8, 2016, at 11:26 AM, Göran Broström <goran.brost...@umu.se> wrote:

I have a data frame with dates as integers:


summary(persons[, c("foddat", "doddat")])

 foddat doddat
Min.   :1679   Min.   :1800
1st Qu.:18760904   1st Qu.:18810924
Median :19030426   Median :19091227
Mean   :18946659   Mean   :19027233
3rd Qu.:19220911   3rd Qu.:19310526
Max.   :19660124   Max.   :19691228
NA's   :624NA's   :207570

After converting the dates to Date format ('as.Date') I get:


summary(per[, c("foddat", "doddat")])

foddat   doddat
Min.   :1679-07-01   Min.   :1800-01-26
1st Qu.:1876-09-04   1st Qu.:1881-09-24
Median :1903-04-26   Median :1909-12-27
Mean   :1895-02-04   Mean   :1903-02-22
3rd Qu.:1922-09-10   3rd Qu.:1931-05-26
Max.   :1966-01-24   Max.   :1969-12-28

My question is: Why are the numbers of missing values not printed in the second 
case? 'is.na' gives the correct (same) numbers.

Can I somehow force 'summary' to print NA's? I found no clues in the 
documentation.



Hi,

Two things:

1. We are going to need to see the exact call to as.Date() that you used. as.Date() will 
take a numeric vector as input, but the presumption is that the number represents the 
number of days since an origin, which needs to be specified explicitly. If you coerced 
the numeric vector to character first, presuming a "%Y%m%d" format, then you 
need to be cautious about how that is done and the result.

2. Your second call is to a data frame called 'per', which may or may not have 
the same content as 'persons' in your first call.


If I do the following, taking some of your numeric values from above:

x <- c(1800, 18810924, 19091227, 19027233, 19310526, 19691228, NA)

DF <- data.frame(x)


summary(DF)

x
  Min.   :1800
  1st Qu.:18865001
  Median :19059230
  Mean   :18988523
  3rd Qu.:19255701
  Max.   :19691228
  NA's   :1


as.character(DF$x)

[1] "1.8e+07"  "18810924" "19091227" "19027233" "19310526" "19691228"
[7] NA

DF$x.Date <- as.Date(as.character(DF$x), format = "%Y%m%d")


DF

         x     x.Date
1     1800       <NA>
2 18810924 1881-09-24
3 19091227 1909-12-27
4 19027233       <NA>
5 19310526 1931-05-26
6 19691228 1969-12-28
7       NA       <NA>


summary(DF)

xx.Date
  Min.   :1800   Min.   :1881-09-24
  1st Qu.:18865001   1st Qu.:1902-12-04
  Median :19059230   Median :1920-09-10
  Mean   :18988523   Mean   :1923-04-12
  3rd Qu.:19255701   3rd Qu.:1941-01-17
  Max.   :19691228   Max.   :1969-12-28
  NA's   :1  NA's   :3


But:

> summary(DF[, "x.Date", drop = FALSE])
 x.Date
 Min.   :1881-09-24
 1st Qu.:1902-12-04
 Median :1920-09-10
 Mean   :1923-04-12
 3rd Qu.:1941-01-17
 Max.   :1969-12-28

No NA's. But again:

> summary(DF[, "x.Date"])
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.         NA's
"1881-09-24" "1902-12-04" "1920-09-10" "1923-04-12" "1941-01-17" "1969-12-28"          "3"



So summary does support the reporting of NA's for Dates, using summary.Date().


Not always, as it seems. Strange. (The 'persons' vs. 'per' is a red 
herring.)


Göran Broström
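
A workaround sketch, counting the NA's directly instead of relying on
summary():

colSums(is.na(per[, c("foddat", "doddat")]))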



Regards,

Marc Schwartz



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Plot step function

2016-02-06 Thread Göran Broström

On 2016-02-06 19:23, David Winsemius wrote:



On Feb 6, 2016, at 4:11 AM, jupiter <jupiter@gmail.com> wrote:

Hi,

I am just starting to learn R, sorry for asking a simple question. How can
I plot a line with y = 0 for x <= 0 and y = 1 for x > 0?


There is a stepfun function and an associated plotting method:

  y0 <- c(rep(0,3),rep(1,3))
  sfun0  <- stepfun(-2:2, y0, right=TRUE)
  plot(sfun0)


I would like to suggest the correct way:

Replace the last line by

> plot(sfun0, verticals = FALSE)

Göran Broström
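
For completeness, the exact function in the question (y = 0 for x <= 0,
y = 1 for x > 0) is a one-knot step function; right = TRUE makes it
left-continuous, so f(0) = 0:

sfun <- stepfun(0, c(0, 1), right = TRUE)
plot(sfun, verticals = FALSE, xlim = c(-5, 5))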

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using survreg() in survival package with long data

2015-08-30 Thread Göran Broström

On 08/29/2015 11:56 PM, Fox, John wrote:

Dear Göran,

Thank you for responding to my query; please see below:


-Original Message- From: R-help
[mailto:r-help-boun...@r-project.org] On Behalf Of Göran Broström
Sent: August 29, 2015 3:59 PM To: r-help@r-project.org Subject: Re:
[R] using survreg() in survival package with long data

Dear John,

I think you are missing that 'survreg' does not handle left
truncated data. With that you should use the 'eha' package, for a
PH model the function 'phreg', and for an AFT model the function
'aftreg' (you didn't tell which model you want to fit).


That's odd, in that it's not true in general, since, e.g., survreg()
can be used to fit the left-censored Tobit regression model, as
illustrated in this example from ?survreg:

tobinfit <- survreg(Surv(durable, durable > 0, type='left') ~ age + quant,
    data=tobin, dist='gaussian')


Well, it is true that survreg doesn't handle left truncated data. Its 
ability to cope with left censored data does not change that.



In fact, in my example, the data are right-censored, but in the
second data set are represented in counting-process form as one-week
intervals. I imagine that you're right in the sense that survreg()
can't handle data like this in counting-process form.


Which implies left truncation, at least formally.


Yet, I've
reviewed the documentation in the survival package but can't find any
reference to this -- which is not to say that it's not there, only
that I don't see it.


Right, it is not explicitly spelled out in the documentation. But this 
has been discussed earlier on R-help. See e.g.

https://stat.ethz.ch/pipermail/r-help/2008-November/178621.html

Göran





Your attempt with the 'survreg' function implies that you are
satisfied with a Weibull baseline distribution, in which case you
could choose either model.


Right, I understand that this is the default. The problem is just a
small toy example meant to illustrate the error.

Thanks again, John



Göran


On 08/29/2015 07:06 PM, Fox, John wrote:

Dear list members,

I'm unable to fit a parametric survival regression using
survreg() in the survival package with data in counting-process
(long) form.

To illustrate using a scaled-down problem with 10 subjects (with
data placed on the web):

--- snip 

library(survival)
RW <- read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RW.txt")
RL <- read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RL.txt")



RW # wide data

   week arrest age
1    20      1  27
2    17      1  18
3    25      1  19
4    52      0  23
5    52      0  19
6    52      0  24
7    23      1  25
8    52      0  21
9    52      0  22
10   52      0  20


head(RL, 20) # long data, counting-process form

     start stop arrest.time age
1.1      0    1           0  27
1.2      1    2           0  27
1.3      2    3           0  27
1.4      3    4           0  27
1.5      4    5           0  27
1.6      5    6           0  27
1.7      6    7           0  27
1.8      7    8           0  27
1.9      8    9           0  27
1.10     9   10           0  27
1.11    10   11           0  27
1.12    11   12           0  27
1.13    12   13           0  27
1.14    13   14           0  27
1.15    14   15           0  27
1.16    15   16           0  27
1.17    16   17           0  27
1.18    17   18           0  27
1.19    18   19           0  27
1.20    19   20           1  27

--- snip 

I have no trouble fitting a Cox model to both the wide and long
forms of the data, obtaining (as should be the case) identical
results:

--- snip 


coxph(Surv(week, arrest) ~ age, data=RW)  # works

Call: coxph(formula = Surv(week, arrest) ~ age, data = RW)


       coef exp(coef) se(coef)    z    p
age  0.0963    1.1011   0.2073 0.46 0.64

Likelihood ratio test=0.21  on 1 df, p=0.643
n= 10, number of events= 4

coxph(Surv(start, stop, arrest.time) ~ age, data=RL) # works, same

Call: coxph(formula = Surv(start, stop, arrest.time) ~ age, data = RL)

       coef exp(coef) se(coef)    z    p
age  0.0963    1.1011   0.2073 0.46 0.64

Likelihood ratio test=0.21  on 1 df, p=0.643
n= 397, number of events= 4

--- snip 

But when I try to fit a parametric survival regression with
survreg(), I get an error with the long form of the data:

--- snip 


survreg(Surv(week, arrest) ~ age, data=RW) # works

Call: survreg(formula = Surv(week, arrest) ~ age, data = RW)

Coefficients:
(Intercept)         age
 6.35386771 -0.08982624

Scale= 0.7363196

Loglik(model)= -22.1   Loglik(intercept only)= -22.2
        Chisq= 0.3 on 1 degrees of freedom, p= 0.58
n= 10


survreg(Surv(start, stop, arrest.time) ~ age, data=RL) # fails

Error in survreg(Surv(start, stop, arrest.time) ~ age, data = RL) :
  Invalid survival type

--- snip 

I expect that there's something about survreg() that I'm missing.
I first noted this problem quite some time ago but didn't look
into it carefully because I didn't really need to use survreg().

Any help would be appreciated

Re: [R] using survreg() in survival package with long data

2015-08-29 Thread Göran Broström

Dear John,

I think you are missing that 'survreg' does not handle left truncated
data. With that you should use the 'eha' package, for a PH model the
function 'phreg', and for an AFT model the function 'aftreg' (you didn't
tell which model you want to fit).

Your attempt with the 'survreg' function implies that you are satisfied 
with a Weibull baseline distribution, in which case you could choose 
either model.


Göran
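
A sketch of the eha alternative on the long (counting-process) data from
the post, not run; both functions accept left-truncated Surv objects:

library(eha)
fit.ph  <- phreg(Surv(start, stop, arrest.time) ~ age, data = RL)
fit.aft <- aftreg(Surv(start, stop, arrest.time) ~ age, data = RL)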


On 08/29/2015 07:06 PM, Fox, John wrote:

Dear list members,

I'm unable to fit a parametric survival regression using survreg() in
the survival package with data in counting-process (long) form.

To illustrate using a scaled-down problem with 10 subjects (with data
placed on the web):

--- snip 

library(survival)
RW <- read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RW.txt")
RL <- read.table("http://socserv.mcmaster.ca/jfox/.Pickup/RL.txt")



RW # wide data

   week arrest age
1    20      1  27
2    17      1  18
3    25      1  19
4    52      0  23
5    52      0  19
6    52      0  24
7    23      1  25
8    52      0  21
9    52      0  22
10   52      0  20


head(RL, 20) # long data, counting-process form

     start stop arrest.time age
1.1      0    1           0  27
1.2      1    2           0  27
1.3      2    3           0  27
1.4      3    4           0  27
1.5      4    5           0  27
1.6      5    6           0  27
1.7      6    7           0  27
1.8      7    8           0  27
1.9      8    9           0  27
1.10     9   10           0  27
1.11    10   11           0  27
1.12    11   12           0  27
1.13    12   13           0  27
1.14    13   14           0  27
1.15    14   15           0  27
1.16    15   16           0  27
1.17    16   17           0  27
1.18    17   18           0  27
1.19    18   19           0  27
1.20    19   20           1  27

--- snip 

I have no trouble fitting a Cox model to both the wide and long forms
of the data, obtaining (as should be the case) identical results:

--- snip 


coxph(Surv(week, arrest) ~ age, data=RW)  # works

Call: coxph(formula = Surv(week, arrest) ~ age, data = RW)


       coef exp(coef) se(coef)    z    p
age  0.0963    1.1011   0.2073 0.46 0.64

Likelihood ratio test=0.21  on 1 df, p=0.643
n= 10, number of events= 4

coxph(Surv(start, stop, arrest.time) ~ age, data=RL) # works, same

Call: coxph(formula = Surv(start, stop, arrest.time) ~ age, data = RL)

       coef exp(coef) se(coef)    z    p
age  0.0963    1.1011   0.2073 0.46 0.64

Likelihood ratio test=0.21  on 1 df, p=0.643
n= 397, number of events= 4

--- snip 

But when I try to fit a parametric survival regression with
survreg(), I get an error with the long form of the data:

--- snip 


survreg(Surv(week, arrest) ~ age, data=RW) # works

Call: survreg(formula = Surv(week, arrest) ~ age, data = RW)

Coefficients:
(Intercept)         age
 6.35386771 -0.08982624

Scale= 0.7363196

Loglik(model)= -22.1   Loglik(intercept only)= -22.2
        Chisq= 0.3 on 1 degrees of freedom, p= 0.58
n= 10


survreg(Surv(start, stop, arrest.time) ~ age,  data=RL) # fails

Error in survreg(Surv(start, stop, arrest.time) ~ age, data = RL) :
Invalid survival type

--- snip 

I expect that there's something about survreg() that I'm missing. I
first noted this problem quite some time ago but didn't look into it
carefully because I didn't really need to use survreg().

Any help would be appreciated.

Thanks,
John
-
John Fox, Professor
McMaster University
Hamilton, Ontario Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Survival Analysis and Predict time-to-death

2015-08-18 Thread Göran Broström



On 2015-08-18 01:44, David Winsemius wrote:


On Aug 17, 2015, at 1:51 PM, David Winsemius wrote:



On Aug 17, 2015, at 12:10 PM, survivalUser wrote:


Dear All,

I would like to build a model, based on survival analysis on some
data, that is able to predict the /*expected time until death*/
for a new data instance.


Are you sure you want to use life expectancy as the outcome? In
order to establish a mathematical expectation you need to know
the risk at all times in the future, which, as pointed out in
the print.survfit help page, is undefined unless the last
observation is a death. Very few datasets support such an estimate.
If, on the other hand, you have sufficient events in the future, then
you may be able to more readily justify an estimate of a median
survival.


Dear survivalUser;

I've been reminded that you later asked for a parametric model built
with survreg. The above commentary applies to coxph models and
objects and not to survreg objects. If you do have a parametric
model, even with incomplete observation, then calculating life
expectancy should be a simple matter of plugging the parameters into
the formula for the distribution's mean value, since life expectancy
is the statistical mean. So maybe you do want such a model. The
default survreg distribution is Weibull, so just go to your
mathematical statistics text and look up the formula for the mean of
a Weibull distribution with the estimated parameters.

No need for 'the mathematical statistics text': The necessary 
information is found on the help page for the Weibull distribution: E(T) 
=  b Gamma(1 + 1/a), where 'b' is scale (really!) and 'a' is shape. You 
must however take into account the special parametrization that is used 
by 'survreg'; see its help page for how to do it.
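
A sketch of that translation for a survreg fit: in survreg's
parametrization the linear predictor is log(b) and the shape is
1/fit$scale (placeholder data frame 'd' and covariate 'x'):

fit <- survreg(Surv(time, status) ~ x, data = d, dist = "weibull")
lp <- predict(fit, type = "lp")   # linear predictor, one value per subject
b  <- exp(lp)                     # Weibull scale, base-R parametrization
a  <- 1 / fit$scale               # Weibull shape
ET <- b * gamma(1 + 1/a)          # E(T) = b * Gamma(1 + 1/a)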


Alternatively, use 'aftreg' in the package 'eha' and get the same 
parametrization as in base  R.


After getting the baseline expectation by the formula above, simply 
multiply that value by exp(-lp) to get the expected life for an 
individual with linear predictor lp.


A useful alternative is simulation (use 'rweibull') or numerical 
integration, especially for estimating remaining expected life 'later in 
life'. And for other distributions than the Weibull.
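
A numerical-integration sketch for the remaining expectation at time t0,
E(T - t0 | T > t0) = integral of S(u) from t0 to Inf, divided by S(t0),
continuing the sketch above for a single subject:

S  <- function(u) pweibull(u, shape = a, scale = b[1], lower.tail = FALSE)
t0 <- 5
integrate(S, lower = t0, upper = Inf)$value / S(t0)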



Göran Broström
(author of the eha package)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R wont accept my zero count values in the GLM with quasi_poisson dsitribution

2015-07-28 Thread Göran Broström



On 28/07/15 08:33, Charlotte wrote:

Hello

I have count values for abundance which follow a pattern of
over-dispersion with many zero values.  I have read a number of
documents which suggest that I don't use data-transforming methods
but rather that I run the GLM with the quasi-Poisson distribution.
So I have written my script, and R is telling me that Y should be more
than 0.


No, R is telling you that you must have 0 <= y <= 1 (see below).
For count data you should not use the binomial family, but rather 
'poisson', or 'quasipoisson'.


?family

Göran
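
A sketch with the names from the post:

fit <- glm(abundance ~ Gender, data = teminfest, family = quasipoisson())
summary(fit)   # the dispersion parameter is now estimated, not fixed at 1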


Everything I read tells me to do it this way but I can't get R to
agree. Do I need to add something else to my script to get it to
work and keep my data untransformed? The script I wrote is as
follows:


fit <- glm(abundance ~ Gender, data = teminfest, family = binomial())


then I get this error:

Error in eval(expr, envir, enclos) : y values must be 0 <= y <= 1

I don't use R a lot so I am having trouble figuring out what to do
next.

I would appreciate some help

Many Thanks Charlotte






-- View this message in context:
http://r.789695.n4.nabble.com/R-wont-accept-my-zero-count-values-in-the-GLM-with-quasi-poisson-dsitribution-tp4710462.html



Sent from the R help mailing list archive at Nabble.com.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prmatrix and print.matrix

2015-06-26 Thread Göran Broström



On 06/25/2015 11:18 PM, Therneau, Terry M., Ph.D. wrote:

The help page for prmatrix states that it only exists for backwards
compatibility and strongly hints at using print.matrix instead.
However, there does not seem to be a print.matrix() function.


I asked this very same question here 18 months ago, but got no real 
answer (except for a reference to 'write.matrix' (MASS) from David 
Winsemius). However, googling found the following:



Updating packages for 1.7.0:

print.matrix() only exists for backwards compatibility: it is the same
as print.default().  Very likely print.matrix was used for the
right=TRUE argument that S's print.default does not have, but this is
unnecessary.  If you want the call sequence of print.matrix, use 
prmatrix instead.  Also, note that print prints the attributes of the 
matrix whereas prmatrix does not.

--
http://developer.r-project.org/170update.txt

So, my guess is that print.matrix once existed but is now silently gone. 
Use print.default.
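
A quick illustration (a sketch):

m <- matrix(1:4, 2)
attr(m, "note") <- "kept"
print(m)     # print.default: also shows the extra attribute
prmatrix(m)  # prints the matrix only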


Göran



The help page for print mentions a zero.print option, but that does not appear 
to affect
matrices.  This (or something like it) is what I was looking for.

Am I overlooking something?

Terry Therneau

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replace the value with 1 and 0

2015-02-26 Thread Göran Broström



On 2015-02-26 00:33, JS Huang wrote:

Hi,

   Here is an implementation.  More data are added.  An extra column hasRain
is added instead of replacing column Amount.


rain

   Year Month Day Amount
1  1950     1   1    0.0
2  1950     1   2   35.5
3  1950     1   3   17.8
4  1950     1   4   24.5
5  1950     1   5   12.3
6  1950     1   6   11.5
7  1950     1   7    5.7
8  1950     1   8   13.2
9  1950     1   9   11.3
10 1950     1  10   14.7
11 1950     1  11   11.9
12 1950     1  12   17.5
13 1950     1  13    8.1
14 1950     1  14    0.4
15 1950     1  15    0.0
16 1950     1  16   19.5
17 1950     1  17   10.7
18 1950     1  18    0.5
19 1950     1  19   12.7
20 1950     1  20    6.3
21 1950     2   1    0.0
22 1950     2   2   35.5
23 1950     2   3   17.8
24 1950     2   4   24.5
25 1950     2   5   12.3
26 1950     2   6   11.5
27 1950     2   7    5.7
28 1950     2   8   13.2
29 1950     2   9   11.3
30 1950     2  10   14.7
31 1950     2  11   11.9
32 1950     2  12   17.5
33 1950     2  13    8.1
34 1950     2  14    0.4
35 1950     2  15    0.0
36 1950     2  16   19.5
37 1950     2  17   10.7
38 1950     2  18    0.0
39 1950     2  19    0.0
40 1950     2  20    0.0

rain$hasRain <- ifelse(rain$Amount > 0, 1, 0)


No! Better is

rain$hasRain <- as.numeric(rain$Amount > 0)

See previous discussions about the use of 'ifelse'.
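
A quick check of the difference (a sketch):

x <- runif(1e6)
system.time(ifelse(x > 0.5, 1, 0))   # evaluates and fills both branches
system.time(as.numeric(x > 0.5))     # a simple coercion, much faster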

Göran


rain

   Year Month Day Amount hasRain
1  1950     1   1    0.0       0
2  1950     1   2   35.5       1
3  1950     1   3   17.8       1
4  1950     1   4   24.5       1
5  1950     1   5   12.3       1
6  1950     1   6   11.5       1
7  1950     1   7    5.7       1
8  1950     1   8   13.2       1
9  1950     1   9   11.3       1
10 1950     1  10   14.7       1
11 1950     1  11   11.9       1
12 1950     1  12   17.5       1
13 1950     1  13    8.1       1
14 1950     1  14    0.4       1
15 1950     1  15    0.0       0
16 1950     1  16   19.5       1
17 1950     1  17   10.7       1
18 1950     1  18    0.5       1
19 1950     1  19   12.7       1
20 1950     1  20    6.3       1
21 1950     2   1    0.0       0
22 1950     2   2   35.5       1
23 1950     2   3   17.8       1
24 1950     2   4   24.5       1
25 1950     2   5   12.3       1
26 1950     2   6   11.5       1
27 1950     2   7    5.7       1
28 1950     2   8   13.2       1
29 1950     2   9   11.3       1
30 1950     2  10   14.7       1
31 1950     2  11   11.9       1
32 1950     2  12   17.5       1
33 1950     2  13    8.1       1
34 1950     2  14    0.4       1
35 1950     2  15    0.0       0
36 1950     2  16   19.5       1
37 1950     2  17   10.7       1
38 1950     2  18    0.0       0
39 1950     2  19    0.0       0
40 1950     2  20    0.0       0

tapply(rain$hasRain,list(rain$Year,rain$Month),sum)

   1  2
1950 18 15






--
View this message in context: 
http://r.789695.n4.nabble.com/Replace-the-value-with-1-and-0-tp4703838p4703840.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] corrupted double-linked list

2015-01-29 Thread Göran Broström

Hello,

A weird thing happened to me while I was playing around in the R console 
with a data frame ('fert'). The finale was


-
 with(fert[fert$parity == 4, ], table(age, event))
 event
age 0   1   2
  (14,20]   0   0   0
  (20,25]   6  43  41
  (25,30]  57 565 513
  (30,35] 121 719 686
  (35,40] 165 309 317
  (40,45] 156  45  29
  (45,50]  13   0   0
 fit1 <- coxph(Surv(exit - enter, event > 0.5) ~ strata(age) + civst + 
birthdate, data = fert[fert$parity == 0, ])
*** Error in `/usr/local/lib/R/bin/exec/R': corrupted double-linked 
list: 0x04f1aa40 ***

Aborted
--
There was no .Rhistory file left (and my memory is too short) so I 
cannot reproduce it, but what is a double-linked list?


This is on Ubuntu 14.10 with  R  built from source.

 sessionInfo()
R version 3.1.2 Patched (2014-12-08 r67137)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=sv_SE.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=sv_SE.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=sv_SE.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] eha_2.4-2   survival_2.37-7

loaded via a namespace (and not attached):
[1] tools_3.1.2

Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] corrupted double-linked list

2015-01-29 Thread Göran Broström

On 2015-01-29 12:42, Duncan Murdoch wrote:

On 29/01/2015 6:24 AM, Göran Broström wrote:

Hello,

A weird thing happened to me while I was playing around in the R console
with a data frame ('fert'). The finale was

-
   with(fert[fert$parity == 4, ], table(age, event))
   event
age 0   1   2
(14,20]   0   0   0
(20,25]   6  43  41
(25,30]  57 565 513
(30,35] 121 719 686
(35,40] 165 309 317
(40,45] 156  45  29
(45,50]  13   0   0
   fit1 <- coxph(Surv(exit - enter, event > 0.5) ~ strata(age) + civst +
birthdate, data = fert[fert$parity == 0, ])
*** Error in `/usr/local/lib/R/bin/exec/R': corrupted double-linked
list: 0x04f1aa40 ***
Aborted
--
There was no .Rhistory file left (and my memory is too short) so I
cannot reproduce it, but what is a double-linked list?


That message likely comes from the C runtime library glibc, not directly
from R.  It says that the memory being managed by C has been corrupted,
presumably by an out of bounds write sometime earlier.  (This might
conceivably happen if an object had been cleaned up by the R memory
manager then referred to later, but that usually gives a different error.)

If you can make it reproducible we could track it down, but it's much
harder without that.  You could try running your code under valgrind if
you know how to do that (or want to read the manual and learn).


I ran

$ R -d "valgrind --leak-check=full --show-reachable=yes -v" -f koll.R 2> ojd

on koll.R:

load("fert.rda")
library(eha)
fert$age - cut(fert$enter, c(15, 20, 25, 30, 35, 40, 45, 50))
fert$age - cut(fert$enter, c(14, 20, 25, 30, 35, 40, 45, 50))

fit <- coxreg(Surv(exit - enter, event > 0.5) ~ strata(age) + civst +
     birthdate, data = fert[fert$parity == 0, ])
fit1 <- coxph(Surv(exit - enter, event > 0.5) ~ strata(age) + civst +
     birthdate, data = fert[fert$parity == 0, ])
summary(fit1)

which produced an output of 22765 lines. A quick scan of them showed 
nothing in particular, except:


(lines 50 - 54):
--29802--   .. CRC mismatch (computed 1fb85af8 wanted 2e9e3c16)
--29802--object doesn't have a symbol table
==29802== WARNING: new redirection conflicts with existing -- ignoring it
--29802-- old: 0x04019ca0 (strlen  ) R- (.0) 
0x38068331 ???
--29802-- new: 0x04019ca0 (strlen  ) R- (2007.0) 
0x04c2e1a0 strlen


and, at the end:
==29802== LEAK SUMMARY:
==29802==definitely lost: 0 bytes in 0 blocks
==29802==indirectly lost: 0 bytes in 0 blocks
==29802==  possibly lost: 0 bytes in 0 blocks
==29802==still reachable: 39,088,272 bytes in 13,280 blocks
==29802== suppressed: 0 bytes in 0 blocks
==29802==
==29802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
==29802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

(should this go to R-devel? I could try harder to provoke the error.)

Göran


Duncan Murdoch



This is on Ubuntu 14.10 with  R  built from source.

   sessionInfo()
R version 3.1.2 Patched (2014-12-08 r67137)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=sv_SE.UTF-8LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=sv_SE.UTF-8LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=sv_SE.UTF-8   LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] eha_2.4-2   survival_2.37-7

loaded via a namespace (and not attached):
[1] tools_3.1.2

Göran

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] get latest dates for different people in a dataset

2015-01-25 Thread Göran Broström

On 2015-01-24 01:14, William Dunlap wrote:

Here is one way.  Sort the data.frame, first by Name then break ties with
CheckInDate.
Then choose the rows that are the last in a run of identical Name values.


I do it by sorting by the reverse order of CheckinDate (last date first) 
within Name, then


 dLatestVisit <- dSorted[!duplicated(dSorted$Name), ]

I guess it is faster, but who knows?

Göran




txt <- "Name  CheckInDate  Temp
+ John  1/3/2014  97
+ Mary 1/3/2014  98.1
+ Sam   1/4/2014  97.5
+ John  1/4/2014  99"

d <- read.table(header=TRUE,
    colClasses=c("character", "character", "numeric"), text=txt)

d$CheckInDate <- as.Date(d$CheckInDate, format="%d/%m/%Y")
isEndOfRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
dSorted <- d[order(d$Name, d$CheckInDate), ]
dLatestVisit <- dSorted[isEndOfRun(dSorted$Name), ]
dLatestVisit

   Name CheckInDate Temp
4 John  2014-04-01 99.0
2 Mary  2014-03-01 98.1
3  Sam  2014-04-01 97.5


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jan 23, 2015 at 3:43 PM, Tan, Richard r...@panagora.com wrote:


Hi,

Can someone help for a R question?

I have a data set like:

NameCheckInDate  Temp
John  1/3/2014  97
Mary 1/3/2014  98.1
Sam   1/4/2014  97.5
John  1/4/2014  99

I'd like to return a dataset that for each Name, get the row that is the
latest CheckInDate for that person.  For the example above it would be

NameCheckInDate  Temp
John  1/4/2014  99
Mary 1/3/2014  98.1
Sam   1/4/2014  97.5


Thank you for your help!

Richard


 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] get latest dates for different people in a dataset

2015-01-25 Thread Göran Broström

See inline;

On 2015-01-25 20:27, William Dunlap wrote:

  dLatestVisit <- dSorted[!duplicated(dSorted$Name), ]
 
 I guess it is faster, but who knows?

You can find out by making a function that generates datasets of
various sizes and timing the suggested algorithms.  E.g.,
makeData <-
function(nPatients, aveVisitsPerPatient, uniqueNameDate = TRUE){
    nrow <- trunc(nPatients * aveVisitsPerPatient)
    patientNames <- paste0("P", seq_len(nPatients))
    possibleDates <- as.Date(16001:17000, origin=as.Date("1970-01-01"))
    possibleTemps <- seq(97, 103, by=0.1)
    data <- data.frame(Name=sample(patientNames, replace=TRUE, size=nrow),
        CheckInDate=sample(possibleDates, replace=TRUE, size=nrow),
        Temp=sample(possibleTemps, replace=TRUE, size=nrow))
    if (uniqueNameDate) {
        data <- data[!duplicated(data[, c("Name", "CheckInDate")]), ]
    }
    data
}
funs <- list(
    f1 = function(data) {
        do.call(rbind, lapply(split(data, data$Name), function(x)
            x[order(x$CheckInDate),][nrow(x),]))
    }, f2 = function (d)
    {
        isEndOfRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
        dSorted <- d[order(d$Name, d$CheckInDate), ]
        dSorted[isEndOfRun(dSorted$Name), ]
    }, f3 = function (d)
    {
        # is the following how you did reverse sort on date (fwd on name)?


Yes; in fact I do this all the time in my applications (survival 
analysis), where I have several records for each individual.


Göran


        #  Too bad that order's decreasing arg is not vectorized
        dSorted <- d[order(d$Name, -as.numeric(d$CheckInDate)), ]
        dSorted[!duplicated(dSorted$Name), ]
    }, f4 = function(dta)
    {
        dta %>% group_by(Name) %>% filter(CheckInDate == max(CheckInDate))
    })

D <- makeData(nPatients=35000, aveVisitsPerPatient=3.7) # c. 129000 visits
library(dplyr)
Z <- lapply(funs, function(fun){
    time <- system.time( result <- fun(D) ) ; list(time=time,
        result=result) })

sapply(Z, function(x)x$time)
#   f1   f2   f3   f4
#user.self  461.25 0.47 0.36 3.01
#sys.self 1.20 0.00 0.00 0.01
#elapsed472.33 0.47 0.39 3.03
#user.child NA   NA   NA   NA
#sys.child  NA   NA   NA   NA

# duplicated is a bit better than diff, dplyr rather slower, rbind much
slower.

equivResults <- function(a, b) {
    # results have different classes and different orders, so only check
    # size and contents
    identical(dim(a), dim(b)) && all(a[order(a$Name),] == b[order(b$Name),])
}
sapply(Z[-1], function(x)equivResults(x$result, Z[[1]]$result))
#  f2   f3   f4
#TRUE TRUE TRUE

Note that the various functions give different results if any patient comes
in twice on the same day.  f4 includes both visits in the output, the others
include either the first or last (as ordered in the original file).

Bill Dunlap
TIBCO Software
wdunlap tibco.com http://tibco.com

On Sun, Jan 25, 2015 at 1:01 AM, Göran Broström goran.brost...@umu.se
mailto:goran.brost...@umu.se wrote:

On 2015-01-24 01:14, William Dunlap wrote:

Here is one way.  Sort the data.frame, first by Name then break
ties with
CheckInDate.
Then choose the rows that are the last in a run of identical
Name values.


I do it by sorting by the reverse order of CheckinDate (last date
first) within Name, then

  dLatestVisit <- dSorted[!duplicated(dSorted$Name), ]

I guess it is faster, but who knows?

Göran


txt <- "Name  CheckInDate  Temp
+ John  1/3/2014  97
+ Mary 1/3/2014  98.1
+ Sam   1/4/2014  97.5
+ John  1/4/2014  99"

d <- read.table(header=TRUE,
    colClasses=c("character", "character", "numeric"), text=txt)

d$CheckInDate <- as.Date(d$CheckInDate, format="%d/%m/%Y")
isEndOfRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
dSorted <- d[order(d$Name, d$CheckInDate), ]
dLatestVisit <- dSorted[isEndOfRun(dSorted$Name), ]
dLatestVisit

    Name CheckInDate Temp
4   John  2014-04-01 99.0
2   Mary  2014-03-01 98.1
3    Sam  2014-04-01 97.5


Bill Dunlap
TIBCO Software
wdunlap tibco.com http://tibco.com


On Fri, Jan 23, 2015 at 3:43 PM, Tan, Richard r...@panagora.com
mailto:r...@panagora.com wrote:

Hi,

Can someone help for a R question?

I have a data set like:

NameCheckInDate  Temp
John  1/3/2014  97
Mary 1/3/2014  98.1
Sam   1/4/2014  97.5
John  1/4/2014  99

I'd like to return a dataset that for each Name, get the row
that is the
latest CheckInDate for that person.  For the example above

Re: [R] elegant way to remove cases with many different ids

2015-01-23 Thread Göran Broström



On 2015-01-23 11:09, Ivan Calandra wrote:

Hi Alain,

I think you're looking for %in% (see ?'%in%' for the help page)

id.vector <- c(1,3)  ## here you define the values you want to select:
                     ## x, y, z...
subset(df, id %in% id.vector)



But note the Warning in

?subset
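
For programmatic use, plain indexing avoids that caveat -- a minimal sketch:

id.vector <- c(1, 3)
df[df$id %in% id.vector, ]   # same rows, no non-standard evaluation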

Göran



HTH,
Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra

Le 23/01/15 10:50, D. Alain a écrit :

Dear R-List,

I have nested data with cases grouped by IDs and I want to remove all cases 
with specific IDs.

something like

df <- data.frame(id=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5), var=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15))

subset(df, id == 1 | id == 3)

Having a very large dataset with many IDs to remove, this solution leaves me to write quite a long
list of id == x | id == y | id == z ... I wonder if there is a more elegant way to do
this? I tried with id = c(x,y,z) but this did not give the proper result.

Any suggestions? Thank you!

Best wishes

Alain






[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Changing date format

2014-10-07 Thread Göran Broström



On 2014-10-07 11:27, Jim Lemon wrote:

On Tue, 7 Oct 2014 10:32:42 AM Frederic Ntirenganya wrote:

Dear All,

How can I change the format of date of day of the year ? for

example r

(i.e. 17 Apr rather than 108).

The following is the type of the dataset I have

head(Samaru)
  Year Start End Length
1 1930   108 288    180
2 1931   118 288    170
3 1932   115 295    180
4 1933   156 294    138
5 1934   116 291    175
6 1935   134 288    154


Hi Frederic,
The easiest method I can think of is this:

Samaru$Start <- format(as.Date(
  paste(Samaru$Year, "01-01", sep="-")) + Samaru$Start,
  "%b %d")
Samaru$End <- format(as.Date(
  paste(Samaru$Year, "01-01", sep="-")) + Samaru$End,
  "%b %d")


In the package 'eha' I have a function 'toDate':

 require(eha)
 toDate(1930 + 108/365)
[1] 1930-04-19

(Interestingly, we are all wrong; the correct answer seems to be 
1930-04-18)
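
(The one-day difference is the usual day-of-year off-by-one: whether
1 January counts as day 0 or day 1. A quick check:)

as.Date("1930-01-01") + (108 - 1)   # day-of-year 108 -> "1930-04-18"
as.Date("1930-01-01") + 108         # -> "1930-04-19"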


 toDate
function (times)
{
    if (!is.numeric(times))
        stop("Argument must be numeric")
    times * 365.2425 + as.Date("0000-01-01")
}

The 'inverse' function is 'toTime'.

Sometimes it misses by one day; not very important in my applications, 
but may be otherwise.


Göran B.


Samaru

  Year  Start    End Length
1 1930 Apr 19 Oct 16    180
2 1931 Apr 29 Oct 16    170
3 1932 Apr 25 Oct 22    180
4 1933 Jun 06 Oct 22    138
5 1934 Apr 27 Oct 19    175
6 1935 May 15 Oct 16    154

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predictions from coxph or cph objects

2014-07-08 Thread Göran Broström

On 2014-07-06 11:17, Göran Broström wrote:

On 2014-07-06 10:48, Göran Broström wrote:

David and Axel,

I have two comments to your discussion:

(i) The area under the survival curve is equal to the mean of the
distribution, so the estimate of the mean should be the sum of the areas
of the rectangles defined by the estimated survival curve and the
successive distances between observed event times.

Thus,

   surv <- pred$surv
   time <- pred$time
   sum(surv * diff(time))

should give you the (estimated) mean. (Note that time[1] == 0, and
length(time) == length(surv) + 1)


Well, this is not quite true; on the first interval the survival curve
is one, so you need to

   surv <- c(1, surv)

first. But then the lengths of the surv and time vectors do not match so
you need to add a (large) time at the end of time. If the largest
observation is an event, 'no problem' (surv is zero), but otherwise ...

Btw, I tried

   exit <- rexp(10)
   event <- rep(1, 10)
   fit <- coxph(Surv(exit, event) ~ 1)

   survfit(fit)$surv
   [1] 0.90483742 0.80968410 0.71454371 0.61942215 0.52432953 0.42928471
   [7] 0.33432727 0.23955596 0.14529803 0.05345216

   survfit(Surv(exit, event) ~ 1)$surv
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

so be careful ...


Addendum: Note the argument 'type':

 survfit(fit, type = "kalbfleisch-prentice")$surv
 [1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

This gives, I think (at least if no ties), the nonparametric maximum 
likelihood estimator (NPMLE) of the baseline survivor function in the PH 
model, and thus coincides with the Kaplan-Meier estimator in the case of 
no covariates. It drops down to zero if the largest observation is a 
failure. See Kalbfleisch & Prentice (1980), pp. 84-86.


I suppose that the default type ( = "efron") simply uses the formula 
S(t) = exp(-H(t)) (on an estimator of H(t)), which never drops down to 
zero, censorings or no censorings. This relation does not hold for 
discrete-time distributions, and even if our underlying model is 
continuous, the resulting non-parametric estimator of H(t) defines a 
discrete-time distribution, which causes a small dilemma for the purist. 
(Should type = "kalbfleisch-prentice" be the default in survfit?)


However, in practice this is of no importance at all. If you stick to 
the median and quantiles.
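
For what it is worth, the survival package will also report a restricted
mean directly -- a minimal sketch on the lung data:

fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
print(survfit(fit), print.rmean = TRUE)             # restricted mean and se
quantile(survfit(fit), probs = c(0.25, 0.5, 0.75))  # quartiles and median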


Göran



Göran



(I do not think that David's suggestion gives the same answer, but I may
be wrong.)

(ii) With censored data, this may be a bad idea. For instance, when the
largest observation is a censoring time, you may badly underestimate the
mean. Your best hope is to be able to estimate a conditional mean of the
type E(T | T > x).

This is essentially a non-parametric situation, and therefore it is
better to stick to medians and quantiles.

Göran Broström

On 2014-07-06 06:17, David Winsemius wrote:


On Jul 5, 2014, at 9:12 PM, David Winsemius wrote:



On Jul 5, 2014, at 12:43 PM, Axel Urbiz wrote:


Thank you David. It is my understanding that using survfit
below I get the median predicted survival. I actually was looking
for the mean. I can't seem to find in the documentation how to get
that.

options(na.action=na.exclude) # retain NA in predictions
fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
pred <- survfit(fit, newdata=lung)
head(pred)


There might be a way. I don't know it if so, so I would probably
just use the definition of the mean:

sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$time)



Er, I think I meant to type:

fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
pred <- survfit(fit)

sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$surv)
[1] 211.0943



(I continue to take effort to keep my postings in plain text despite
my mail-clients's efforts to match your formatted postings. It adds
to the work of responders when you post formatted questions and
responses.)



Thanks again,
Axel.



On Sat, Jul 5, 2014 at 1:54 PM, David Winsemius dwinsem...@comcast.net

wrote:


On Jul 5, 2014, at 5:28 AM, Axel Urbiz wrote:

Dear R users,

My apologies for the simple question, as I'm starting to learn the
concepts
behind the Cox PH model. I was just experimenting with the survival
and rms
packages for this.

I'm simply trying to obtain the expected survival time (as opposed
to the
probability of survival at a given time t).

What does expected survival time actually mean? Do you want the
median survival time?


I can't seem to find an option
from the type argument in the predict methods from
coxph{survival} or
cph{rms} that will give me expected survival times.

library(rms)
options(na.action=na.exclude) # retain NA in predictions
fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
fit2 <- cph(Surv(time, status) ~ age + ph.ecog, lung)
head(predict(fit, type="lp"))
head(predict(fit2, type="lp"))

`predict` will return the results of the regression, i.e. the log-
hazard ratios for each term in the RHS of the formula. What you
want (as described in the Index for the survival

Re: [R] Predictions from coxph or cph objects

2014-07-06 Thread Göran Broström

David and Axel,

I have two comments to your discussion:

(i) The area under the survival curve is equal to the mean of the 
distribution, so the estimate of the mean should be the sum of the areas 
of the rectangles defined by the estimated survival curve and the 
successive distances between observed event times.


Thus,

 surv <- pred$surv
 time <- pred$time
 sum(surv * diff(time))

should give you the (estimated) mean. (Note that time[1] == 0, and 
length(time) == length(surv) + 1)


(I do not think that David's suggestion gives the same answer, but I may 
be wrong.)


(ii) With censored data, this may be a bad idea. For instance, when the 
largest observation is a censoring time, you may badly underestimate the 
mean. Your best hope is to be able to estimate a conditional mean of the 
type E(T | T > x).


This is essentially a non-parametric situation, and therefore it is 
better to stick to medians and quantiles.


Göran Broström

On 2014-07-06 06:17, David Winsemius wrote:


On Jul 5, 2014, at 9:12 PM, David Winsemius wrote:



On Jul 5, 2014, at 12:43 PM, Axel Urbiz wrote:


Thank you David. It is my understanding that using survfit
below I get the median predicted survival. I actually was looking
for the mean. I can't seem to find in the documentation how to get
that.

options(na.action=na.exclude) # retain NA in predictions
fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
pred <- survfit(fit, newdata=lung)
head(pred)


There might be a way. I don't know it if so, so I would probably
just use the definition of the mean:

sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$time)



Er, I think I meant to type:

fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
pred <- survfit(fit)

   sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$surv)
[1] 211.0943



(I continue to take effort to keep my postings in plain text despite
my mail-clients's efforts to match your formatted postings. It adds
to the work of responders when you post formatted questions and
responses.)



Thanks again,
Axel.



On Sat, Jul 5, 2014 at 1:54 PM, David Winsemius dwinsem...@comcast.net

wrote:


On Jul 5, 2014, at 5:28 AM, Axel Urbiz wrote:

Dear R users,

My apologies for the simple question, as I'm starting to learn the
concepts
behind the Cox PH model. I was just experimenting with the survival
and rms
packages for this.

I'm simply trying to obtain the expected survival time (as opposed
to the
probability of survival at a given time t).

What does expected survival time actually mean? Do you want the
median survival time?


I can't seem to find an option
from the type argument in the predict methods from
coxph{survival} or
cph{rms} that will give me expected survival times.

library(rms)
options(na.action=na.exclude) # retain NA in predictions
fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
fit2 <- cph(Surv(time, status) ~ age + ph.ecog, lung)
head(predict(fit, type="lp"))
head(predict(fit2, type="lp"))

`predict` will return the results of the regression, i.e. the log-
hazard ratios for each term in the RHS of the formula. What you
want (as described in the Index for the survival package) is either
`survfit` or `survexp`.

require(survival)
help(pack=survival)
?survfit
?survexp
?summary.survfit
?quantile.survfit   # to get the median
?print.summary.survfit

require(rms)
help(pack=rms)

The rms-package also adds a `survfit.cph` function but I have found
the `survest` function also provides useful added features, beyond
those offered by survfit



Thank you.

Regards,
Axel.

[[alternative HTML version deleted]]

This is a plain text mailing list.

--

David Winsemius, MD
Alameda, CA, USA




David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predictions from coxph or cph objects

2014-07-06 Thread Göran Broström

On 2014-07-06 10:48, Göran Broström wrote:

David and Axel,

I have two comments to your discussion:

(i) The area under the survival curve is equal to the mean of the
distribution, so the estimate of the mean should be the sum of the areas
of the rectangles defined by the estimated survival curve and the
successive distances between observed event times.

Thus,

  surv <- pred$surv
  time <- pred$time
  sum(surv * diff(time))

should give you the (estimated) mean. (Note that time[1] == 0, and 
length(time) == length(surv) + 1)


Well, this is not quite true; on the first interval the survival curve 
is one, so you need to


 surv <- c(1, surv)

first. But then the lengths of the surv and time vectors do not match so 
you need to add a (large) time at the end of time. If the largest 
observation is an event, 'no problem' (surv is zero), but otherwise ...


Btw, I tried

 exit <- rexp(10)
 event <- rep(1, 10)
 fit <- coxph(Surv(exit, event) ~ 1)

 survfit(fit)$surv
 [1] 0.90483742 0.80968410 0.71454371 0.61942215 0.52432953 0.42928471
 [7] 0.33432727 0.23955596 0.14529803 0.05345216

 survfit(Surv(exit, event) ~ 1)$surv
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

so be careful ...
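
Putting the corrected pieces together (a sketch, with 'fit' as above; the
area beyond the last observed time is simply dropped, so this
underestimates the mean when the largest time is censored):

pred <- survfit(fit)
surv <- c(1, pred$surv)                 # survival is 1 before the first event
time <- c(0, pred$time)
sum(surv[-length(surv)] * diff(time))   # area under the step function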

Göran



(I do not think that David's suggestion gives the same answer, but I may
be wrong.)

(ii) With censored data, this may be a bad idea. For instance, when the
largest observation is a censoring time, you may badly underestimate the
mean. Your best hope is to be able to estimate a conditional mean of the
type E(T | T > x).

This is essentially a non-parametric situation, and therefore it is
better to stick to medians and quantiles.

Göran Broström

On 2014-07-06 06:17, David Winsemius wrote:


On Jul 5, 2014, at 9:12 PM, David Winsemius wrote:



On Jul 5, 2014, at 12:43 PM, Axel Urbiz wrote:


Thank you David. It is my understanding that using survfit
below I get the median predicted survival. I actually was looking
for the mean. I can't seem to find in the documentation how to get
that.

options(na.action=na.exclude) # retain NA in predictions
fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
pred <- survfit(fit, newdata=lung)
head(pred)


There might be a way. I don't know it if so, so I would probably
just use the definition of the mean:

sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$time)



Er, I think I meant to type:

fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
pred <- survfit(fit)

   sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$surv)
[1] 211.0943



(I continue to take effort to keep my postings in plain text despite
my mail-clients's efforts to match your formatted postings. It adds
to the work of responders when you post formatted questions and
responses.)



Thanks again,
Axel.



On Sat, Jul 5, 2014 at 1:54 PM, David Winsemius dwinsem...@comcast.net

wrote:


On Jul 5, 2014, at 5:28 AM, Axel Urbiz wrote:

Dear R users,

My apologies for the simple question, as I'm starting to learn the
concepts
behind the Cox PH model. I was just experimenting with the survival
and rms
packages for this.

I'm simply trying to obtain the expected survival time (as opposed
to the
probability of survival at a given time t).

What does expected survival time actually mean? Do you want the
median survival time?


I can't seem to find an option
from the type argument in the predict methods from
coxph{survival} or
cph{rms} that will give me expected survival times.

library(rms)
options(na.action=na.exclude) # retain NA in predictions
fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
fit2 <- cph(Surv(time, status) ~ age + ph.ecog, lung)
head(predict(fit, type="lp"))
head(predict(fit2, type="lp"))

`predict` will return the results of the regression, i.e. the log-
hazard ratios for each term in the RHS of the formula. What you
want (as described in the Index for the survival package) is either
`survfit` or `survexp`.

require(survival)
help(pack=survival)
?survfit
?survexp
?summary.survfit
?quantile.survfit   # to get the median
?print.summary.survfit

require(rms)
help(pack=rms)

The rms-package also adds a `survfit.cph` function but I have found
the `survest` function also provides useful added features, beyond
those offered by survfit



Thank you.

Regards,
Axel.

[[alternative HTML version deleted]]

This is a plain text mailing list.

--

David Winsemius, MD
Alameda, CA, USA




David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained

Re: [R] read a file of text with read.table

2014-06-26 Thread Göran Broström

Carol,

while sep="" is the default, it really means 'whitespace'; see the 
documentation of 'sep'.


Göran Broström

On 2014-06-26 11:21, carol white wrote:

Hi,

with read.fwf, it works.


But I still don't understand why it doesn't work with read.table
since the sep by default is "", which is the case, and in one trial I
used read.table(myfile, colClasses = "character",
stringsAsFactors=FALSE), and it still didn't work, but it should have.

Regards,



On Thursday, June 26, 2014 9:59 AM, Ron Crump
r.e.cr...@warwick.ac.uk wrote:



Hi Carol,

It might be a primitive question but I have a file of text and
there is no separator between character on each line and the
strings on each line have the same length. The format is like the
following

absfjdslf jfdldskjff jfsldfjslk

When I read the file with read.table(myfile, colClasses =
"character"), instead of putting the strings in a table of number
of rows x length of string, read.table saves the file in a table of
number of rows x 1 and each element seems to be a factor. Why does
read.table not account for colClasses = "character"?

read.table relies on a separator to differentiate between columns, so
it is not appropriate for your file, read.fwf would do the job.

Setting colClasses (in my understanding) tells read.table how to
treat input as it comes in - so it disables some testing of data
types and makes reading quicker, it does not disable the setting of
character data to be factors, which is the default. You need to use
the stringsAsFactors=FALSE option for that.

So, for your example (and I have added a letter to the first row to
make it the same length as the others):

cf <- "absfjdslfx
jfdldskjff
jfsldfjslk"

cdf <- read.fwf(textConnection(cf), widths=rep(1,10),
    colClasses="character", stringsAsFactors=FALSE)

 See ?read.fwf for more information. A width is required for each
column (in this case 1 repeated 10 times).

Hope this helps.

Ron. [[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference in coefficients in Cox proportional hazard estimates between R and Stata, why?

2014-05-30 Thread Göran Broström
In the Cox regression case, the probable explanation is that you have 
ties in your data; Stata and coxph may have different defaults for 
handling ties. Read the manuals!
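
A minimal sketch of the fix on the R side (Stata's stcox uses the Breslow
approximation for ties by default, while coxph defaults to Efron; 'mydata'
stands in for the poster's data set):

fit <- coxph(Surv(time, censor) ~ drug + age, data = mydata,
             ties = "breslow")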


The difference in sign in the other cases is simply due to different 
definitions of the models. I am sure it is well documented in relevant 
manuals.


Göran

On 2014-05-30 13:37, Hiyoshi, Ayako wrote:

Dear R users,



Hi, thank you so much for your help in advance.

I have been using Stata but new to R. For my paper revision using
Aalen's survival analysis, I need to use R, as the command including
Aalen's survival seems to be available in R (32-bit, version 3.1.0
(2014-04-10)) but less ready to be used in Stata (version 13/SE).



To make sure that I can do basics, I have fitted logistic regression
and Cox proportional hazard regression using R and Stata.



The data I used were from UCLA R's textbook example page:
http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm.
http://www.ats.ucla.edu/stat/r/examples/asa/asa_ch1_r.htm. I used
this in Stata too.



When I fitted logistic regression as below, the estimates were
exactly same between R and Stata.



Example using logistic regression

R:



logistic1 <- glm(censor ~ age + drug, data=, family =
binomial)

summary(logistic1)

exp(cbind(OR=coef(logistic1), confint(logistic1)))

                   OR      2.5 %    97.5 %
(Intercept) 1.0373731 0.06358296 16.797896
age         1.0436805 0.96801933  1.131233
drug        0.7192149 0.26042635  1.937502



Stata:



logistic censor age i.drug

             OR   CI_lower   CI_upper
  age |  1.043681   .9662388   1.127329
 drug |   .719215   .2665194   1.940835
_cons |  1.037373    .065847    16.3431



However, when I fitted Cox proportional hazard regression, there were
some discrepancies in coefficient (and exponentiated hazard ratios).



Example using Cox proportioanl hazard regression

R:



cox1 <- coxph(Surv(time, censor) ~ drug + age, data=)
summary(cox1)

Call: coxph(formula = Surv(time, censor) ~ drug + age, data = )

  n= 100, number of events= 80

        coef exp(coef) se(coef)     z Pr(>|z|)
drug 1.01670   2.76405  0.25622 3.968 7.24e-05 ***
age  0.09714   1.10202  0.01864 5.211 1.87e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     exp(coef) exp(-coef) lower .95 upper .95
drug     2.764     0.3618     1.673     4.567
age      1.102     0.9074     1.062     1.143

Concordance= 0.711  (se = 0.042 )
Rsquare= 0.324   (max possible= 0.997 )
Likelihood ratio test= 39.13  on 2 df,   p=3.182e-09
Wald test            = 36.13  on 2 df,   p=1.431e-08
Score (logrank) test = 38.39  on 2 df,   p=4.602e-09

Stata:

stset time, f(censor)
stcox drug age

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   2.563531   .6550089     3.68   0.000     1.55363    4.229893
         age |   1.095852     .02026     4.95   0.000    1.056854    1.136289
------------------------------------------------------------------------------





The HR estimate for drug was 2.76 from R, but 2.56 from Stata.

I searched in internet for explanation, but could not find any.



In parametric survival regression with exponential distribution, R
and Stata's coefficients were completely opposite while the values
were exactly the same (i.e. say 0.08 for Stata and -0.08 for R). I
suspected something like this
(http://www.theanalysisfactor.com/ordinal-logistic-regression-mystery/)
going on, but for Cox proportional hazard regression, I could not
find any resource helping me.



I highly appreciate if anyone could explain this for me, or suggest
me resource that I can read.



Thank you so much for your help.



Best,

Ayako


[[alternative HTML version deleted]]

__ R-help@r-project.org
mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Precedence and parentheses

2014-05-06 Thread Göran Broström
A thread on r-devel (Historical NA question) went (finally) off-topic, 
heading towards Precedence. This triggered a question that I think is 
better put on this list:


I have been more or less regularly been writing programs since the 
seventies (Fortran, later C) and I early got the habit of using 
parentheses almost everywhere, for two reasons. The first is the 
obvious, to avoid mistakes with precedences, but the second is almost as 
important: Readability.


Now, I think I have seen somewhere that unnecessary parentheses in  R 
functions may slow down execution time considerably. Is this really 
true, and should I consequently get rid of my faiblesse for parentheses?

Or are there rules for when it matters and doesn't matter?

Thanks, Göran

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trend test for hazard ratios

2014-04-16 Thread Göran Broström

On 04/15/2014 10:51 PM, David Winsemius wrote:


On Apr 15, 2014, at 6:32 AM, Therneau, Terry M., Ph.D. wrote:


You can do statistical tests within a single model, for whether
portions of it fit or do not fit.  But one cannot take three
separate fits and compare them.  The program needs context to know
how the three relate to one another.  Say that group is your
strata variable, trt the variable of interest, and x1, x2 are
adjusters.

fit <- coxph(Surv(time,status) ~ trt * strata(group) + x1 + x2,
data=mydata)

Will fit a model with a separate treatment coefficient for each of
the groups, and a separate baseline hazard for each.  One can now
create a contrast that corresponds to your trend test, using
vcov(fit) for the variance matrix and coef(fit) to retrieve the
coefficients.



I have at the moment on my workspace a dataset of breast cancer cases
extracted from SEER that has a factor representing three grades of
histology: $Grade.abb. $ Grade.abb : Factor w/ 3 levels
"Well", "Moderate", "Poorly"

It would be reasonable to test whether the grading has a trend in
its effect when controlling for other factors (and it would be
surprising to a medical audience if there were no effect.). So I
compare across models using  AgeStgSiz.NG.Rad as my linear trend
model (with one df for an `as.numeric` version, AgeStgSizRad as my
no-grade-included baseline, and AgeStgSiz.FG.Rad as my full factor
version:


David, your problem is different from the original one; I think you can 
solve yours by transforming the (unordered) factor to an ordered.


Try

AgeStgSiz.NG.Rad <- coxph(Surv(Survmon,
Vital.status.recode..study.cutoff.used. == "Dead") ~ AgeDx +
SEER.historic.stage.A + as.numeric(Size.cat) + ordered(Grade.abb)
 + Rgrp , data=BrILR)

and contrasts based on orthogonal polynomials are used for Grade.abb

see ?contr.poly
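
A sketch of the contrast computation Terry describes (names are
hypothetical: b holds the three group-specific treatment coefficients
extracted from coef(fit), and V the corresponding 3 x 3 block of
vcov(fit)):

w   <- c(-1, 0, 1)                 # linear trend scores
est <- sum(w * b)
se  <- sqrt(as.numeric(t(w) %*% V %*% w))
z   <- est / se
2 * pnorm(-abs(z))                 # two-sided p-value for trend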

Göran B.



AgeStgSiz.NG.Rad <- coxph( Surv(Survmon,
Vital.status.recode..study.cutoff.used. == "Dead") ~ AgeDx +
SEER.historic.stage.A + as.numeric(Size.cat) + as.numeric(Grade.abb)
+ Rgrp , data=BrILR)
AgeStgSizRad <- coxph( Surv(Survmon,
Vital.status.recode..study.cutoff.used. == "Dead") ~ AgeDx +
SEER.historic.stage.A + as.numeric(Size.cat) + Rgrp , data=BrILR)
AgeStgSiz.FG.Rad <- coxph( Surv(Survmon,
Vital.status.recode..study.cutoff.used. == "Dead") ~ AgeDx +
SEER.historic.stage.A + as.numeric(Size.cat) + Grade.abb + Rgrp ,
data=BrILR)
2*diff(summary(AgeStgSizRad)[['loglik']])

[1] 5166.291

2*diff(summary(AgeStgSiz.NG.Rad)[['loglik']])

[1] 5888.492

2*diff(summary(AgeStgSiz.FG.Rad)[['loglik']])

[1] 5889.379

So there is strong evidence that adding grade to the existing
covariates improves the fit but that representing as separate factor
values with one extra degree of freedom may not be needed. When I add
grade as a stratum I get a very different 2*loglikelihood:


AgeStgSiz.SG.Rad <- coxph( Surv(Survmon,
Vital.status.recode..study.cutoff.used. == "Dead") ~ AgeDx +
SEER.historic.stage.A + as.numeric(Size.cat) + strata(Grade.abb) +
Rgrp , data=BrILR)
2*diff(summary(AgeStgSiz.SG.Rad)[['loglik']])

[1] 3980.908


dput( vcov(AgeStgSiz.SG.Rad) )

structure(c(9.00241385282728e-06, -4.45446264398645e-07,
5.18927440846587e-07, 2.62020260612094e-07, 7.47434378232446e-07,
-4.45446264398645e-07, 0.0046168537719431, 0.00445692601518848,
-8.67833275051278e-07, -3.68093395861629e-05, 5.18927440846587e-07,
0.00445692601518848, 0.00464685164887969, -1.61616621634903e-05,
-3.77256742079467e-05, 2.62020260612094e-07, -8.67833275051278e-07,
-1.61616621634903e-05, 7.84049821976807e-06, 1.8221575745622e-06,
7.47434378232446e-07, -3.68093395861629e-05, -3.77256742079467e-05,
1.8221575745622e-06, 0.000313989310316303), .Dim = c(5L, 5L),
.Dimnames = list(c("AgeDx", "SEER.historic.stage.ALocalized",
"SEER.historic.stage.ARegional", "as.numeric(Size.cat)", "RgrpTRUE"),
c("AgeDx", "SEER.historic.stage.ALocalized",
"SEER.historic.stage.ARegional", "as.numeric(Size.cat)", "RgrpTRUE"
)))

dput(coef(AgeStgSiz.SG.Rad))

structure(c(0.0393472050734995, 0.901971276489615, 1.46695753267995,
0.108860100649677, -0.273688779502084), .Names = c("AgeDx",
"SEER.historic.stage.ALocalized", "SEER.historic.stage.ARegional",
"as.numeric(Size.cat)", "RgrpTRUE"))

I'm not particularly facile with contrast construction with var-covar
matrices, so hoping for a worked example. Also wondering of the
cross-model comparisons are invalid or less powerful?



Terry T.



On 04/15/2014 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I have the following problem. I stratified my patient cohort into
three ordered groups and performed multivariate adjusted Cox
regression analysis on each group separately. Now I would like to
calculate a p for trend across the hazard ratios that I got for
the three groups. How can I do that if I only have the HR and the
confidence interval? For example I got the following HRs for one
endpoint:

1.09(0.68-1.74),1.29(0.94-1.76) and 1.64(1.01-2.68).

There is a trend but how do I calculate if it is significant?

Best regards

Marcus Kleber




Re: [R] Trend test for hazard ratios

2014-04-15 Thread Göran Broström

On 04/15/2014 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I have the following problem. I stratified my patient cohort into three
ordered groups and performed multivariate adjusted Cox regression analysis
on each group separately. Now I would like to calculate a p for trend across
the hazard ratios that I got for the three groups. How can I do that if I
only have the HR and the confidence interval? For example I got the
following HRs for one endpoint:

1.09(0.68-1.74),1.29(0.94-1.76) and 1.64(1.01-2.68).

There is a trend but how do I calculate if it is significant?


You may treat the log of the estimates as independent observations from 
normal distributions with known (or estimated) variances, and find a 
suitable test statistic in the book


D.J. Bartholomew, R.E. Barlow, H.D. Brunk and J.M. Bremner (1972). 
Statistical Inference under Order Restrictions, Chichester: 
John Wiley and Sons.
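
If only the estimates and confidence intervals are available, a cruder
inverse-variance trend test can be pieced together (a sketch, not the
order-restricted test of the book above):

hr    <- c(1.09, 1.29, 1.64)
lower <- c(0.68, 0.94, 1.01)
upper <- c(1.74, 1.76, 2.68)
b  <- log(hr)
se <- (log(upper) - log(lower)) / (2 * qnorm(0.975))
w  <- 1 / se^2                     # inverse-variance weights
x  <- 1:3                          # ordered group scores
xc <- x - weighted.mean(x, w)      # centered scores
slope    <- sum(w * xc * b) / sum(w * xc^2)
se.slope <- 1 / sqrt(sum(w * xc^2))
2 * pnorm(-abs(slope / se.slope))  # two-sided p for trend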


Göran B.



Best regards

Marcus Kleber



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'rms' package error

2014-04-03 Thread Göran Broström



On 04/03/2014 04:29 AM, Lucy Leigh wrote:

Hi everyone,
I am attempting to use the R package 'rms'
http://biostat.mc.vanderbilt.edu/wiki/Main/Rrms
to implement a PH weibull model, using the pphsm() function.

However, I get the following error,
f.ph <- pphsm(f)
Warning message:
In pphsm(f) :
   at present, pphsm does not return the correct covariance matrix

I tried simply running the example on page 117 of the manual, i.e.
set.seed(1)
S <- Surv(runif(100))
x <- runif(100)
dd <- datadist(x); options(datadist='dd')
f <- psm(S ~ x, dist="exponential")
summary(f) # effects on log(T) scale
f.ph <- pphsm(f)
## Not run: summary(f.ph)

But I still got the above error message.
I have looked through the R help archives, and it appears that this question 
has been asked before in 2011, but
there were no replies.
http://r.789695.n4.nabble.com/HELP-td3494640.html

Does anyone know how to get this function to work? Or if there is an 
alternative package that can implement
a Weibull PH model?


The function 'phreg' in the package 'eha' fits parametric PH models to 
right-censored and left-truncated data for some baseline distributions, 
among them the Weibull.
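
A minimal sketch (the lung data from the survival package stands in for
Lucy's data):

library(eha)
library(survival)
fit <- phreg(Surv(time, status == 2) ~ age + ph.ecog, data = lung,
             dist = "weibull")
fit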


Göran Broström


Cheers,
Lucy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulating from the Weibull with right censoring

2014-03-24 Thread Göran Broström



On 2014-03-24 07:22, David Winsemius wrote:


On Mar 23, 2014, at 9:14 PM, Lucy Leigh wrote:


Hi everyone,

I am currently attempting to simulate some survival data, based on
a Weibull model. I basically need to simulate some survival data so
that I can then test out a program I'm writing, before applying it
to some real data.

I've managed to create failure time data, using the rweibull
function, i.e. Y[i] <- rweibull(1, shape, scale), for a variety of
shape and scale parameters etc.

I wish to add in right censoring. If I am considering my failure
times as deaths, and the censoring time as a study cut-off date
(for example, 5 years from baseline), would it be correct to do
something like,

for(i in 1:nSubjects){ if (Y[i] > 5){ Y[i] <- 5} else {Y[i] <-
Y[i] }}



This would be more R-ish if you did it this way (loop-free)

Y[ Y > 5 ] <- 5  # set cutoff
cens.time <- runif(length(Y), 0, max(Y))
cens <- cens.time < Y


Not really; Given survival times T and censoring times C ( == 5 in 
Lucy's example):


Y <- pmin(Y, C)
event <- (Y <= C)

gives what Lucy wants, with Y being the censored times and 'event' 
indicating an observed death.


Göran


survreg( Surv(Y, !cens) ~ 1 )

Untested (well, lightly tested)  in absence of data object.


I guess my question is, is it statistically sound to impose the
right censoring after estimating the distribution with no
censoring? I am leaning towards yes, because in an ideal world
where we followed all participants until they died, the
distribution would be as estimated by the rweibull aboveand
assuming that the right censoring is independent, then cutting of
the measurements at a pre-specified time wouldn't affect the
outcome at all. But I guess I am just looking for a second
opinion?

Thanks in advance, this list has been immensely helpful with some
previous questions I have had about the survival package. Lucy


Please post in plain text.



[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulating from the Weibull with right censoring

2014-03-24 Thread Göran Broström

On 2014-03-24 09:34, Göran Broström wrote:



On 2014-03-24 07:22, David Winsemius wrote:


On Mar 23, 2014, at 9:14 PM, Lucy Leigh wrote:


Hi everyone,

I am currently attempting to simulate some survival data, based on
a Weibull model. I basically need to simulate some survival data so
that I can then test out a program I'm writing, before applying it
to some real data.

I've managed to create failure time data, using the rweibull
function, i.e. Y[i] <- rweibull(1, shape, scale), for a variety of
shape and scale parameters etc.

I wish to add in right censoring. If I am considering my failure
times as deaths, and the censoring time as a study cut-off date
(for example, 5 years from baseline), would it be correct to do
something like,

for(i in 1:nSubjects){ if (Y[i] > 5){ Y[i] <- 5} else {Y[i] <-
Y[i] }}



This would be more R-ish if you did it this way (loop-free)

Y[ Y > 5 ] <- 5  # set cutoff
cens.time <- runif(length(Y), 0, max(Y))
cens <- cens.time < Y


Not really; Given survival times T and censoring times C ( == 5 in
Lucy's example):

Y <- pmin(Y, C)
event <- (Y <= C)


Oops, should be

Y <- pmin(T, C)
event <- (T <= C)
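
Put together, a self-contained sketch (parameter values made up):

set.seed(1)
n <- 100
T <- rweibull(n, shape = 1.5, scale = 3)   # true failure times
C <- 5                                     # administrative cutoff at 5 years
Y <- pmin(T, C)                            # observed, possibly censored, times
event <- (T <= C)                          # TRUE = death observed
library(survival)
survreg(Surv(Y, event) ~ 1, dist = "weibull")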

Göran



gives what Lucy wants, with Y being the censored times and 'event'
indicating an observed death.

Göran


survreg( Surv(Y, !cens) ~ 1 )

Untested (well, lightly tested)  in absence of data object.


I guess my question is, is it statistically sound to impose the
right censoring after estimating the distribution with no
censoring? I am leaning towards yes, because in an ideal world
where we followed all participants until they died, the
distribution would be as estimated by the rweibull aboveand
assuming that the right censoring is independent, then cutting of
the measurements at a pre-specified time wouldn't affect the
outcome at all. But I guess I am just looking for a second
opinion?

Thanks in advance, this list has been immensely helpful with some
previous questions I have had about the survival package. Lucy


Please post in plain text.



[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data frame vs. matrix

2014-03-17 Thread Göran Broström



On 2014-03-16 23:56, Duncan Murdoch wrote:

On 14-03-16 2:57 PM, Göran Broström wrote:

I have always known that matrices are faster than data frames, for
instance this function:


dumkoll <- function(n = 1000, df = TRUE){
   dfr <- data.frame(x = rnorm(n), y = rnorm(n))
   if (df){
       for (i in 2:NROW(dfr)){
           if (!(i %% 100)) cat("i = ", i, "\n")
           dfr$x[i] <- dfr$x[i-1]
       }
   }else{
       dm <- as.matrix(dfr)
       for (i in 2:NROW(dm)){
           if (!(i %% 100)) cat("i = ", i, "\n")
           dm[i, 1] <- dm[i-1, 1]
       }
       dfr$x <- dm[, 1]
   }
}


system.time(dumkoll())

  user  system elapsed
 0.046   0.000   0.045

system.time(dumkoll(df = FALSE))

  user  system elapsed
 0.007   0.000   0.008
--

OK, no big deal, but I stumbled over a data frame with one million
records. Then, with df = TRUE,

     user    system   elapsed
44677.141  1271.544 46016.754

This is around 12 hours.

With df = FALSE, it took only six seconds! About 7500 time faster.

I was really surprised by the huge difference, and I wonder if this is
to be expected, or if it is some peculiarity with my installation: I'm
running Ubuntu 13.10 on a MacBook Pro with 8 Gb memory, R-3.0.3.


I don't find it surprising.  The line

dfr$x[i] <- dfr$x[i-1]

will be executed about a million times.  It does the following:


Thanks for the explanation; I got the idea that dfr[i, 1] <- might be 
faster than dfr$x[i] <- , but it is in fact significantly slower.

Helpful experience.

Göran


1.  Get a pointer to the x element of dfr.  This requires R to look
through all the names of dfr to figure out which one is x.

2.  Extract the i-1 element from it.  Not particularly slow.

3.  Get a pointer to the x element of dfr again.  (R doesn't cache these
things.)

4.  Set the i element of it to a new value.  This could require the
entire column or even the entire dataframe to be copied, if R hasn't
kept track of the fact that it is really being changed in place.  In a
complex assignment like that, I wouldn't be surprised if that took
place.  (In the matrix equivalent, it would be easier to recognize that
it is safe to change the existing value.)

Luke Tierney is making some changes in R-devel that might help a lot in
cases like this, but I expect the matrix code will always be faster.

Duncan Murdoch



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data frame vs. matrix

2014-03-17 Thread Göran Broström

On 2014-03-17 01:31, Jeff Newmiller wrote:

Did you really intend to make all of the x values the same?


Not at all; the code in the loop was in fact just nonsense. The point 
was to illustrate the huge difference in execution time. And that the 
relative difference seems to increase fast with the number of observations.



If so,
try one line instead of the for loop:

dfr$x[ 2:n ] <- dfr$x[ 1 ]

If that was merely an error in your example, then you could use a
different one-liner:

dfr$x[ 2:n ] <- dfr$x[ seq.int( n-1 ) ]

In either case, the speedup is considerable.


I know about all this, but sometimes you have situations where you 
cannot avoid an explicit loop.



I use data frames far more than matrices and don't feel I am
suffering for it, but then I also use creative indexing way more than
for loops.


I think that this example shows that you need both tools in your toolbox.

Göran



---
Jeff Newmiller                        DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---

Sent from my phone. Please excuse my brevity.


On March 16, 2014 11:57:33 AM PDT, Göran Broström
goran.brost...@umu.se wrote:

I have always known that matrices are faster than data frames,
for instance this function:


dumkoll <- function(n = 1000, df = TRUE){
    dfr <- data.frame(x = rnorm(n), y = rnorm(n))
    if (df){
        for (i in 2:NROW(dfr)){
            if (!(i %% 100)) cat("i = ", i, "\n")
            dfr$x[i] <- dfr$x[i-1]
        }
    }else{
        dm <- as.matrix(dfr)
        for (i in 2:NROW(dm)){
            if (!(i %% 100)) cat("i = ", i, "\n")
            dm[i, 1] <- dm[i-1, 1]
        }
        dfr$x <- dm[, 1]
    }
}



system.time(dumkoll())


  user  system elapsed
 0.046   0.000   0.045


system.time(dumkoll(df = FALSE))


  user  system elapsed
 0.007   0.000   0.008
--

OK, no big deal, but I stumbled over a data frame with one million
records. Then, with df = TRUE,

     user    system   elapsed
44677.141  1271.544 46016.754

This is around 12 hours.

With df = FALSE, it took only six seconds! About 7500 times faster.

I was really surprised by the huge difference, and I wonder if this
is to be expected, or if it is some peculiarity with my
installation: I'm running Ubuntu 13.10 on a MacBook Pro with 8 Gb
memory, R-3.0.3.

Göran B.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data frame vs. matrix

2014-03-17 Thread Göran Broström

On 2014-03-17 00:36, William Dunlap wrote:

Duncan's analysis suggests another way to do this:
extract the 'x' vector, operate on that vector in a loop,
then insert the result into the data.frame.


Thanks Bill, that is a good improvement.

Göran


 I added
a df=quicker option to your df argument and made the test
dataset deterministic so we could verify that the algorithms
do the same thing:

dumkoll <- function(n = 1000, df = TRUE){
  dfr <- data.frame(x = log(seq_len(n)), y = sqrt(seq_len(n)))
  if (identical(df, "quicker")) {
      x <- dfr$x
      for(i in 2:length(x)) {
          x[i] <- x[i-1]
      }
      dfr$x <- x
  } else if (df){
      for (i in 2:NROW(dfr)){
          # if (!(i %% 100)) cat("i = ", i, "\n")
          dfr$x[i] <- dfr$x[i-1]
      }
  }else{
      dm <- as.matrix(dfr)
      for (i in 2:NROW(dm)){
          # if (!(i %% 100)) cat("i = ", i, "\n")
          dm[i, 1] <- dm[i-1, 1]
      }
      dfr$x <- dm[, 1]
  }
  dfr
}

Timings for 10^4, 2*10^4, and 4*10^4 show that the time is quadratic
in n for the df=TRUE case and close to linear in the other cases, with
the new method taking about 60% the time of the matrix method:
 n <- c("10k"=1e4, "20k"=2e4, "40k"=4e4)
 sapply(n, function(n)system.time(dumkoll(n, df=FALSE))[1:3])
   10k  20k  40k
user.self 0.11 0.22 0.43
sys.self  0.02 0.00 0.00
elapsed   0.12 0.22 0.44
 sapply(n, function(n)system.time(dumkoll(n, df=TRUE))[1:3])
   10k   20k   40k
user.self 3.59 14.74 78.37
sys.self  0.00  0.11  0.16
elapsed   3.59 14.91 78.81
 sapply(n, function(n)system.time(dumkoll(n, df="quicker"))[1:3])
   10k  20k  40k
user.self 0.06 0.12 0.26
sys.self  0.00 0.00 0.00
elapsed   0.07 0.13 0.27
I also timed the 2 faster cases for n=10^6 and the time still looks linear
in n, with vector approach still taking about 60% the time of the matrix
approach.
 system.time(dumkoll(n=10^6, df=FALSE))
   user  system elapsed
  11.65    0.12   11.82
 system.time(dumkoll(n=10^6, df="quicker"))
   user  system elapsed
   6.79    0.08    6.91
The results from each method are identical:
 identical(dumkoll(100,df=FALSE), dumkoll(100,df=TRUE))
[1] TRUE
 identical(dumkoll(100,df=FALSE), dumkoll(100,df="quicker"))
[1] TRUE

If your data.frame has columns of various types, then as.matrix will
coerce them all to a common type (often character), so it may give
you the wrong result in addition to being unnecessarily slow.

Bill Dunlap
TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf
Of Duncan Murdoch
Sent: Sunday, March 16, 2014 3:56 PM
To: Göran Broström; r-help@r-project.org
Subject: Re: [R] data frame vs. matrix

On 14-03-16 2:57 PM, Göran Broström wrote:

I have always known that matrices are faster than data frames, for
instance this function:


dumkoll <- function(n = 1000, df = TRUE){
   dfr <- data.frame(x = rnorm(n), y = rnorm(n))
   if (df){
       for (i in 2:NROW(dfr)){
           if (!(i %% 100)) cat("i = ", i, "\n")
           dfr$x[i] <- dfr$x[i-1]
       }
   }else{
       dm <- as.matrix(dfr)
       for (i in 2:NROW(dm)){
           if (!(i %% 100)) cat("i = ", i, "\n")
           dm[i, 1] <- dm[i-1, 1]
       }
       dfr$x <- dm[, 1]
   }
}


system.time(dumkoll())

  user  system elapsed
 0.046   0.000   0.045

system.time(dumkoll(df = FALSE))

  user  system elapsed
 0.007   0.000   0.008
--

OK, no big deal, but I stumbled over a data frame with one million
records. Then, with df = TRUE,

     user    system   elapsed
44677.141  1271.544 46016.754

This is around 12 hours.

With df = FALSE, it took only six seconds! About 7500 times faster.

I was really surprised by the huge difference, and I wonder if this is
to be expected, or if it is some peculiarity with my installation: I'm
running Ubuntu 13.10 on a MacBook Pro with 8 Gb memory, R-3.0.3.


I don't find it surprising.  The line

dfr$x[i] <- dfr$x[i-1]

will be executed about a million times.  It does the following:

1.  Get a pointer to the x element of dfr.  This requires R to look
through all the names of dfr to figure out which one is x.

2.  Extract the i-1 element from it.  Not particularly slow.

3.  Get a pointer to the x element of dfr again.  (R doesn't cache these
things.)

4.  Set the i element of it to a new value.  This could require the
entire column or even the entire dataframe to be copied, if R hasn't
kept track of the fact that it is really being changed in place.  In a
complex assignment like that, I wouldn't be surprised if that took
place.  (In the matrix equivalent, it would be easier to recognize that
it is safe to change the existing value.)

[R] data frame vs. matrix

2014-03-16 Thread Göran Broström
I have always known that matrices are faster than data frames, for 
instance this function:



dumkoll <- function(n = 1000, df = TRUE){
    dfr <- data.frame(x = rnorm(n), y = rnorm(n))
    if (df){
        for (i in 2:NROW(dfr)){
            if (!(i %% 100)) cat("i = ", i, "\n")
            dfr$x[i] <- dfr$x[i-1]
        }
    }else{
        dm <- as.matrix(dfr)
        for (i in 2:NROW(dm)){
            if (!(i %% 100)) cat("i = ", i, "\n")
            dm[i, 1] <- dm[i-1, 1]
        }
        dfr$x <- dm[, 1]
    }
}


 system.time(dumkoll())

   user  system elapsed
  0.046   0.000   0.045

 system.time(dumkoll(df = FALSE))

   user  system elapsed
  0.007   0.000   0.008
--

OK, no big deal, but I stumbled over a data frame with one million 
records. Then, with df = TRUE,


     user    system   elapsed
44677.141  1271.544 46016.754

This is around 12 hours.

With df = FALSE, it took only six seconds! About 7500 times faster.

I was really surprised by the huge difference, and I wonder if this is 
to be expected, or if it is some peculiarity with my installation: I'm 
running Ubuntu 13.10 on a MacBook Pro with 8 Gb memory, R-3.0.3.


Göran B.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] proportional weights

2014-02-06 Thread Göran Broström

On 05/02/14 22:40, Marco Inacio wrote:

Hello all, can help clarify something?

According to R's lm() doc:


Non-NULL weights can be used to indicate that different observations
have different variances (with the values in weights being inversely
*proportional* to the variances); or equivalently, when the elements
of weights are positive integers w_i, that each response y_i is the
mean of w_i unit-weight observations (including the case that there
are w_i observations equal to y_i and the data have been summarized).


Since the idea here is *proportion*, not equality, shouldn't the vectors
of weights x, 2*x give the same result? And yet they don't, standard
errors differs:


summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))$sigma

[1] 0.07108323

summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))$sigma

[1] 0.1005269


The weights are in fact case weights, i.e., a weight of 2 is the same as 
including the corresponding item twice. I agree that the documentation 
is no wonder of clarity in this respect.


Btw, note that, in your example, (0.1005269 / 0.07108323)^2 = 2, your 
constant weight.


Göran Broström



So what if I know a-priori, observation A has variance 2 times bigger
than observation B? Both weights=c(1,2) and weights=c(2,4) (and so on)
represent very well this knowledge, but we get different regression
(since sigma is different).


Also, if we do the same thing with a glm() model, then we get a lot of
other differences like in the deviance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] proportional weights

2014-02-06 Thread Göran Broström

Dear John,

thanks for the clarification! The lesson to be learned is that one 
should be aware of the fact that weights may mean different things in 
different functions, and sometimes different things in the same function 
(glm)!


Göran

On 02/06/2014 02:17 PM, John Fox wrote:

Dear Marco and Goran,

Perhaps the documentation could be clearer, but it is after all a
brief help page. Using weights of 2 to lm() is *not* equivalent to
entering the observation twice. The weights are variance weights, not
case weights.

You can see this by looking at the whole summary() output for the
models, not just the residual standard errors:

- snip -


summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))


Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
weights = rep(1, 6))

Residuals:
       1        2        3        4        5        6
 0.06477 -0.08728  0.07487 -0.03996  0.01746 -0.02986

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                -0.11208    0.08066   -1.39    0.237
c(1, 2.1, 2.9, 1.1, 2, 3)   1.04731    0.03732   28.07 9.59e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.07108 on 4 degrees of freedom
Multiple R-squared:  0.9949, Adjusted R-squared:  0.9937
F-statistic: 787.6 on 1 and 4 DF,  p-value: 9.59e-06


summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))


Call: lm(formula = c(1, 2, 3, 1, 2, 3) ~ c(1, 2.1, 2.9, 1.1, 2, 3),
weights = rep(2, 6))

Residuals:
       1        2        3        4        5        6
 0.09160 -0.12343  0.10589 -0.05652  0.02469 -0.04223

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                -0.11208    0.08066   -1.39    0.237
c(1, 2.1, 2.9, 1.1, 2, 3)   1.04731    0.03732   28.07 9.59e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1005 on 4 degrees of freedom
Multiple R-squared:  0.9949, Adjusted R-squared:  0.9937
F-statistic: 787.6 on 1 and 4 DF,  p-value: 9.59e-06

- snip -

Notice that while the residual standard errors differ, the
coefficients and their standard errors are identical. There are
compensating changes in the residual variance and the weighted sum of
squares and products matrix for X.

In contrast, literally entering each observation twice reduces the
coefficient standard errors by a factor of sqrt((6 - 2)/(12 - 2)),
i.e., the square root of the relative residual df of the models:

- snip 


summary(lm(rep(c(1,2,3,1,2,3),2)~rep(c(1,2.1,2.9,1.1,2,3),2)))


Call: lm(formula = rep(c(1, 2, 3, 1, 2, 3), 2) ~ rep(c(1, 2.1, 2.9,
1.1, 2, 3), 2))

Residuals:
      Min        1Q    Median        3Q       Max
-0.087276 -0.039963 -0.006201  0.064768  0.074874

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                       -0.11208    0.05101  -2.197   0.0527 .
rep(c(1, 2.1, 2.9, 1.1, 2, 3), 2)  1.04731    0.02360  44.374 8.12e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06358 on 10 degrees of freedom
Multiple R-squared:  0.9949, Adjusted R-squared:  0.9944
F-statistic:  1969 on 1 and 10 DF,  p-value: 8.122e-13

-- snip -
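The relations can be checked directly (a small sketch replaying the three
fits above):

y <- c(1, 2, 3, 1, 2, 3); x <- c(1, 2.1, 2.9, 1.1, 2, 3)
f1 <- lm(y ~ x, weights = rep(1, 6))
f2 <- lm(y ~ x, weights = rep(2, 6))
coef(summary(f1))[, 1:2]                        # identical to f2's
coef(summary(f2))[, 1:2]
summary(f2)$sigma / summary(f1)$sigma           # sqrt(2)
f3 <- lm(rep(y, 2) ~ rep(x, 2))                 # rows literally doubled
coef(summary(f3))[, 2] / coef(summary(f1))[, 2] # sqrt(4/10)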

I hope this helps,

John

John Fox, Professor
McMaster University, Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Thu, 6 Feb 2014 09:27:22 +0100 Göran Broström goran.brost...@umu.se wrote:

On 05/02/14 22:40, Marco Inacio wrote:

Hello all, can help clarify something?

According to R's lm() doc:


Non-NULL weights can be used to indicate that different
observations have different variances (with the values in
weights being inversely *proportional* to the variances); or
equivalently, when the elements of weights are positive
integers w_i, that each response y_i is the mean of w_i
unit-weight observations (including the case that there are w_i
observations equal to y_i and the data have been summarized).


Since the idea here is *proportion*, not equality, shouldn't the
vectors of weights x, 2*x give the same result? And yet they
don't, standard errors differs:


summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(1,6)))$sigma





[1] 0.07108323

summary(lm(c(1,2,3,1,2,3)~c(1,2.1,2.9,1.1,2,3),weight=rep(2,6)))$sigma





[1] 0.1005269


The weights are in fact case weights, i.e., a weight of 2 is the
same as including the corresponding item twice. I agree that the
documentation is no wonder of clarity in this respect.

Btw, note that, in your example, (0.1005269 / 0.07108323)^2 = 2,
your constant weight.

Göran Broström



So what if I know a-priori, observation A has variance 2 times
bigger than observation B? Both weights=c(1,2) and weights=c(2,4)
(and so on) represent very well this knowledge, but we get
different regression (since sigma is different).


Also, if we do the same thing with a glm() model, then we

[R] print.matrix

2014-01-30 Thread Göran Broström

In the documentation of 'prmatrix' (base) I read, under Details:

‘prmatrix’ is an earlier form of ‘print.matrix’

but 'print' doesn't seem to have a 'matrix' method. And in the 
'Examples' section:


chm <- matrix(...
chm # uses print.matrix()

Is this a bug in the documentation?

R-3.0.2 on ubuntu 13.10.

Göran Broström

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] My problem with R

2014-01-24 Thread Göran Broström



On 1/24/14 10:44 PM, Rolf Turner wrote:

On 24/01/14 21:30, PIKAL Petr wrote:

SNIP


Why difference in the result? rbind and cbind function are something 
wrong?
Or
I, am something wrong?


you are something wrong.


Fortune?


No.

Göran



cheers,

Rolf

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Type III tests and Cox models

2014-01-20 Thread Göran Broström

On 01/20/2014 07:02 PM, peter dalgaard wrote:


On 20 Jan 2014, at 18:47 , Terry Therneau thern...@mayo.edu wrote:


The short summary: I was suspicious before, now I know for certain
that it is misguided, and the phreg implementation of the idea is
worse.


A fortune candidate, if ever I saw one.


OK, but please state clearly that 'phreg' refers to a SAS procedure, not 
the function with the same name in the  R  package 'eha' (I should have 
chosen another name, had I ever heard of SAS).


Göran



-pd



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Estimating parameters of 3 parameters lognormal distribution

2014-01-17 Thread Göran Broström



On 01/17/2014 08:42 AM, Vito Ricci wrote:

Hi Goran,

thanks for your suggestion, but I believe it's not helpful for me...

phreg statement: Proportional hazards model with parametric baseline
hazard(s). Allows for stratification with different scale and shape in
each stratum, and left truncated and right censored data.

I've data whose distribution is lognormal with three parameters. I need
to fit this model and its 3 parameters, especially the 3rd, the
threshold.


Right; the third parameter in the extension of the lognormal (and 
loglogistic) distribution in phreg is a 'proportional hazards' 
parameter, i.e., multiplying the standard lognormal (loglogistic) hazard 
function by a constant. If you want a threshold parameter, it is not for 
you.
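(For what it is worth, a rough profile-likelihood sketch for a threshold
lognormal, outside eha; the data vector x is hypothetical, and the grid
stops short of min(x) because the likelihood can become unbounded there:)

library(MASS)
prof <- function(gamma, x) fitdistr(x - gamma, "lognormal")$loglik
gammas <- seq(0, 0.95 * min(x), length.out = 50)  # candidate thresholds
ll <- sapply(gammas, prof, x = x)
gammas[which.max(ll)]                             # crude profile estimate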


Göran



Regards.
VR

If not now, when?
If not here, where?
If not you, who?


On Friday 17 January 2014 at 8:26, Vito Ricci vito_ri...@yahoo.com wrote:

Many thanks for your suggestion.
Regards.
VR

If not now, when?
If not here, where?
If not you, who?


On Thursday 16 January 2014 at 22:31, Göran Broström
goran.brost...@umu.se wrote:

On 01/16/2014 04:59 PM, Vito Ricci wrote:
  Hi guys,
 
  is there in some R package a statement to fit the parameters of a
3-parameter lognormal distribution?

Yes, the function 'phreg' in the package 'eha'.

Göran Broström



  Many thanks
  Vito Ricci

 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible
code.

 






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Estimating parameters of 3 parameters lognormal distribution

2014-01-16 Thread Göran Broström

On 01/16/2014 04:59 PM, Vito Ricci wrote:

Hi guys,

is there in some R package a statement to fit the parameters of a 3-parameter
lognormal distribution?


Yes, the function 'phreg' in the package 'eha'.

Göran Broström



Many thanks
Vito Ricci

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] seq_len and loops

2014-01-02 Thread Göran Broström

On 01/01/2014 07:36 PM, William Dunlap wrote:

2. However, Bill (and Henrik)  raised the question of replacing '1' with
'1L'; I understand the meaning of that, but does it matter (in practice)?
On 12/22/2013 06:57 PM, William Dunlap wrote:

for (i in seq_len(x - 1) + 1)

should be efficient and safe.


Oops, not safe when x is 0.


Also, the '+ 1' should be '+ 1L' to get the same answer as
seq_len(x)[-1].


It depends what your practice involves.

seq_len(n)[-1], 2:n, and seq_len(n-1)+1L all produce an integer vector (if
0 < n < 2^31 or so).
seq_len(n-1)+1 produces a numeric (double precision floating point) vector.

Integers and numerics have different properties which might affect your results,
but in many cases you will not care.   Integers use 4 bytes of memory, numerics 
8.
Integers have 32 bits of precision, numerics 52.  Integers range from -2^31+1 
to 2^31-1
and arithmetic which would give a result outside of that range results in NA 
(with a warning).
Numerics range from c. -2^1023 (c. -10^308) to c. 2^1023 (c. 10^308) and 
arithmetic
which would give a result outside of that range results in +-Inf.
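A quick illustration of the two overflow behaviours:

n <- .Machine$integer.max   # 2147483647L
n + 1L                      # integer overflow: NA, with a warning
n + 1                       # '1' is numeric, so the sum is numeric: 2147483648
2^1024                      # outside the double range: Inf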


Thanks Bill; I am aware of these details since my days as a successful 
Fortran programmer;)! I guess this background is what makes me worry 
about the apparently careless choice of variable type that I can get 
away with in  R.




If you prefer a sequence to be numeric, then use as.numeric(seq_len(n)), 
as.numeric(seq_len(n))[-1],
or seq_len(n)+1 when making it.  If you prefer integers, then use seq_len(n), 
seq_len(n)[-1], or seq_len(n)+1L.
If you don't care, do whatever seems easiest at the time.


if (x > 1){
 for (x in 2:x){
...

is the easiest, most effective,  and most easy-to-understand.


The dangerous part of that idiom is what you do in the 'else' part of the 'if' 
statement.
Do both clauses make objects with the same names and types?  I mildly prefer 
avoiding
if statements because it makes reasoning about the results of the code more 
complicated.


I get your point; I however never have an 'else' part in this context 
(if x == 1 I'm already done).


Another observation: The 'while (x < n) x <- x + 1' approach is very
slow compared to a for loop.
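For instance (a rough timing sketch; absolute numbers depend on the
machine and R version):

n <- 1e7L
system.time(for (i in seq_len(n)) NULL)            # for loop
system.time({i <- 0L; while (i < n) i <- i + 1L})  # while loop, clearly slower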


Göran Broström



Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf
Of Göran Broström
Sent: Tuesday, December 31, 2013 4:10 PM
To: r-help@R-project.org
Subject: Re: [R] seq_len and loops

Thanks for the answers from Duncan, Bill, Gabor, and Henrik. You
convinced me that

1. The solution

if (x > 1){
 for (x in 2:x){
...

is the easiest, most effective,  and most easy-to-understand.

2. However, Bill (and Henrik)  raised the question of replacing '1' with
'1L'; I understand the meaning of that, but does it matter (in practice)?

3. No one commented upon

i <- 1
while (i < x){
     i <- i + 1
 ...
}

I suppose that it means that it is the best solution.

Thanks, and Happy New Year 2014!

Göran

On 12/22/2013 06:57 PM, William Dunlap wrote:

for (i in seq_len(x - 1) + 1)

should be efficient and safe.


Oops, not safe when x is 0.


Also, the '+ 1' should be '+ 1L' to get the same answer as
seq_len(x)[-1].

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On

Behalf

Of Duncan Murdoch
Sent: Saturday, December 21, 2013 3:52 PM
To: Göran Broström; R-help@r-project.org
Subject: Re: [R] seq_len and loops

On 13-12-21 6:50 PM, Duncan Murdoch wrote:

On 13-12-21 5:57 PM, Göran Broström wrote:

I was recently reminded on this list that

Using 1:ncol() is bad practice (seq_len is designed for that purpose)
(Ripley)

This triggers the following question: What is good practice for
2:ncol(x)? (This is not a joke; in a recursive situation it often makes
sense to perform the calculation for the start value i = 1, then
continue with a loop over the rest, the Fortran way;)

I usually use

if (ncol(x) > 1)
 for (i in 2:ncol(x)){


but I can think of

for (i in seq_len(x - 1)){
     I <- i + 1


and

i <- 1
while (i < ncol(x)){
     i <- i + 1
 

What is good practice (efficient and safe)?


for (i in seq_len(x - 1) + 1)

should be efficient and safe.


Oops, not safe when x is 0.

   
A little less efficient, but clearer would be


for (i in seq_len(x)[-1])

Duncan Murdoch



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] cumulative incidence for mstate in Survival package in R

2013-12-31 Thread Göran Broström

On 12/30/2013 11:04 PM, Jieyue Li wrote:

Dear All,

I want to have the cumulative incidence curves for 'mstate' data using
Survival package in R. But I got some problems:
I. Problem 1:
1. If I only use intercept without any covariates, I can have 'right'
cumulative incidence curves (2 for 2 competing risks):
library(Survival)


That shouldn't work;)


fitCI <- survfit(Surv(stop, status*as.numeric(event), type="mstate") ~
1, data=mgus1, subset=(start==0))
plot(fitCI)
2. If I include one variate ('sex'), I get 4 curves (attached; I guess
because there are two levels in 'sex' and 2 competing risks):
fitCI <- survfit(Surv(stop, status*as.numeric(event), type="mstate")
~sex, data=mgus1, subset=(start==0))
plot(fitCI)
However, I want to just have 2 cumulative incidence curves estimated from
several covariates (such as 'sex', 'age', 'alb', etc. in mgus1). Could you
please help me to do that? Thank you very much!


I suggest that you check the Task Views, under 'Survival' and 
'Multistate Models', for instance the 'cmprsk' and 'timereg' packages.
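For example, a sketch with 'cmprsk', mirroring the construction above
(cuminc() estimates the cumulative incidence functions; cencode = 0 marks
the censored records):

library(survival)   # for the mgus1 data
library(cmprsk)
d  <- subset(mgus1, start == 0)
ci <- cuminc(ftime = d$stop, fstatus = d$status * as.numeric(d$event),
             cencode = 0)
plot(ci)            # one cumulative incidence curve per competing risk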



II. Problem 2:
I try using an example from sourcecode.pdf:
fit <- survfit(Surv(time, status, type="mstate") ~ sex, data=mine)
but where can I have the 'mine' data? Thank you!


Where do you find 'sourcecode.pdf'?

Göran Broström



Best,

Jieyue



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] cumulative incidence for mstate in Survival package in R

2013-12-31 Thread Göran Broström



On 12/31/2013 09:05 PM, Jieyue Li wrote:

Thanks a lot for the reply!

On Tue, Dec 31, 2013 at 1:59 AM, Göran Broström goran.brost...@umu.se wrote:

On 12/30/2013 11:04 PM, Jieyue Li wrote:

Dear All,

I want to have the cumulative incidence curves for 'mstate' data
using
Survival package in R. But I got some problems:
I. Problem 1:
1. If I only use intercept without any covariates, I can have
'right'
cumulative incidence curves (2 for 2 competing risks):
library(Survival)


That shouldn't work;)

This is from an example from the Survival package...


Sorry; the problem is that 'survival' should be with lower-case 's'




fitCI <- survfit(Surv(stop, status*as.numeric(event),
type="mstate") ~
1, data=mgus1, subset=(start==0))
plot(fitCI)
2. If I include one variate ('sex'), I get 4 curves (attached; I
guess
because there are two levels in 'sex' and 2 competing risks):
fitCI <- survfit(Surv(stop, status*as.numeric(event), type="mstate")
~sex, data=mgus1, subset=(start==0))
plot(fitCI)
However, I want to just have 2 cumulative incidence curves
estimated from
several covariates (such as 'sex', 'age', 'alb', etc. in mgus1).
Could you
please help me to do that? Thank you very much!


I suggest that you check the Task Views, under 'Survival' and
'Multistate Models', for instance the 'cmprsk' and 'timereg' packages.


II. Problem 2:
I try using an example from sourcecode.pdf:
fit <- survfit(Surv(time, status, type="mstate") ~ sex, data=mine)
but where can I have the 'mine' data? Thank you!


Where do you find 'sourcecode.pdf'?

It's from
http://stat.ethz.ch/R-manual/R-patched/library/survival/doc/sourcecode.pdf


I see; that is a generic example, I suppose. The data set 'mine' is one 
of your choice! But, as I suggested, look into the packages listed under 
'Multistate models' in 'Survival' under 'Task Views'. What you ask for 
cannot be accomplished with the package 'survival'.


Göran Broström



Göran Broström


Best,

Jieyue



R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] seq_len and loops

2013-12-31 Thread Göran Broström
Thanks for the answers from Duncan, Bill, Gabor, and Henrik. You 
convinced me that


1. The solution

if (x > 1){
   for (x in 2:x){
  ...

is the easiest, most effective,  and most easy-to-understand.

2. However, Bill (and Henrik)  raised the question of replacing '1' with 
'1L'; I understand the meaning of that, but does it matter (in practice)?


3. No one commented upon

i <- 1
while (i < x){
   i <- i + 1
   ...
}

I suppose that it means that it is the best solution.

Thanks, and Happy New Year 2014!

Göran

On 12/22/2013 06:57 PM, William Dunlap wrote:

for (i in seq_len(x - 1) + 1)

should be efficient and safe.


Oops, not safe when x is 0.


Also, the '+ 1' should be '+ 1L' to get the same answer as
seq_len(x)[-1].

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf
Of Duncan Murdoch
Sent: Saturday, December 21, 2013 3:52 PM
To: Göran Broström; R-help@r-project.org
Subject: Re: [R] seq_len and loops

On 13-12-21 6:50 PM, Duncan Murdoch wrote:

On 13-12-21 5:57 PM, Göran Broström wrote:

I was recently reminded on this list that

Using 1:ncol() is bad practice (seq_len is designed for that purpose)
(Ripley)

This triggers the following question: What is good practice for
2:ncol(x)? (This is not a joke; in a recursive situation it often makes
sense to perform the calculation for the start value i = 1, then
continue with a loop over the rest, the Fortran way;)

I usually use

if (ncol(x) > 1)
for (i in 2:ncol(x)){
   

but I can think of

for (i in seq_len(x - 1)){
I <- i + 1
   

and

i <- 1
while (i < ncol(x)){
i <- i + 1


What is good practice (efficient and safe)?


for (i in seq_len(x - 1) + 1)

should be efficient and safe.


Oops, not safe when x is 0.

  
A little less efficient, but clearer would be


for (i in seq_len(x)[-1])

Duncan Murdoch



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] seq_len and loops

2013-12-21 Thread Göran Broström

I was recently reminded on this list that

Using 1:ncol() is bad practice (seq_len is designed for that purpose) 
(Ripley)


This triggers the following question: What is good practice for 
2:ncol(x)? (This is not a joke; in a recursive situation it often makes 
sense to perform the calculation for the start value i = 1, then 
continue with a loop over the rest, the Fortran way;)


I usually use

if (ncol(x) > 1)
for (i in 2:ncol(x)){
   

but I can think of

for (i in seq_len(x - 1)){
I <- i + 1
   

and

i <- 1
while (i < ncol(x)){
i <- i + 1


What is good practice (efficient and safe)?
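(For the record, the pitfall with 2:ncol(x) when there is a single
column:)

x <- matrix(1, nrow = 3, ncol = 1)
2:ncol(x)             # 2 1  -- the loop would run backwards!
seq_len(ncol(x))[-1]  # integer(0) -- zero iterations, as intended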

Göran Broström

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R- package for Parametric Survival Analysis with Left-censored data

2013-11-27 Thread Göran Broström



On 11/20/2013 08:17 AM, peter dalgaard wrote:


On 20 Nov 2013, at 04:15 , David Winsemius dwinsem...@comcast.net
wrote:



On Nov 19, 2013, at 5:30 PM, Vinod Mishra wrote:


Dear All,

I am new to R. Can someone please direct me to an R package using
which I can estimate a Parametric Survival Analysis model with
Left-censored (delayed entry) data in it.

I recently received reviewers comment on my submitted article,
where the reviewer suggested that only R has capabilities of
estimating above mentioned survival model. However, I am not able
to figure which specific package in R, the reviewer was referring
to.


Look at:

?Surv

... after loading the survival package. (I do think you would be
advised to seek statistical consultation.)



In particular, notice the difference between left censoring (some
patients are known to have died at an unknown time before t0) and
left truncation (we only know survival times for patients alive at
t0). Delayed entry is the latter.


And in that case you could try the functions 'phreg' (parametric
proportional hazards models) and 'aftreg' (accelerated failure time
models) in the package 'eha'. Both functions allow for left truncation,
right censoring, and a choice of distributions.
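A minimal sketch of the calls (the data frame 'dat' with columns enter,
exit, event and a covariate x is hypothetical; 'enter' carries the
left-truncation, i.e. delayed entry, times):

library(eha)
fit.ph  <- phreg(Surv(enter, exit, event) ~ x, data = dat,
                 dist = "weibull")     # parametric proportional hazards
fit.aft <- aftreg(Surv(enter, exit, event) ~ x, data = dat,
                  dist = "lognormal")  # accelerated failure time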


Göran Broström




--

David Winsemius Alameda, CA, USA





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to stop Kaplan-Meier curve at a time point

2013-11-26 Thread Göran Broström

The functions 'age.window' and 'cal.window' in the package 'eha' are
designed to perform 'rectangular cuts' in the Lexis diagram. So, in your
case,


require(eha)
dat <- data.frame(enter = rep(0, length(Survival_days)),
                  exit = Survival_days, event = Outcome)

dat.1 <- age.window(dat, c(0, 2190))


and 'dat.1' is your right-censored data set.

Göran Broström

On 11/21/2013 07:36 AM, Thomas Stewart wrote:

One solution is to format the data as if the follow-up period ended on day
2190.  For example,

TTT <- Survival_days
DDD <- Outcome

DDD[ TTT > 2190 ] <- 0
TTT[ TTT > 2190 ] <- 2190

survfit(Surv(TTT, DDD) ~ Gender)
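and the log-rank test at the 2190-day cutoff then follows from the same
recoded data (sketch):

library(survival)
survdiff(Surv(TTT, DDD) ~ Gender)   # log-rank test, data censored at 2190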

-tgs



On Wed, Nov 20, 2013 at 3:01 PM, Dr.Vinay Pitchika
dr.vinay@gmail.comwrote:


Hello R users

I have a question with Kaplan-Meier Curve with respect to my research. We
have done a retrospective study on fillings in the tooth and their survival
in relation to the many influencing factors. We had a long follow-up time
(upto 8yrs for some variables). However, we decided to stop the analysis at
the 6year follow up time, so that we can have uniform follow-up time for
all the variables.

I did play a bit with the formula and tried to stop the Kaplan-Meier curve
at my desired time (2190 days) or roughly 6 years. However, my question is I
want to find the significance (log rank test) at this time point between
the two curves; because I am not able to find a way to stop the survfit at
this time point with my knowledge. Below is the code I used.

Gender2 <- survfit(Surv(Survival_days, Outcome) ~ Gender)

plot(Gender2, xmax=2190, mark.time=FALSE, col=c(1:2),
     xlab="Survival time (Days)", ylab="Survival probability",
     main="Gender")
# mark.time=FALSE will remove the censoring lines in the graph. If
# censoring lines are needed, then remove the mark.time argument.

legend("topright", c("Females", "Males"), col=(1:2), lwd=0.5)

Am sure, the code in the first line has to be modified. Please help me with
this and I will be very thankful to you.

Thanks in advance

Regards
Vinay





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] How to obtain restricted estimates from coxph()?

2013-10-16 Thread Göran Broström



On 2013-10-16 14:33, Terry Therneau wrote:



On 10/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I'm trying to use coxph() function to fit a very simple Cox proportional
hazards regression model (only one covariate) but the parameter space is
restricted to an open set (0, 1). Can I still obtain a valid estimate by
using coxph function in this scenario? If yes, how? Any suggestion would be
greatly appreciated. Thanks!!!


Easily:
 1.  Fit the unrestricted model.  If the solution is in 0-1 you are done.
 2.  If it is outside, fix the coefficient.  Say that the solution is 1.73, 
then the
optimal solution under constraint is 1.


OK, except for the small annoyance that 1 is not a member of the open 
set (interval) (0, 1). Maybe the answer is No in this case? It depends on
what is meant by 'valid'. If it means 'MLE', the answer is No.



 Redo the fit adding the parameters init=1, iter=0. This forces the program
to give the loglik etc. for the fixed coefficient of 1.0.

Terry Therneau
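A sketch of that recipe ('dat' and the covariate x are hypothetical; in
current survival versions 'iter=0' is spelled coxph.control(iter.max = 0)):

library(survival)
fit <- coxph(Surv(time, status) ~ x, data = dat)   # step 1: unrestricted fit
if (coef(fit) >= 1) {                              # step 2: pin the coefficient
    fit <- coxph(Surv(time, status) ~ x, data = dat,
                 init = 1, control = coxph.control(iter.max = 0))
}
# Note Göran's caveat above: 1 sits on the boundary of the open interval (0, 1).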




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Repeated measures Cox regression ??coxph??

2013-08-16 Thread Göran Broström

Sorry I'm late with this.

On 07/26/2013 02:02 PM, Terry Therneau wrote:

Two choices. If this were a linear model, do you like the GEE
approach or a mixed effects approach? Assume that subject is a
variable containing a per-subject identifier.

GEE approach: add "+ cluster(subject)" to the model statement in coxph.
Mixed models approach: add "+ (1|subject)" to the model statement in
coxme.
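In code, the two specifications look like this (a sketch; 'dat' and
'subject' are placeholders):

library(survival)
library(coxme)
fit.gee <- coxph(Surv(time, status) ~ x + cluster(subject), data = dat)
fit.mix <- coxme(Surv(time, status) ~ x + (1 | subject),    data = dat)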


Note that the 'cluster' approach ignores the clustering regarding the 
regression parameter estimates. It tries to correct the optimistic 
variance estimate given by ignoring the clustering, but it does nothing 
about the bias that may be introduced.



When only a very few subjects have multiple events, the mixed model
(random effect) approach may not be reliable, however.  Multiple
events per group are the fuel for estimation of the variance of the
random effect, and with few of these the profile likelihood of the
random effect will be very flat.  You can get essentially a random
estimate of the variance of the subject effect.  I'm still getting
my arms around this issue, and it has taken me a long time.


John had exactly two observations per subject, and given that a frailty 
model is reasonable, the bias may be substantial if ignoring it. I made 
a small simulation study to convince myself: frailty variance = 1, one 
binary covariate (constant within subjects) and beta coefficient = 1. 
With 20 subjects, the bias for coxme was -0.004, for coxph (with 
'cluster', but it doesn't matter) -0.294 (based on 1000 replicates). 
(The bias for the frailty standard deviation was -0.108, but who cares 
when we regard it as just a nuisance?)


Of course this doesn't prove anything, but it makes me worried; it is 
easy to understand the frailty model, but what is the 'GEE' model in 
this survival case? Why should it be used in John's case?



Frailty is an alternate label for random effects when all we have
is a random intercept.  Multiple labels for the same idea adds
confusion, but nothing else.


The term frailty was (to my knowledge) coined by Vaupel, Manton &
Stallard in a 1979 paper in 'Demography'. They used it to describe
heterogeneity in demographic data, and what could happen if it was 
ignored. Just for the record.


Göran


Terry Therneau

On 07/25/2013 08:14 PM, Marc Schwartz wrote:

On Jul 25, 2013, at 4:45 PM, David
Winsemiusdwinsem...@comcast.net  wrote:


On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:


On Jul 25, 2013, at 2:11 PM, John
Sorkinjsor...@grecc.umaryland.edu  wrote:


Colleagues, Is there any R package that will allow one to
perform a repeated measures Cox Proportional Hazards
regression? I don't think coxph is set up to handle this type
of problem, but I would be happy to know that I am not
correct. I am doing a study of time to hip joint replacement.
As each person has two hips, a given person can appear in the
dataset twice, once for the left hip and once for the right
hip, and I need to account for the correlation of data from a
single individual. Thank you, John



John,

See Terry's 'coxme' package:

http://cran.r-project.org/web/packages/coxme/index.html


When I looked over the description of coxme, I was concerned it
was not really designed with this in mind. Looking at Therneau
and Grambsch, I thought section 8.4.2 in the 'Multiple Events per
Subject' Chapter fit the analysis question well. There they
compared the use of coxph( ... + cluster(ID), ...) with coxph(
... + strata(ID), ...). Unfortunately I could not tell for sure
which one was being described as superior but I think it was the
cluster() alternative. I seem to remember there are discussions
in the archives.


David,

I think that you raise a good point. The example in the book (I had
to wait to get home to read it) is potentially different however,
in that the subject's eye's were randomized to treatment or
control, which would seem to suggest comparable baseline
characteristics for each pair of eyes, as well as an active
intervention on one side where a difference in treatment effect
between each eye is being analyzed.

It is not clear from John's description above if there is one hip
that will be treated versus one as a control and whether the extent
of disease at baseline is similar in each pair of hips. Presumably
the timing of hip replacements will be staggered at some level,
even if there is comparable disease, simply due to post-op recovery
time and surgical risk. In cases where the disease between each hip
is materially different, that would be another factor to consider,
however I would defer to orthopaedic physicians/surgeons from a
subject matter expertise consideration. It is possible that the
bilateral hip replacement data might be more of a parallel to
bilateral breast cancer data, if each breast were to be tracked
separately.

I have cc'd Terry here, hoping that he might jump in and offer some
insights into the pros/cons of using coxme versus coxph with either
a cluster or strata based approach, or perhaps even a frailty based
approach as in 9.4.1 in the book.

Re: [R] coxph diagnostics

2013-08-11 Thread Göran Broström



On 08/11/2013 06:14 AM, Soumitro Dey wrote:

Hello all,

This may be a naive question but since I'm new to R/survival models, I
cannot figure it out the problem myself.

I have a coxph model for my data and I am trying to test if the
proportional hazards assumption holds. Using cox.zph on the model I get a
global score:

GLOBAL  NA 4.20e+02 0.00e+00

Does this mean that the proportional hazard assumption does not hold?


Yes, or, the fit is very bad (see Bert's response).


When I plot the  Schoenfeld residuals, generally the plots are across
the horizontal line which makes me think that the proportional hazards
assumption still holds. Could someone please clarify on this?


Did you try

 plot(cox.zph(fit))

Read the help pages for cox.zph and plot.cox.zph. The raw Schoenfeld 
residuals plots are generally of limited value.


With factor covariate(s) you could also perform a graphical inspection 
by plotting the estimated cumulative hazards, stratifying on these factors.
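For instance, with the veteran data shipped with survival:

library(survival)
fit <- coxph(Surv(time, status) ~ trt + karno + celltype, data = veteran)
zp  <- cox.zph(fit)
zp          # per-term tests plus the GLOBAL test
plot(zp)    # smoothed scaled Schoenfeld residuals, one plot per term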


A somewhat unrelated question: I have come across several papers which
just calculate the coxph model without the diagnostics for
proportional hazards assumption and interpret the results of the
regression directly. Should that be acceptable?


No.


Are there other ways
to show the goodness of the model?


Yes, see above.

Göran

 Thanks!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] Repeated measures Cox regression ??coxph??

2013-07-26 Thread Göran Broström

On 07/26/2013 04:06 AM, John Sorkin wrote:

David Thank you for your thoughts. The data I am analyzing do not
come from a clinical trial but rather from a cohort study whose aim
is to determine risk factors for surgical therapy to treat their
joints. John


As David explained, there are several ways of approaching this 
situation. If it is a treatment/control case (which yours isn't), 
stratification is appropriate. It boils down to a simple sign test 
where we compare the number of pairs with longest survival of the 
treated with the number of pairs with the longest survival of the 
control. Undetermined (due to censoring) pairs are thrown away.


Generally, stratification is an alternative to the frailty model, but it 
has some drawbacks: loss of power (especially with small stratum sizes), 
and you cannot use covariates that are constant within pairs (personal 
characteristics in your case). The frailty model comes with stronger 
assumptions than stratification, but you avoid the drawbacks just 
mentioned. The clustering method, finally, is for variance correction in 
the ordinary Cox regression.


In your case, would recommend the frailty approach with coxme (while we 
wait for Terry's verdict!).


Göran


Sent from my iPhone

On Jul 25, 2013, at 9:15 PM, Marc Schwartz marc_schwa...@me.com
marc_schwa...@me.com wrote:



On Jul 25, 2013, at 4:45 PM, David Winsemius
dwinsem...@comcast.net wrote:



On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:



On Jul 25, 2013, at 2:11 PM, John Sorkin
jsor...@grecc.umaryland.edu wrote:


Colleagues, Is there any R package that will allow one to
perform a repeated measures Cox Proportional Hazards
regression? I don't think coxph is set up to handle this type
of problem, but I would be happy to know that I am not
correct. I am doing a study of time to hip joint replacement.
As each person has two hips, a given person can appear in the
dataset twice, once for the left hip and once for the right
hip, and I need to account for the correlation of data from a
single individual. Thank you, John




John,

See Terry's 'coxme' package:

http://cran.r-project.org/web/packages/coxme/index.html


When I looked over the description of coxme, I was concerned it
was not really designed with this in mind. Looking at Therneau
and Grambsch, I thought section 8.4.2 in the 'Multiple Events per
Subject' Chapter fit the analysis question well. There they
compared the use of coxph( ... + cluster(ID), ...) with coxph(
... + strata(ID), ...). Unfortunately I could not tell for sure
which one was being described as superior but I think it was the
cluster() alternative. I seem to remember there are discussions
in the archives.



David,

I think that you raise a good point. The example in the book (I had
to wait to get home to read it) is potentially different however,
in that the subject's eye's were randomized to treatment or
control, which would seem to suggest comparable baseline
characteristics for each pair of eyes, as well as an active
intervention on one side where a difference in treatment effect
between each eye is being analyzed.

It is not clear from John's description above if there is one hip
that will be treated versus one as a control and whether the extent
of disease at baseline is similar in each pair of hips. Presumably
the timing of hip replacements will be staggered at some level,
even if there is comparable disease, simply due to post-op recovery
time and surgical risk. In cases where the disease between each hip
is materially different, that would be another factor to consider,
however I would defer to orthopaedic physicians/surgeons from a
subject matter expertise consideration. It is possible that the
bilateral hip replacement data might be more of a parallel to
bilateral breast cancer data, if each breast were to be tracked
separately.

I have cc'd Terry here, hoping that he might jump in and offer some
insights into the pros/cons of using coxme versus coxph with either
a cluster or strata based approach, or perhaps even a frailty based
approach as in 9.4.1 in the book.

Regards,

Marc




-- David.


You also might find the following of interest:

http://bjo.bmj.com/content/71/9/645.full.pdf

http://www.ncbi.nlm.nih.gov/pubmed/6885

http://www.ncbi.nlm.nih.gov/pubmed/22078901



Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.


David Winsemius Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
posting guide http://www.R-project.org/posting-guide.html and
provide commented, minimal, self-contained, reproducible code.






Re: [R] Windows 7 (Swedish): 'Rcmd INSTALL' fails (SOLVED)

2013-07-21 Thread Göran Broström

Uwe, thanks!

The switch from "Program" to "Program Files" did the trick (stupid Windows!).

I tried the --library=. flag because I got the error message

path[2]=C:/Program/R/R-3.0.1/library: Access denied

without it, and naively thought that I had a permission problem.
This was strange, because I am the administrator of my Windows machines.

BTW, why can't I, as an administrator, install packages in

C:\Program Files\R\R-3.0.1\library ?

This is of course a very minor problem, since I never do anything useful 
on Windows machines.


Best,

Göran Broström

Uwe Ligges skrev 2013-07-21 00:51:



On 21.07.2013 00:26, Göran Broström wrote:

I am trying to build a Windows zip file of a private package by

C:/Program/R/R-3.0.1/bin/x64/Rcmd INSTALL --build --library=.
epigen_0.1.tar.gz


Why --library=. ?
Do you have appropriate permissions to do that? In this case, you need
to start everything with admin privileges explicitly unless your Windows
is configured differently.




but I get errors like

1: package 'datasets' in options(defaultPackages) was not found

and, finally,

Error in normalizePath(path.expand(path), winslash, mustWork) :
path[1]=C:/Program/R/R-3.0.1/library/tools: Atkomst nekad
(Access denied)

This is on a Swedish version of Windows 7. I noticed that the R
installer called the installation directory Program Files instead of
Program (despite my effort to change it), so I tried the same thing


But there is "Program Files"; "Program" is just something like a link
shown by the Windows Explorer that actually points to "Program Files".




with an English version of Windows 7 (on another machine), and there
everything went as expected.

I installed R on both machines today so the setups should be identical,
except for the languages. What can be wrong? Can it be a language/path
thing?


Probably not the language but different admin settings.

Best,
Uwe Ligges



Thanks for any insight!

Göran Broström

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Windows 7 (Swedish): 'Rcmd INSTALL' fails (SOLVED)

2013-07-21 Thread Göran Broström



On 07/21/2013 12:27 PM, Uwe Ligges wrote:



On 21.07.2013 10:50, Göran Broström wrote:

Uwe, thanks!

The switch from Program to Program Files made it (stupid Windows!).

I tried the --library=. flag because I got the error message

path[2]=C:/Program/R/R-3.0.1/library: Access denied

without it, and naively thought that I had a permission problem.
This was strange, because I am the administrator of my Windows machines.

BTW, why can't I, as an administrator, install packages in

C:\Program Files\R\R-3.0.1\library ?



Because you started R with standard privileges? Right click on R and
then tell Windows to start R as an administrator. That should fix your
problems.


Thanks again; didn't know that either! Always fun to learn new things.

Best,

Göran Broström



Best,
Uwe Ligges


This is of course a very minor problem, since I never do anything useful
on Windows machines.

Best,

Göran Broström

Uwe Ligges skrev 2013-07-21 00:51:



On 21.07.2013 00:26, Göran Broström wrote:

I am trying to build a Windows zip file of a private package by

C:/Program/R/R-3.0.1/bin/x64/Rcmd INSTALL --build --library=.
epigen_0.1.tar.gz


Why --library=. ?
Do you have appropriate permissions to do that? In this case, you need
to start everything with admin privileges explicitly unless your Windows
is configured differently.




but I get errors like

1: package 'datasets' in options(defaultPackages) was not found

and, finally,

Error in normalizePath(path.expand(path), winslash, mustWork) :
path[1]=C:/Program/R/R-3.0.1/library/tools: Atkomst nekad
(Access denied)

This is on a Swedish version of Windows 7. I noticed that the R
installer called the installation directory Program Files instead of
Program (despite my effort to change it), so I tried the same thing


But there is "Program Files"; "Program" is just something like a link
shown by the Windows Explorer that actually points to "Program Files".




with an English version of Windows 7 (on another machine), and there
everything went as expected.

I installed R on both machines today so the setups should be identical,
except for the languages. What can be wrong? Can it be a language/path
thing?


Probably not the language but different admin settings.

Best,
Uwe Ligges



Thanks for any insight!

Göran Broström

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Windows 7 (Swedish): 'Rcmd INSTALL' fails

2013-07-20 Thread Göran Broström

I am trying to build a Windows zip file of a private package by

C:/Program/R/R-3.0.1/bin/x64/Rcmd INSTALL --build --library=. 
epigen_0.1.tar.gz


but I get errors like

1: package 'datasets' in options(defaultPackages) was not found

and, finally,

Error in normalizePath(path.expand(path), winslash, mustWork) :
path[1]=C:/Program/R/R-3.0.1/library/tools: Atkomst nekad
(Access denied)

This is on a Swedish version of Windows 7. I noticed that the R 
installer called the installation directory Program Files instead of 
Program (despite my effort to change it), so I tried the same thing 
with an English version of Windows 7 (on another machine), and there 
everything went as expected.


I installed R on both machines today so the setups should be identical, 
except for the languages. What can be wrong? Can it be a language/path 
thing?


Thanks for any insight!

Göran Broström

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on indicator variables

2013-03-21 Thread Göran Broström

On 03/21/2013 02:08 PM, Jorge I Velez wrote:

Try

ifelse(ABS ==1 | DEFF == 1, 1, 0)


or

as.numeric(ABS | DEFF)

(faster?)
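A quick check that the two give the same thing (toy vectors):

ABS  <- c(1, 0, 1, 0)
DEFF <- c(0, 0, 1, 1)
identical(ifelse(ABS == 1 | DEFF == 1, 1, 0),
          as.numeric(ABS | DEFF))   # TRUE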

Göran



HTH,
Jorge.-



On Fri, Mar 22, 2013 at 12:02 AM, Tasnuva Tabassum t.tasn...@gmail.comwrote:


I have two indicator variables ABS and DEFF. I want to create another
indicator variable which will take value 1 if either ABS=1 or DEFF=1.
Otherwise, it will take value 0. How can I make that?


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

