[R] extracting characters from a string

2013-01-23 Thread Biau David
Dear All,

I have a data frame of vectors of publication names such as 'pub':

pub1 <- c('Brown DK, Santos R, Rome DF, Don Juan X')
pub2 <- c('Benigni D')
pub3 <- c('Arstra SD, Van den Hoops DD, lamarque D')

pub <- rbind(pub1, pub2, pub3)


I would like to construct a data frame containing only the authors' last names, 
with one last name per column and one publication per row. Basically I want to 
get rid of the initials (max 2, always before a comma) and the spaces surrounding 
each last name. I would like to avoid a loop.

ps: A short explanation of the code that extracts the values from the character 
string would also be great!

 
David


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extracting characters from a string

2013-01-23 Thread Biau David
thanks, it works well. I have to work on Arun's previous answer to make it work 
too.


 
David



From: Rui Barradas ruipbarra...@sapo.pt
To: Biau David djmb...@yahoo.fr
Cc: r help list r-help@r-project.org
Sent: Wednesday, 23 January 2013, 19:57
Subject: Re: [R] extracting characters from a string
 
Hello,

I've just noticed that my first solution would only return the first set 
of alphabetic characters, such as Van, not Van den Hoops.
The following will solve that problem.


fun2 <- function(x, sep = ", "){
    x <- strsplit(x, sep)
    m <- lapply(x, function(y) gregexpr(" [[:alpha:]]*$", y))
    res <- lapply(seq_along(x), function(i)
        regmatches(x[[i]], m[[i]], invert = TRUE))
    res <- lapply(res, unlist)
    lapply(res, function(y) y[nchar(y) > 0])
}
fun2(pub)


Hope this helps,

Rui Barradas
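
For readers who want the data frame David originally asked for (one publication per row, one last name per column), here is a minimal sketch. It is not Rui's code: it assumes that initials are one or two letters at the end of each author entry, it stores the publications in a named character vector rather than the rbind() matrix, and the names `authors`, `last_names`, `padded` and `res` are illustrative. trimws() and lengths() need R 3.2.0 or later.

```r
pub <- c(pub1 = "Brown DK, Santos R, Rome DF, Don Juan X",
         pub2 = "Benigni D",
         pub3 = "Arstra SD, Van den Hoops DD, lamarque D")

# Split each publication into its authors at the commas
authors <- strsplit(pub, ",", fixed = TRUE)

# Trim surrounding spaces, then drop the trailing 1-2 letter initials
last_names <- lapply(authors, function(a)
    sub(" +[[:alpha:]]{1,2}$", "", trimws(a)))

# Pad shorter author lists with NA so that every row has the same length
n_max <- max(lengths(last_names))
padded <- lapply(last_names, function(a)
    c(a, rep(NA_character_, n_max - length(a))))

res <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
res
```

Each step is vectorised through lapply(), so no explicit for loop is needed; publications with fewer authors simply end in NA.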

On 23-01-2013 18:33, Rui Barradas wrote:
 Hello,

 Try the following.

 fun <- function(x, sep = ", "){
      s <- unlist(strsplit(x, sep))
      regmatches(s, regexpr("[[:alpha:]]*", s))
 }

 fun(pub)


 Hope this helps,

 Rui Barradas
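
As a short explanation of how this first version extracts the values (a sketch added for illustration, with a made-up input `s`): regexpr() reports where the first match of the pattern starts and how long it is, and regmatches() cuts those substrings out. Because `[[:alpha:]]*` stops at the first non-letter, a multi-word name is truncated at its first space.

```r
s <- c("Brown DK", "Van den Hoops DD")

# regexpr() returns the start of the first match in each string;
# the "match.length" attribute holds the length of each match
m <- regexpr("[[:alpha:]]*", s)
starts <- as.integer(m)
lens <- attr(m, "match.length")

# regmatches() extracts the matched substrings
first_words <- regmatches(s, m)
first_words   # "Brown" "Van"
```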



Re: [R] removing loops from code in making data.frame

2013-01-16 Thread Biau David
thanks, it goes a lot faster. Just one thing though: when I apply the code to 
my data, the two data.frames end up different, or at least identical(df1, df2) 
is FALSE.

However, when I do which(df1!=df2) it returns 'integer(0)'.

Could that be due to the class of the vectors or something of the sort?

thanks,


 
David Biau
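
One possible cause, sketched below with made-up matrices (`m_int` and `m_dbl` are illustrative, not David's data): identical() also compares storage type and attributes such as dimnames, whereas `!=` compares values only. all.equal() is the usual way to test for equality up to such differences.

```r
m_int <- matrix(c(1L, 0L, 0L, 1L), nrow = 2)   # integer storage
m_dbl <- matrix(c(1, 0, 0, 1), nrow = 2)       # double storage

n_diff <- length(which(m_int != m_dbl))          # 0: no cell differs in value
same_object <- identical(m_int, m_dbl)           # FALSE: storage types differ
same_values <- isTRUE(all.equal(m_int, m_dbl))   # TRUE: values agree
```

Comparing str(df1) with str(df2) usually shows which attribute or column type differs.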



From: arun smartpink...@yahoo.com
To: Biau David djmb...@yahoo.fr
Cc: R help r-help@r-project.org
Sent: Tuesday, 15 January 2013, 21:54
Subject: Re: [R] removing loops from code in making data.frame
 
Hi,

You could also do this:
res1 <- do.call(rbind, lapply(xaulist, function(x)
  as.numeric(apply(t(mapply(`==`, tata, x)), 2, any))))
identical(res1, tutu)
#[1] TRUE
A.K.
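
Another loop-free variant, as a sketch rather than a drop-in replacement: compare the whole data frame with one author at a time and flag the rows (papers) that contain a match. It assumes the author columns are character (hence `stringsAsFactors = FALSE`); `tutu2` is an illustrative name.

```r
au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'deb')
au2 <- c('art', 'deb', 'soy', 'deb', 'joy', 'ani', 'deb', 'deb', 'nem', 'mar')
au3 <- c('mar', 'lio', 'mil', 'mar', 'ani', 'lul', 'nem', 'art', 'deb', 'tat')

tata <- data.frame(au1, au2, au3, stringsAsFactors = FALSE)
xaulist <- sort(unique(unlist(tata, use.names = FALSE)))

# For each author: 1 if the author appears anywhere in a paper's row, else 0.
# sapply() returns papers x authors, so transpose to get authors x papers.
tutu2 <- t(sapply(xaulist, function(a) as.numeric(rowSums(tata == a) > 0)))
dim(tutu2)
```

The result has authors in rows and papers in columns, like `tutu`, but it carries the author names as row names, so identical(tutu2, tutu) would be FALSE even when the values agree.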







[R] removing loops from code in making data.frame

2013-01-15 Thread Biau David
Dear all,

I am working on an author network, and to do so I have to build a data.frame 
(tutu) crossing author names (rows) with publication numbers (columns). An 
author's participation in a study is indicated by a 1, and by 0 otherwise.

I have a vector (xaulist) of all the author names and a data.frame (tata) 
with the publications in rows and the authors in columns. I have written a 
loop to obtain my data.frame, but it takes a long time when the number of 
studies increases. I am looking for more efficient code.

Here is a minimal working example (my code is terrible, I know...):

#-

au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'deb')
au2 <- c('art', 'deb', 'soy', 'deb', 'joy', 'ani', 'deb', 'deb', 'nem', 'mar')
au3 <- c('mar', 'lio', 'mil', 'mar', 'ani', 'lul', 'nem', 'art', 'deb', 'tat')

tata <- data.frame(au1, au2, au3)
xaulist2 <- levels(factor(unlist(tata[,])))
xaulist <- levels(as.factor(xaulist2))

# rows are authors and columns are papers
tutu <- matrix(NA, nrow=length(xaulist), ncol=dim(tata)[1])
for (i in 1:length(xaulist))
{
  for (j in 1:dim(tata)[1])
  {
  ifelse('TRUE' %in% as.character(tata[j,]==xaulist[i]), tutu[i,j] <- 1, tutu[i,j] <- 0)
  }
}
tutu[is.na(tutu)] <- 0

#-

I am looking for a more efficient way to build 'tutu'.

Thank you very much,

 
David



[R] extracting character values

2013-01-13 Thread Biau David
Dear all,

I have a data frame of names (netw), with each cell containing the last name and 
initials of an author; some cells are NA. I would like to extract only the 
last name from each cell; the new data frame is called 'res'.


Here is what I do:

res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

for (i in 1:dim(netw)[2])
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}

 
The problem is that I cannot manage to extract 'complex' names properly, such as 
' van der hoops bf  ': here I only get 'van', whereas the real last name is 'van der 
hoops' and 'bf' are the initials. Basically, the last name always has a minimum 
of 3 consecutive letters, but may consist of several groups of 3 or more letters 
separated by one or more spaces; the cell may start with a space too; initials 
never have more than 2 letters.

Would someone have a nice idea for that? Thanks,


David



Re: [R] count combined occurrences of categories

2013-01-13 Thread Biau David
OK, thanks for the tips. I have abandoned the use of cbind in data.frame. I've 
used both dcast() and melt() and they both work fine. Thanks again.

 
David Biau



From: David Winsemius dwinsem...@comcast.net
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org; Biau David djmb...@yahoo.fr
Sent: Friday, 11 January 2013, 18:54
Subject: Re: [R] count combined occurrences of categories
 

On Jan 11, 2013, at 9:47 AM, arun wrote:

 HI David,
 
 I get different results with dcast()
 
 library(reshape2)
   dcast(melt(tutu,"nam"),nam~value,length)
 #  nam art deb joy mar seb lio nem tat
 #1  da   2   3   1   4   1   1   0   0
 #2  fr   2   2   2   3   0   1   1   1
 #3  ya   1   2   1   0   0   1   1   0
 
  tutus <- data.frame(nam=tutu$nam, au=with(tutu, c(au1,au2,au3)))
  with(tutus,table(nam,au))
 #    au
 #nam  1 2 3 4 5 6 7
  # da 2 3 1 2 4 0 0   #some numbers don't match the previous result
   #fr 2 2 2 2 2 1 1
   #ya 1 2 1 1 0 1 0
 #If I convert to as.character(), it matched with the dcast() results

Probably due to the fact I used c() on factors:

tutu <- data.frame(nam, au1, au2, au3, stringsAsFactors=FALSE)
 tutus <- data.frame(nam=tutu$nam, au=with(tutu, c(au1,au2,au3)))
 tutab <- with(tutus, table(nam, au)  )
 tutab
    au
nam  art deb joy lio mar nem seb tat
  da   2   3   1   1   4   0   1   0
  fr   2   2   2   1   3   1   0   1
  ya   1   2   1   1   0   1   0   0

-- David.
 
 tutunew <- data.frame(nam=tutu$nam,au=with(tutu,c(as.character(au1),as.character(au2),as.character(au3))))
 with(tutunew,table(nam,au))
 #    au
 #nam  art deb joy lio mar nem seb tat
  # da   2   3   1   1   4   0   1   0
   #fr   2   2   2   1   3   1   0   1
   #ya   1   2   1   1   0   1   0   0
 A.K.
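
The mismatch can be avoided regardless of how the columns are stored by converting each author column to character before stacking. A minimal sketch, in which `au_cols`, `au_all` and the use of rep() are illustrative choices. Note that in R 4.1.0 and later c() on factors returns a factor rather than integer codes, so the original symptom may not reproduce on a current installation.

```r
nam <- c('da', 'ya', 'da', 'da', 'fr', 'fr', 'fr', 'da', 'ya', 'fr')
au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'joy')
au2 <- c('art', 'deb', 'mar', 'deb', 'joy', 'mar', 'art', 'lio', 'nem', 'mar')
au3 <- c('mar', 'lio', 'joy', 'mar', 'art', 'lio', 'nem', 'art', 'deb', 'tat')

# Factors on purpose, to mimic the situation discussed in the thread
tutu <- data.frame(nam, au1, au2, au3, stringsAsFactors = TRUE)

au_cols <- c("au1", "au2", "au3")
# as.character() recovers the labels instead of the internal integer codes
au_all <- unlist(lapply(tutu[au_cols], as.character), use.names = FALSE)

# Repeat 'nam' once per author column so both vectors have the same length
tutus <- data.frame(nam = rep(as.character(tutu$nam), times = length(au_cols)),
                    au = au_all)
tutab <- with(tutus, table(nam, au))
tutab
```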
 
 
 
 
 
 - Original Message -
 From: David Winsemius dwinsem...@comcast.net
 To: Biau David djmb...@yahoo.fr
 Cc: r help list r-help@r-project.org
 Sent: Friday, January 11, 2013 12:20 PM
 Subject: Re: [R] count combined occurrences of categories
 
 
 On Jan 11, 2013, at 2:54 AM, Biau David wrote:
 
 Dear all,
 
 i would like to count the number of times where I have combined occurrences 
 of the categories of 2 variables.
 
 For instance, in the dataframe below, i would like to know how many times 
 each author (au1, au2, au3 represent the first, second, third author) is 
 associated with each of the category of the variable 'nam'. The position of 
 the author does not matter.
 
 nam <- c('da', 'ya', 'da', 'da', 'fr', 'fr', 'fr', 'da', 'ya', 'fr')
 au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'joy')
 au2 <- c('art', 'deb', 'mar', 'deb', 'joy', 'mar', 'art', 'lio', 'nem', 'mar')
 au3 <- c('mar', 'lio', 'joy', 'mar', 'art', 'lio', 'nem', 'art', 'deb', 'tat')
 tutu <- data.frame(cbind(nam, au1, au2, au3))
 
 You should first abandon the practice of using `cbind` inside `data.frame`. 
 Obscure errors will plague your R experience until you do so.
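
 A minimal sketch of the kind of problem meant here, using made-up columns `id` and `grp`: cbind() first builds a matrix, and a matrix can hold only one type, so numeric columns are silently converted to character before data.frame() ever sees them.

```r
id  <- 1:3
grp <- c("a", "b", "c")

bad  <- data.frame(cbind(id, grp), stringsAsFactors = FALSE)  # via a character matrix
good <- data.frame(id, grp, stringsAsFactors = FALSE)         # column types preserved

class(bad$id)    # "character"
class(good$id)   # "integer"
```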
 
 Base solution:
 
 tutus <- data.frame(nam=tutu$nam, au=with(tutu, c(au1,au2,au3)))
 tutab <- with(tutus, table(nam, au)  )
 tutab
     au
 nam  1 2 3 4 5 6 7
   da 2 3 1 2 4 0 0
   fr 2 2 2 2 2 1 1
   ya 1 2 1 1 0 1 0
 
 --
 David Winsemius, MD
 Alameda, CA, USA
 
 

David Winsemius, MD
Alameda, CA, USA






Re: [R] extracting character values

2013-01-13 Thread Biau David
OK,

here is a minimal working example:

au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj', 
         'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc', 
         'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj', 'marmor s', 
         'bhumbra r', 'pansuriya tc', NA)

netw <- data.frame(au1, au2, au3)
res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

for (i in 1:dim(netw)[2])
{
wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
}

The problem is with author 'van den hoofs j', who is only retrieved as 'van'.

thanks,


David Biau



From: arun smartpink...@yahoo.com
To: Biau David djmb...@yahoo.fr
Sent: Sunday, 13 January 2013, 17:38
Subject: Re: [R] extracting character values

Hi,


 res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))
#Error in matrix(NA, nrow = dim(netw)[1], ncol = dim(netw)[2]) : 
 # object 'netw' not found
Can you provide an example dataset of netw?
Thanks.
A.K.





Re: [R] extracting character values

2013-01-13 Thread Biau David
Works great, thanks. You also shortened my code a lot and removed the loop.


 
David Biau



From: Uwe Ligges lig...@statistik.tu-dortmund.de
To: Biau David djmb...@yahoo.fr
Cc: arun smartpink...@yahoo.com; r help list r-help@r-project.org
Sent: Sunday, 13 January 2013, 18:22
Subject: Re: [R] extracting character values
 


On 13.01.2013 18:02, Biau David wrote:
 OK,

 here is a minimal working example:

 au1 <- c('biau dj', 'jones kb', 'van den hoofs j', ' biau dj', 'biau dj', 
          'campagna r', 'biau dj', 'weiss kr', 'verdegaal sh', 'riad s')
 au2 <- c('weiss kr', 'ferguson pc', ' greidanus nv', ' porcher r', 'ferguson pc', 
          'pessis e', 'leclerc p', 'biau dj', 'bovee jv', 'biau d')
 au3 <- c('bhumbra rs', 'lam b', 'garbuz ds', NA, 'chung p', ' biau dj', 
          'marmor s', 'bhumbra r', 'pansuriya tc', NA)

 netw <- data.frame(au1, au2, au3)
 res <- data.frame(matrix(NA, nrow=dim(netw)[1], ncol=dim(netw)[2]))

 for (i in 1:dim(netw)[2])
 {
 wh <- regexpr('[a-z]{3,}', as.character(netw[,i]))
 res[i] <- substring(as.character(netw[,i]), wh, wh + attr(wh,'match.length')-1)
 }


There may be an easier solution, but this should do:

res <- data.frame(lapply(netw,
      function(x)
        gsub("^ *([[:alpha:] ]*) +[[:alpha:]]+$", "\\1", x)))

Uwe Ligges
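
To see what the pattern does piece by piece, here is a small sketch on a few of the names from the example (the vector `x` and the variable names are illustrative). `^ *` skips leading spaces, the parenthesised group `([[:alpha:] ]*)` captures letters and spaces, and ` +[[:alpha:]]+$` matches the final space-separated word, i.e. the initials; `\\1` then replaces the whole string with the captured group.

```r
x <- c("biau dj", " van den hoofs j", "campagna r", NA)

pattern <- "^ *([[:alpha:] ]*) +[[:alpha:]]+$"

# Keep only the captured group (everything before the last word)
last_names <- gsub(pattern, "\\1", x)
last_names   # "biau" "van den hoofs" "campagna" NA
```

Because the pattern only requires a final word and sets no maximum length for it, a cell holding a multi-word last name with no initials would lose its last word; NA cells pass through unchanged.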






Re: [R] extracting character values

2013-01-13 Thread Biau David
thanks too. It also works perfectly. Not sure I understand all the code though: 
I will have to look into it!


 
David Biau



From: arun smartpink...@yahoo.com
To: Biau David djmb...@yahoo.fr
Cc: R help r-help@r-project.org; Uwe Ligges lig...@statistik.tu-dortmund.de
Sent: Sunday, 13 January 2013, 18:36
Subject: Re: [R] extracting character values
 
Hi,
This should also work:
do.call(data.frame,lapply(netw,function(x) gsub("^ *(\\D+) \\w+$","\\1",x)))
A.K.








[R] count combined occurrences of categories

2013-01-11 Thread Biau David
Dear all,
 
I would like to count the number of times the categories of two variables 
occur together.
 
For instance, in the data frame below, I would like to know how many times each 
author (au1, au2, au3 represent the first, second and third author) is associated 
with each category of the variable 'nam'. The position of the author 
does not matter.
 
nam <- c('da', 'ya', 'da', 'da', 'fr', 'fr', 'fr', 'da', 'ya', 'fr')
au1 <- c('deb', 'art', 'deb', 'seb', 'deb', 'deb', 'mar', 'mar', 'joy', 'joy')
au2 <- c('art', 'deb', 'mar', 'deb', 'joy', 'mar', 'art', 'lio', 'nem', 'mar')
au3 <- c('mar', 'lio', 'joy', 'mar', 'art', 'lio', 'nem', 'art', 'deb', 'tat')
tutu <- data.frame(cbind(nam, au1, au2, au3))
 
thanks,

David


[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function

2010-11-15 Thread Biau David
Dear Prof Therneau,

thank you for this information: this is going to be most useful for what I want 
to do. I will look into the AFT model.

Yours,

 David Biau.





From: Terry Therneau thern...@mayo.edu

Cc: r-help@r-project.org
Sent: Mon, 15 November 2010, 15:33:23
Subject: Re: interpretation of coefficients in survreg AND obtaining the hazard 
function

1. The Weibull is the only distribution that can be written in both a
proportional hazards form and an accelerated failure time (AFT) form.  Survreg
uses the latter.
   In an AFT model, we model the time to failure.  Positive coefficients
are good (longer time to death).
   In a PH model, we model the death rate.  Positive coefficients are
bad (higher death rate).

You are not the first to be confused by the change in sign between the
two models.

2. There are about 5 different ways to parameterize a Weibull
distribution, 1-4 appear in various texts and the AFT form is #5.  This
is a second common issue with survreg that strikes only the more
sophisticated users: to understand the output they look up the Weibull
in a textbook, and become even more confused!  

Kalbfleisch and Prentice is a good reference for the AFT form.  The
manual page for psurvreg has some information on this, as does the very
end of ?survreg.  The psurvreg page also has an example of how to
extract the hazard function for a Weibull fit.
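
The sign change can be seen on any data set. The sketch below uses the `lung` data shipped with the survival package rather than David's `stc1` data, which is not available here; the object names are illustrative.

```r
library(survival)

# Same covariate, two model forms
fit_ph  <- coxph(Surv(time, status) ~ sex, data = lung)
fit_aft <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")

b_ph  <- coef(fit_ph)[["sex"]]    # log hazard ratio (proportional hazards)
b_aft <- coef(fit_aft)[["sex"]]   # log ratio of survival times (AFT)

c(cox = b_ph, weibull_aft = b_aft)
```

In these data the two coefficients have opposite signs: a lower death rate in the Cox model corresponds to longer survival times in the accelerated failure time model.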



[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors

2010-11-14 Thread Biau David
Dear Prof Lumley,

This is a very clear, precise, and useful answer to all my questions.

Thank you very much.

 David Biau.





From: Thomas Lumley tlum...@uw.edu

Cc: r help list r-help@r-project.org
Sent: Sun, 14 November 2010, 23:54:23
Subject: Re: [R] interpretation of coefficients in survreg AND obtaining the
hazard function for an individual given a set of predictors


 Dear R help list,

 I am modeling some survival data with coxph and survreg (dist='weibull') using
 package survival. I have 2 problems:

 1) I do not understand how to interpret the regression coefficients in the
 survreg output and it is not clear, for me, from ?survreg.objects how to.

 Here is an example of the codes that points out my problem:
 - data is stc1
 - the factor is dichotomous with 'low' and 'high' categories

 slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)

 mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
 mcb <- coxph(slr~as.factor(grade2), data=stc1)
 mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
 mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)

 > summary(mca)$coef
                                      coef exp(coef)  se(coef)         z  Pr(>|z|)
 as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896

 > summary(mcb)$coef
                            coef exp(coef)  se(coef)          z  Pr(>|z|)
 as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896

 > summary(mwa)$coef
 (Intercept) as.factor(grade2 == "high")TRUE
   7.9068380                      -0.4035245

 > summary(mwb)$coef
 (Intercept) as.factor(grade2)low
   7.5033135            0.4035245


 No problem with the interpretation of the coefs in the cox model. However, i do
 not understand why
 a) the coefficients in the survreg model are the opposite (negative when the
 other is positive) of what I have in the cox model? are these not the log(HR)
 given the categories of these variable?

No. survreg() fits accelerated failure models, not proportional
hazards models.   The coefficients are logarithms of ratios of
survival times, so a positive coefficient means longer survival.


 b) how come the intercept coefficient changes (the scale parameter does not
 change)?

Because you have reversed the order of the factor levels.  The
coefficient of that variable changes sign and the intercept changes to
compensate.
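
A quick way to convince oneself, sketched with the `lung` data from the survival package instead of `stc1` (the factors `sex_f` and `sex_f2` and the object names are illustrative): refit the same Weibull model with the other reference level and compare the coefficients.

```r
library(survival)

d <- lung
d$sex_f  <- factor(d$sex, levels = c(1, 2), labels = c("male", "female"))
d$sex_f2 <- relevel(d$sex_f, ref = "female")   # same factor, other reference level

fit_a <- survreg(Surv(time, status) ~ sex_f,  data = d, dist = "weibull")
fit_b <- survreg(Surv(time, status) ~ sex_f2, data = d, dist = "weibull")

ca <- unname(coef(fit_a))   # intercept, effect of female vs male
cb <- unname(coef(fit_b))   # intercept, effect of male vs female

# The effect flips sign and the intercept absorbs the difference;
# the scale parameter is unchanged
rbind(fit_a = c(ca, scale = fit_a$scale),
      fit_b = c(cb, scale = fit_b$scale))
```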


 2) My second question relates to the first.
 a) given a model from survreg, say mwa above, how should i do to extract the
 base hazard and the hazard of each patient given a set of predictors? With the
 hazard function for the ith individual in the study given by  h_i(t) =
 exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look like to me that
 predict(mwa, type='linear') is \beta'x_i.

No, it's beta'x_i for the accelerated failure parametrization of the
Weibull.  In terms of the CDF

F_i(t) = F_0( (log(t) - beta'x_i)/scale )

So you need to divide by the scale parameter and change sign to get
the log hazard ratios.
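
As a numerical check of the relationship between the two parametrizations, here is a sketch using the `lung` data from the survival package (not David's data). It assumes the parametrization documented for survreg, log(T) = beta'x + scale * W, under which the Weibull shape is 1/scale and the log hazard ratio for a covariate is -beta/scale; `weibull_hazard` is a hypothetical helper written for this example.

```r
library(survival)

fit   <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")
beta  <- coef(fit)
sigma <- fit$scale

# Log hazard ratio implied by the AFT coefficient
log_hr <- -beta[["sex"]] / sigma

# Weibull hazard at time t for linear predictor lp:
# shape = 1/sigma, Weibull scale = exp(lp)
weibull_hazard <- function(t, lp, sigma) {
  shape <- 1 / sigma
  (shape / exp(lp)) * (t / exp(lp))^(shape - 1)
}

# Hazards at one year for sex = 1 and sex = 2
h1 <- weibull_hazard(365, beta[["(Intercept)"]] + beta[["sex"]] * 1, sigma)
h2 <- weibull_hazard(365, beta[["(Intercept)"]] + beta[["sex"]] * 2, sigma)

c(from_hazards = log(h2 / h1), from_coef = log_hr)
```

The two numbers agree at any time point, because the Weibull model satisfies both the proportional hazards and the accelerated failure time form.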


 b) since I need the coefficient intercept from the model to obtain the scale
 parameter  to obtain the base hazard function as defined in Collett
 (h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this coefficient
 intercept changes depending on the reference level of the factor entered in
 the model. The change is very important when I have more than one predictor
 in the model.

As Terry Therneau pointed out recently in the context of the Cox
model, there is no such thing as the baseline hazard.  The baseline
hazard is the hazard when all your covariates are equal to zero, and
this depends on how you parametrize.  In mwa, zero is grade2=low, in
mwb, zero is grade2=high, so the hazard at zero has to be different
in the two cases.
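
A minimal sketch of this point, under stated assumptions: it uses the lung data from the survival package instead of stc1, and weibull_hazard() is a hypothetical helper written for this example, not a function of the survival package. It relies on survreg's Weibull parametrization, in which gamma = 1/scale and, at covariates equal to zero, lambda = exp(-intercept/scale). The "baseline" lambda differs between the two codings, but the hazard computed for a given patient does not.

```r
library(survival)

# Hypothetical helper: Weibull hazard at time t for the rows of newdata.
weibull_hazard <- function(fit, newdata, t) {
  lp    <- predict(fit, newdata = newdata, type = "linear")  # beta'x on the log-time scale
  shape <- 1 / fit$scale
  unname((shape / exp(lp)) * (t / exp(lp))^(shape - 1))
}

d <- lung
d$sex_f <- factor(d$sex, levels = c(1, 2), labels = c("male", "female"))
d$sex_r <- relevel(d$sex_f, ref = "female")

fit_a <- survreg(Surv(time, status) ~ sex_f, data = d, dist = "weibull")
fit_b <- survreg(Surv(time, status) ~ sex_r, data = d, dist = "weibull")

# "Baseline" lambda (covariates at zero) depends on the reference level ...
lambda_a <- unname(exp(-coef(fit_a)[1] / fit_a$scale))
lambda_b <- unname(exp(-coef(fit_b)[1] / fit_b$scale))

# ... but the hazard of one given patient at one year does not.
patient <- d[1, ]
h_a <- weibull_hazard(fit_a, patient, t = 365)
h_b <- weibull_hazard(fit_b, patient, t = 365)
c(lambda_a = lambda_a, lambda_b = lambda_b, h_a = h_a, h_b = h_b)
```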

 -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors

2010-11-13 Thread Biau David
Dear R help list,

I am modeling some survival data with coxph and survreg (dist='weibull') using 
package survival. I have 2 problems:

1) I do not understand how to interpret the regression coefficients in the 
survreg output, and it is not clear to me from ?survreg.object how to do so.

Here is an example of the code that illustrates my problem:
- data is stc1
- the factor is dichotomous with 'low' and 'high' categories

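# survival object: time to local recurrence, event indicator
slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)

# Cox models: same factor, two codings (logical test vs. factor levels)
mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr~as.factor(grade2), data=stc1)
# Weibull models with the same two codings (scale=0 means scale is estimated)
mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)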

> summary(mca)$coef
                                     coef exp(coef)  se(coef)         z  Pr(>|z|)
as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896

> summary(mcb)$coef
                           coef exp(coef)  se(coef)          z  Pr(>|z|)
as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896

> summary(mwa)$coef
(Intercept) as.factor(grade2 == "high")TRUE 
  7.9068380                      -0.4035245 

> summary(mwb)$coef
(Intercept) as.factor(grade2)low 
  7.5033135            0.4035245 


No problem with the interpretation of the coefs in the Cox model. However, I do 
not understand why
a) the coefficients in the survreg model are the opposite (negative when the 
other is positive) of what I have in the Cox model? Are these not the log(HR) 
given the categories of this variable?
b) how come the intercept coefficient changes (the scale parameter does not 
change)?

2) My second question relates to the first.
a) given a model from survreg, say mwa above, how should I extract the 
base hazard and the hazard of each patient given a set of predictors? With the 
hazard function for the ith individual in the study given by  h_i(t) = 
exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look to me like 
predict(mwa, type='linear') is \beta'x_i.
b) since I need the intercept coefficient from the model to obtain the scale 
parameter and thus the base hazard function as defined in Collett 
(h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this intercept 
changes depending on the reference level of the factor entered in the 
model. The change is very large when I have more than one predictor in the 
model.

Any help would be greatly appreciated,

David Biau.



  


[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function for an individual given a set of predictors

2010-11-13 Thread Biau David
Thank you David for your answer,

- grade2 is a factor with 2 categories: high and low 
- yes, as.factor is superfluous; it is just that it sometimes avoids warnings. 
This can be overlooked.
- I will look into Terry Therneau's answers; he gives a good explanation of how 
to obtain the hazard for an individual given a set of predictors for the Cox 
model. I will see if this works for survreg, and look into 
survreg.distributions if it doesn't.
- I'll come back if I can't figure it out.

Thanks again.

Best,

 David Biau.





From: David Winsemius dwinsem...@comcast.net

Cc: r help list r-help@r-project.org
Sent: Sat 13 November 2010, 19:55:10
Subject: Re: [R] interpretation of coefficients in survreg AND obtaining the
hazard function for an individual given a set of predictors


On Nov 13, 2010, at 12:51 PM, Biau David wrote:

 Dear R help list,
 
 I am modeling some survival data with coxph and survreg (dist='weibull') using
 package survival. I have 2 problems:
 
 1) I do not understand how to interpret the regression coefficients in the
 survreg output and it is not clear, for me, from ?survreg.objects how to.

Have you read:

?survreg.distributions  # linked from survreg help

 
 Here is an example of the codes that points out my problem:
 - data is stc1
 - the factor is dichotomous with 'low' and 'high' categories

Not an unambiguous description for the purposes of answering your many 
questions. Please provide data or at the very least: str(stc1)

 
 slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
 
 mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)

Not sure what that would be returning since we do not know the encoding of
grade2. If you want an estimate on a subset wouldn't you do the subsetting
outside of the formula? (You may be reversing the order by offering a logical 
test for grade2.)

 mcb <- coxph(slr~as.factor(grade2), data=stc1)

You have not provided the data or str(stc1), so it is entirely possible that 
as.factor is superfluous in this call.


 mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
 mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)
 
 summary(mca)$coef
                                      coef exp(coef)  se(coef)         z  Pr(>|z|)
 as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896
 
 summary(mcb)$coef
                            coef exp(coef)  se(coef)          z  Pr(>|z|)
 as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896
 
 summary(mwa)$coef
 (Intercept) as.factor(grade2 == "high")TRUE
   7.9068380                      -0.4035245
 
 summary(mwb)$coef
 (Intercept) as.factor(grade2)low
   7.5033135            0.4035245
 
 
 No problem with the interpretation of the coefs in the Cox model. However, I do
 not understand why
 a) the coefficients in the survreg model are the opposite (negative when the
 other is positive) of what I have in the Cox model? Are these not the log(HR)
 given the categories of this variable?

Probably because the order of the factor got reversed when you changed the
covariate to logical and then back to factor.

 b) how come the intercept coefficient changes (the scale parameter does not
 change)?
 
 2) My second question relates to the first.
 a) given a model from survreg, say mwa above, how should i do to extract the
 base hazard

Answered by Therneau earlier this week and the next question last month:

https://stat.ethz.ch/pipermail/r-help/2010-November/259570.html

https://stat.ethz.ch/pipermail/r-help/2010-October/257941.html


 and the hazard of each patient given a set of predictors? With the
 hazard function for the ith individual in the study given by  h_i(t) =
 exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look like to me that
 predict(mwa, type='linear') is \beta'x_i.


 b) since I need the coefficient intercept from the model to obtain the scale
 parameter  to obtain the base hazard function as defined in Collett
 (h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this coefficient
 intercept changes depending on the reference level of the factor entered in the
 model. The change is very important when I have more than one predictor in the
 model.
 
 Any help would be greatly appreciated,
 
 David Biau.
 


David Winsemius, MD
West Hartford, CT


  


[R] How to compare the effect of a variable across regression models?

2010-08-13 Thread Biau David
Hello,

I would like, if it is possible, to compare the effect of a variable across 
regression models. I have looked around but I haven't found anything. Maybe 
someone could help? Here is the problem:

I am studying the effect of a variable (age) on an outcome (local recurrence: 
lr). I have built 3 models:
- model 1: lr ~ age
    y = \beta_(a1).age
- model 2: lr ~ age + presentation variables (X_p)
    y = \beta_(a2).age + \BETA_(p2).X_p
- model 3: lr ~ age + presentation variables + treatment variables (X_t)
    y = \beta_(a3).age + \BETA_(p3).X_p + \BETA_(t3).X_t
 
Presentation variables include variables such as tumor grade, tumor size, 
etc.; the physician cannot interfere with these variables.
Treatment variables include variables such as chemotherapy, radiation, and 
surgical margins (a surrogate for adequate surgery).

I have used cph for the models and restricted cubic splines (Design library) 
for age. I have noted that the effect of age decreases from model 1 to 3.

I would like to compare the effect of age on the outcome across the different 
models. A test of \beta_(a1) = \beta_(a2) = \beta_(a3) and then two by two 
comparisons or a global trend test maybe? Is that possible?

Thank you for your help,


David Biau.



  


[R] Re : How to compare the effect of a variable across regression models?

2010-08-13 Thread Biau David
OK,

thank you very much for the answer. I will look into that. Hopefully I'll find 
something that will work out.

Best,

 David Biau.





From: Frank Harrell f.harr...@vanderbilt.edu

Cc: r help list r-help@r-project.org
Sent: Fri 13 August 2010, 15:50:18
Subject: Re: [R] How to compare the effect of a variable across regression 
models?


David,

In the Cox and many other regression models, the effect of a variable is 
context-dependent.  There is an identifiability problem in what you are doing, 
as discussed by

@ARTICLE{for95mod,
  author = {Ford, Ian and Norrie, John and Ahmadi, Susan},
  year = 1995,
  title = {Model inconsistency, illustrated by the {Cox} proportional hazards
           model},
  journal = {Stat in Med},
  volume = 14,
  pages = {735-746},
  annote = {covariable adjustment; adjusted estimates; baseline imbalances;
            RCT; model misspecification; model identification}
}

One possible remedy, which may not work for your goals, is to embed all models 
in a grand model that is used for inference.

When coefficients ARE comparable in some sense, you can use the bootstrap to 
get confidence bands for differences in regressor effects between models.
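
A rough sketch of such a bootstrap, with the comparability caveat above still applying in full. Everything specific here is an assumption made for illustration: the lung data from the survival package stands in for the real data, coxph with a linear age term stands in for cph with splines, sex and ph.ecog play the role of the added covariates, and a simple percentile interval is used. age_coef_diff() is a hypothetical helper.

```r
library(survival)
set.seed(1)

# Illustration only: complete cases of a few lung variables.
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Difference in the age coefficient between a small and a larger model,
# both fitted to the same (resampled) data set.
age_coef_diff <- function(data) {
  m1 <- coxph(Surv(time, status) ~ age, data = data)
  m2 <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = data)
  unname(coef(m1)["age"] - coef(m2)["age"])
}

B <- 200  # number of bootstrap resamples (kept small for speed)
boot_diffs <- replicate(B, age_coef_diff(d[sample(nrow(d), replace = TRUE), ]))

observed <- age_coef_diff(d)
ci <- quantile(boot_diffs, c(0.025, 0.975))  # percentile interval
observed
ci
```

Resampling whole rows and refitting both models on each resample keeps the correlation between the two estimates, which a comparison of two separately reported standard errors would ignore.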

Frank

Frank E Harrell Jr   Professor and Chairman        School of Medicine
                     Department of Biostatistics   Vanderbilt University

On Fri, 13 Aug 2010, Biau David wrote:

 Hello,
 
 I would like, if it is possible, to compare the effect of a variable across
 regression models. I have looked around but I haven't found anything. Maybe
 someone could help? Here is the problem:
 
 I am studying the effect of a variable (age) on an outcome (local recurrence:
 lr). I have built 3 models:
 - model 1: lr ~ age
     y = \beta_(a1).age
 - model 2: lr ~ age + presentation variables (X_p)
     y = \beta_(a2).age + \BETA_(p2).X_p
 - model 3: lr ~ age + presentation variables + treatment variables (X_t)
     y = \beta_(a3).age + \BETA_(p3).X_p + \BETA_(t3).X_t
 
 Presentation variables include variables such as tumor grade, tumor size,
 etc.; the physician cannot interfere with these variables.
 Treatment variables include variables such as chemotherapy, radiation, and
 surgical margins (a surrogate for adequate surgery).
 
 I have used cph for the models and restricted cubic splines (Design library)
 for age. I have noted that the effect of age decreases from model 1 to 3.
 
 I would like to compare the effect of age on the outcome across the different
 models. A test of \beta_(a1) = \beta_(a2) = \beta_(a3) and then two by two
 comparisons or a global trend test maybe? Is that possible?
 
 Thank you for your help,
 
 
 David Biau.
 
 
 
 
 



  


[R] Re : Re : How to compare the effect of a variable across regression models?

2010-08-13 Thread Biau David


[R] How to extract se(coef) from cph?

2010-08-05 Thread Biau David
Hello,

I am modeling some survival data with cph (Design). I have modeled a predictor 
which showed a nonlinear effect with restricted cubic splines. I would like to 
retrieve the se(coef) for other, linear, predictors. This is just to make nice 
LaTeX tables automatically. I have the coefficients with coef().

How do I do that?

Thanks,

 David Biau.



  


[R] Re : How to extract se(coef) from cph?

2010-08-05 Thread Biau David
Excellent!

Yes, FH has a function to get LaTeX tables, but it is not malleable enough.

Thanks,

 David Biau.





From: David Winsemius dwinsem...@comcast.net

Cc: r help list r-help@r-project.org
Sent: Thu 5 August 2010, 22:11:20
Subject: Re: [R] How to extract se(coef) from cph?


On Aug 5, 2010, at 4:03 PM, Biau David wrote:

 Hello,
 
 I am modeling some survival data with cph (Design). I have modeled a predictor
 which showed a nonlinear effect with restricted cubic splines. I would like to
 retrieve the se(coef) for other, linear, predictors.

The cph object has a var component, and vcov is the extractor function for it.
You would probably be using something like:

diag(vcov(fit))^(1/2)
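
A small sketch of the same idea, shown with coxph from the survival package so that it runs without Design/rms installed; the thread indicates that the same vcov() extractor applies to a cph fit. The lung data and the names fit, se and tab are assumptions made for the example.

```r
library(survival)

# Illustration only: a coxph fit on the lung data.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

se  <- sqrt(diag(vcov(fit)))  # standard errors of the coefficients
tab <- data.frame(coef = coef(fit), se = se, z = coef(fit) / se)
tab
```

A data frame like tab can then be passed to whichever LaTeX table generator is preferred.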

 This is just to make nice
 LaTeX tables automatically.

Are you sure Frank has not already programmed that for you somewhere? Perhaps 
latex.cph?

 I have the coefficients with coef().
 
 How do I do that?
 
 Thanks,
 
 David Biau.
 

--
David Winsemius, MD
West Hartford, CT


  


[R] Re : How to extract se(coef) from cph?

2010-08-05 Thread Biau David
thanks, it works just great.

 David Biau.





From: Abhijit Dasgupta, PhD aikidasgu...@gmail.com

Cc: r help list r-help@r-project.org
Sent: Thu 5 August 2010, 22:15:37
Subject: Re: [R] How to extract se(coef) from cph?

if the cph model fit is m1, you can try

sqrt(diag(m1$var))

This is coded in print.cph.fit (library(rms))

On 08/05/2010 04:03 PM, Biau David wrote:
 Hello,

 I am modeling some survival data with cph (Design). I have modeled a predictor
 which showed a nonlinear effect with restricted cubic splines. I would like to
 retrieve the se(coef) for other, linear, predictors. This is just to make nice
 LaTeX tables automatically. I have the coefficients with coef().

 How do I do that?

 Thanks,

   David Biau.







-- 

Abhijit Dasgupta, PhD
Director and Principal Statistician
ARAASTAT
Ph: 301.385.3067
E: adasgu...@araastat.com
W: http://www.araastat.com


  


[R] COXPH: how to get the score test and likelihood ratio test for a specific variable in a multivariate Coxph ?

2010-07-30 Thread Biau David
Hello,

I would like to get the likelihood ratio and score tests for specific variables 
in a multivariate coxph model. The default is Wald, so the tests for each 
separate variable are based on Wald's test. I have the other tests for the full 
model but I don't know how to get them for each variable.

Any idea?

 David Biau.



  


[R] Re : COXPH: how to get the score test and likelihood ratio test for a specific variable in a multivariate Coxph ?

2010-07-30 Thread Biau David
Thx for the answer.

I am using survival.

I didn't know that the Wald and score tests were the same for individual 
variables in a coxph; I thought the score test was the multivariate version 
of the log-rank.

However, say I have only one variable in the model: should I expect the test 
for the full model and the one for a single variable to be the same? It seems 
to me that the default test is the Wald and that the Wald and the score are 
different.

> cox_lr_age <- coxph(Surv(tilr, ev_lr==1)~age, data=tam)
> summary(cox_lr_age)
Call:
coxph(formula = Surv(tilr, ev_lr == 1) ~ age, data = tam)

  n=2156 (76 observations deleted due to missingness)

        coef exp(coef) se(coef)     z Pr(>|z|)    
age 0.019504  1.019696 0.004651 4.193 2.75e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

    exp(coef) exp(-coef) lower .95 upper .95
age     1.020     0.9807     1.010     1.029

Rsquare= 0.008   (max possible= 0.669 )
Likelihood ratio test= 18.4  on 1 df,   p=1.787e-05
Wald test= 17.58  on 1 df,   p=2.751e-05
Score (logrank) test = 17.86  on 1 df,   p=2.375e-05


 David Biau.





From: David Winsemius dwinsem...@comcast.net

Cc: r help list r-help@r-project.org
Sent: Fri 30 July 2010, 17:34:28
Subject: Re: [R] COXPH: how to get the score test and likelihood ratio test for 
a specific variable in a multivariate Coxph ?


On Jul 30, 2010, at 11:08 AM, Biau David wrote:

 Hello,
 
 I would like to get the likelihood ratio and score tests for specific variables
 in a multivariate coxph model. The default is Wald, so the tests for each
 separate variable are based on Wald's test. I have the other tests for the full
 model but I don't know how to get them for each variable.
 
 Any idea?
 

The first idea would be to specify which function in which package you are
asking questions about. In the case of coxph in the survival package, for 
instance, you do get a likelihood ratio test (== differences in 
log-likelihoods) by default. A score test is, at least as I understand it for 
individual variables, equivalent to a Wald test, so I don't really understand 
your question, since you are already getting all of that in the survival 
package.

You can extract a score value and loglik values from a coxph object
(with the first example in the coxph help page) by:

coxph(Surv(time, status) ~ x + strata(sex), test1)$score
coxph(Surv(time, status) ~ x + strata(sex), test1)$loglik

But anova(coxph-object) would give you these values in a neater bundle.
#Analysis of Deviance Table
# Cox model: response is Surv(time, status)
#Terms added sequentially (first to last)
#       loglik  Chisq Df Pr(>|Chi|)
# NULL -3.8712
# x    -3.3277 1.0871  1     0.2971

The question about getting them for each variable does not make a lot of 
sense to me, since likelihood tests are model comparisons. You can only make such
statements about the consequences of adding or deleting a variable to/from an 
existing model.
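
Read that way, a per-variable likelihood ratio test is the comparison of the full model with the model that drops that one variable. The sketch below does this for each term in turn. It is an illustration under assumptions: the lung data from the survival package, three arbitrary covariates, and a hypothetical helper lr_for(). Complete cases are taken first so that all models are fitted to the same rows.

```r
library(survival)

# Illustration only: same rows for every model, so the likelihoods are comparable.
d <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
full <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = d)

# Hypothetical helper: LR test for dropping one term from the full model.
lr_for <- function(term) {
  reduced <- update(full, as.formula(paste(". ~ . -", term)))
  stat <- 2 * (full$loglik[2] - reduced$loglik[2])
  df   <- length(coef(full)) - length(coef(reduced))
  c(chisq = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

lr_tab <- t(sapply(c("age", "sex", "ph.ecog"), lr_for))
lr_tab
# anova(reduced, full) on a pair of nested coxph fits reports the same comparison.
```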

--David Winsemius, MD
West Hartford, CT


  


[R] Re : Re : COXPH: how to get the score test and likelihood ratio test for a specific variable in a multivariate Coxph ?

2010-07-30 Thread Biau David
Well, thank you very much for these explanations. Unfortunately, I must admit 
the book I have for survival analysis seems less precise as to which test to 
use and why.

Still, in coxph (survival), if I have multiple variables in a model, say X_1, 
X_2, and X_3, how do I test their respective coefficients \beta_1, \beta_2, and 
\beta_3 with the LR, score and Wald tests? I guess I can do it by comparing the 
model with all three variables to those without each of the variables, but is 
there not a more straightforward manner?

 David Biau.





From: Therneau, Terry M., Ph.D. thern...@mayo.edu
To: David Winsemius dwinsem...@comcast.net; Biau David djmb...@yahoo.fr
Cc: r help list r-help@r-project.org
Sent: Fri 30 July 2010, 19:07:15
Subject: RE: Re : [R] COXPH: how to get the score test and likelihood ratio test 
for a specific variable in a multivariate Coxph ?

The Wald, score, and LR tests are discussed in full in my book.  They
are not the same.
The LR test is twice the difference between the log-likelihood at beta=0 and
the log-likelihood at beta=final. The
score test is a Taylor series approximation to this using an expansion
around beta=0.  The Wald test is a similar Taylor series approximation,
but around beta=final.  
  If there are no tied times the score test = Log-rank test.  If there
are ties, then they are just a tiny bit different: the paper using the
log-rank has an n-1 in his variance term and the Cox model has an n.
Neither is right or wrong, just a different choice.
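
The three statistics can be pulled out of a fitted coxph object and compared directly, as in the summary printed earlier in this thread (18.4, 17.58 and 17.86 on 1 df). A minimal sketch, assuming the lung data from the survival package and a single covariate; the names lr, wald, score and stats are introduced for the example.

```r
library(survival)

# Illustration only: one-covariate Cox model on the lung data.
fit <- coxph(Surv(time, status) ~ age, data = lung)

lr    <- 2 * (fit$loglik[2] - fit$loglik[1])   # loglik[1]: beta = 0, loglik[2]: beta = final
wald  <- unname(coef(fit)^2 / vcov(fit)[1, 1]) # (estimate / se)^2, evaluated at beta = final
score <- fit$score                             # score test, evaluated at beta = 0

stats <- c(LR = lr, Wald = wald, Score = score)
pvals <- pchisq(stats, df = 1, lower.tail = FALSE)
rbind(stats, pvals)
```

The three values are typically close but not equal, which is the point made above.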

Terry Therneau



  


[R] longitudinal tobit regression in R

2010-06-30 Thread Biau David
Hi,

I am trying to model a score over time. This score shows a ceiling effect. I 
would like to use a longitudinal tobit model, such as the one described by 
Twisk et al. (Longitudinal tobit regression: A new approach to analyze outcome 
variables with floor or ceiling effects, JCE, 2009), but it is programmed for 
Stata.

Has anyone used such models in R?
Any other idea?

David Biau.


  


[R] crosstabling multiple variables at once

2010-05-20 Thread Biau David


Hi,

I am trying to describe a data.frame by obtaining multiple crosstable summary 
statistics at once. I have tried table, xtab, crosstable, summaryBy and 
describe, but none of these functions seems to allow multiple comparisons at 
once.
Here is what I would like to do:

I have, for instance, age, sex (M and F), grade (1, 2, 3) and site (limb, 
trunk), and I want, for instance, the following summary statistics:
- age (mean, SD) for males and age for females
- age for grade 1, grade 2, and grade 3
- age for site limb, site trunk
- sex (count, proportions) for grade 1, grade 2, and grade 3
- sex (count, proportions) for site limb, site trunk (already have sex/age above)
- grade (count, proportions) for site limb, site trunk (already have grade/sex 
and grade/age above)

Also, I want each of these not crossed by any others (mean overall age, number 
of males, etc.), which could be seen as each crossed with itself.
 
I have at least 10 variables, continuous, categorical ordered and non ordered. 
I don't want any tests.

Any idea?

David Biau. 



  


[R] multiple 2 by 2 crosstabulations?

2010-05-19 Thread Biau David
Hello,

I have a dataframe (var_1, var_2, ..., var_n) and I would like to export 
summary statistics to LaTeX in the form of a table. I want specific summary 
statistics obtained by crossing numerous variables two by two, all AT ONCE. In 
each cell I would like to have, depending on the variables, the median 
(Q1 - Q3), or the frequency and proportion, etc. CrossTable, xtab, etc. do not 
allow for multiple 2 by 2 crosstabulations. The table would look like this:

        var_1   var_2   var_3   ...
var_1   a       b       c
var_2   d       e       f
var_3   ...     ...     ...

with a, b, c, ... the results of each crosstabulation. I have continuous and 
categorical variables.

Any idea?

Thank you very much,

David.


  