[R] independent censoring

2016-11-28 Thread Damjan Krstajic
Dear All,


Independent censoring is one of the fundamental assumptions in survival
analysis. However, I cannot find any test for it, or any paper that discusses
how realistic that assumption is.


I would be grateful if anybody could point me to some useful references. I have 
found the following paper as an interesting reference but it is not freely 
available.


Leung, Kwan-Moon, Robert M. Elashoff, and Abdelmonem A. Afifi. "Censoring
issues in survival analysis." Annual Review of Public Health 18.1 (1997):
83-104.


Any feedback would be much appreciated.


Kind regards

DK



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read

2016-11-28 Thread Jeff Newmiller
Doesn't look right to me... you are likely to need to change something to 
handle the multiple directories thing somehow, but I don't know why you made 
the changes you did make to my suggestion.
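One way to handle several directories, sketched under the assumption that each folder holds exactly one file named `dat` or `dat.<ext>`: pass each directory to `list.files()` together with the pattern, rather than calling `is.element()`, which tests set membership and has no `pattern` argument. The helper name `read_dat_from()` and the throwaway demo folders are made up for illustration, and the `?` in the pattern (making the extension optional) is an assumption about the intent.

```r
# Hypothetical helper: read the single dat / dat.<ext> file in one directory.
read_dat_from <- function(dir) {
  fn <- list.files(dir, pattern = "^dat(\\.[^.]+)?$", full.names = TRUE)
  if (length(fn) != 1L) {
    stop("expected exactly one dat file in ", dir, ", found ", length(fn))
  }
  read.csv(fn, stringsAsFactors = FALSE)
}

# Demo with two throwaway folders, one holding 'dat', the other 'dat.csv'.
base <- tempfile("demo")
dirs <- file.path(base, c("a", "b"))
for (d in dirs) dir.create(d, recursive = TRUE)
write.csv(data.frame(x = 1:2), file.path(dirs[1], "dat"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(dirs[2], "dat.csv"), row.names = FALSE)

# One data frame per directory.
dtaL <- lapply(dirs, read_dat_from)
```

Stopping when a folder does not contain exactly one match is a design choice for this sketch; returning NULL and skipping the folder would work as well.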
-- 
Sent from my phone. Please excuse my brevity.

On November 28, 2016 5:56:04 PM PST, Val  wrote:
>Hi Jeff  and John,
>
>Thank you for your response.
>In each folder, I am expecting a single file name (either dat or
>dat.csv), so will this work?
>
>
>Is the following correct?
>fns  <- list.files(mydir)
>if (is.element(pattern="dat(\\.[^.]+)$",fns ))
>
>Thank you again.
>
>On Mon, Nov 28, 2016 at 7:20 PM, Jeff Newmiller
> wrote:
>> No, and yes, depending what you mean.
>>
>> No, because you have to supply the file name to open it... you cannot
>> directly use wildcards to open files.
>>
>> Yes, because the list.files function can be used to match all file
>> names fitting a regex pattern, and you can use those filenames to open
>> the files.
>>
>> E.g.
>>
>> fns  <- list.files( pattern="^dat(\\.[^.]+)?$" )
>> dtaL <- lapply( fns, function(fn){ read.csv( fn, stringsAsFactors=FALSE ) } )
>>
>> If you only expect one file to be in any given directory, you can
>> skip the lapply and just read the file, or you can extract the data
>> frame from the list using dtaL[[ 1 ]].
>>
>> ?list.files
>> ?regex for help on patterns
>> --
>> Sent from my phone. Please excuse my brevity.
>>
>> On November 28, 2016 2:23:23 PM PST, Ashta  wrote:
>>>Hi all,
>>>
>>>I have a script that  reads a file (dat.csv)  from several folders.
>>>However, in some folders the file name is (dat) with out csv  and in
>>>other folders it is dat.csv.  The format of data is the same(only the
>>>file name differs  with and without "csv".
>>>
>>>Is it possible to read these files  depending on their name in one?
>>>like read.csv("dat.csv"). How can I read both type of file names?
>>>
>>>Thank you in advance
>>>


Re: [R] read

2016-11-28 Thread Val
Hi Jeff  and John,

Thank you for your response.
In each folder, I am expecting a single file name (either dat or
dat.csv), so will this work?


Is the following correct?
fns  <- list.files(mydir)
if (is.element(pattern="dat(\\.[^.]+)$",fns ))

Thank you again.

On Mon, Nov 28, 2016 at 7:20 PM, Jeff Newmiller
 wrote:
> No, and yes, depending what you mean.
>
> No, because you have to supply the file name to open it... you cannot 
> directly use wildcards to open files.
>
> Yes,  because the list.files function can be used to match all file names 
> fitting a regex pattern, and you can use those filenames to open the files.
>
> E.g.
>
> fns  <- list.files( pattern="^dat(\\.[^.]+)?$" )
> dtaL <- lapply( fns, function(fn){ read.csv( fn, stringsAsFactors=FALSE ) } )
>
> If you only expect one file to be in any given directory, you can skip the 
> lapply and just read the file, or you can extract the data frame from the 
> list using dtaL[[ 1 ]].
>
> ?list.files
> ?regex for help on patterns
> --
> Sent from my phone. Please excuse my brevity.
>
> On November 28, 2016 2:23:23 PM PST, Ashta  wrote:
>>Hi all,
>>
>>I have a script that  reads a file (dat.csv)  from several folders.
>>However, in some folders the file name is (dat) with out csv  and in
>>other folders it is dat.csv.  The format of data is the same(only the
>>file name differs  with and without "csv".
>>
>>Is it possible to read these files  depending on their name in one?
>>like read.csv("dat.csv"). How can I read both type of file names?
>>
>>Thank you in advance
>>


Re: [R] read

2016-11-28 Thread Jeff Newmiller
No, and yes, depending what you mean. 

No, because you have to supply the file name to open it... you cannot directly 
use wildcards to open files.

Yes,  because the list.files function can be used to match all file names 
fitting a regex pattern, and you can use those filenames to open the files.

E.g.

fns  <- list.files( pattern="^dat(\\.[^.]+)?$" )  # "?" so plain "dat" matches too
dtaL <- lapply( fns, function(fn){ read.csv( fn, stringsAsFactors=FALSE ) } )

If you only expect one file to be in any given directory, you can skip the 
lapply and just read the file, or you can extract the data frame from the list 
using dtaL[[ 1 ]].

?list.files
?regex for help on patterns
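A quick way to see how the optional-extension detail affects matching, as a sketch with made-up file names: when the extension group is mandatory and the pattern is unanchored, a bare `dat` is not matched and names such as `mydat.csv` are; anchoring with `^` and adding `?` after the group accepts exactly `dat` and `dat.<ext>`. The variable names here are assumptions for illustration.

```r
# Candidate file names (invented for this check).
fns <- c("dat", "dat.csv", "mydat.csv", "dat.csv.bak", "data")

strict   <- "dat(\\.[^.]+)$"     # extension required, no start anchor
optional <- "^dat(\\.[^.]+)?$"   # extension optional, anchored at both ends

hits_strict   <- fns[grepl(strict, fns)]
hits_optional <- fns[grepl(optional, fns)]

hits_strict     # "dat.csv" "mydat.csv"
hits_optional   # "dat" "dat.csv"
```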
-- 
Sent from my phone. Please excuse my brevity.

On November 28, 2016 2:23:23 PM PST, Ashta  wrote:
>Hi all,
>
>I have a script that  reads a file (dat.csv)  from several folders.
>However, in some folders the file name is (dat) with out csv  and in
>other folders it is dat.csv.  The format of data is the same(only the
>file name differs  with and without "csv".
>
>Is it possible to read these files  depending on their name in one?
>like read.csv("dat.csv"). How can I read both type of file names?
>
>Thank you in advance
>


Re: [R] read

2016-11-28 Thread John McKown
On Mon, Nov 28, 2016 at 4:23 PM, Ashta  wrote:

> Hi all,
>
> I have a script that  reads a file (dat.csv)  from several folders.
> However, in some folders the file name is (dat) with out csv  and in
> other folders it is dat.csv.  The format of data is the same(only the
> file name differs  with and without "csv".
>
> Is it possible to read these files  depending on their name in one?
> like read.csv("dat.csv"). How can I read both type of file names?
>
> Thank you in advance
>
>
I'd do something like this:

> files=c('dat.csv','dat')
> file2read=files[file.exists(files)][1]
> file2read
[1] "dat.csv"

You put the possible file names into the variable in the order of
preference. E.g. I prefer "dat.csv" over "dat" if by chance both exist.
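The same idea extends to several folders by building full paths with `file.path()`. A minimal sketch, assuming each folder holds at most one of the two names; the helper name `pick_file()` and the throwaway demo folders are made up for illustration:

```r
# Hypothetical helper: return the first existing candidate in 'dir', or NA.
pick_file <- function(dir, candidates = c("dat.csv", "dat")) {
  paths <- file.path(dir, candidates)   # candidates in order of preference
  paths[file.exists(paths)][1]          # NA when none of them exists
}

# Demo: one folder with 'dat', one with 'dat.csv', one with neither.
base <- tempfile("demo")
dirs <- file.path(base, c("a", "b", "c"))
for (d in dirs) dir.create(d, recursive = TRUE)
write.csv(data.frame(x = 1:2), file.path(dirs[1], "dat"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(dirs[2], "dat.csv"), row.names = FALSE)

picked <- vapply(dirs, pick_file, character(1), USE.NAMES = FALSE)

# Skip folders where nothing was found (NA) and read the rest.
dtaL <- lapply(picked[!is.na(picked)], read.csv, stringsAsFactors = FALSE)
```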

> files=c('not.csv','not')
> file2read=files[file.exists(files)][1]
> file2read
[1] NA

The above shows the result when none of the files exist. So if
"file2read" is NA, you go on to the next directory.


-- 
Heisenberg may have been here.

Unicode: http://xkcd.com/1726/

Maranatha! <><
John McKown


[R] errors when installing packages

2016-11-28 Thread Chris
I'm using R 3.3.1.
When installing or updating a package, for example "Hmisc", I get an error
message about "unable to move...".

Cutting and pasting the output:
package ‘survival’ successfully unpacked and MD5 sums checked
Warning: unable to move temporary installation
‘C:\Users\Chris\Documents\R\win-library\3.3\file4681d2a5a2a\survival’ to
‘C:\Users\Chris\Documents\R\win-library\3.3\survival’

 Chris Barker, Ph.D.
Adjunct Associate Professor of Biostatistics - UIC-SPH
and

skype: barkerstats



[R] read

2016-11-28 Thread Ashta
Hi all,

I have a script that reads a file (dat.csv) from several folders.
However, in some folders the file is named dat, without the csv extension,
and in other folders it is dat.csv. The format of the data is the same;
only the file name differs (with or without "csv").

Is it possible to read these files regardless of which name they have, with
a single call like read.csv("dat.csv")? How can I read both types of file names?

Thank you in advance



Re: [R] Manipulating groups of boolean data subject to group size and distance from other groups

2016-11-28 Thread Morway, Eric
To help with the clarification, I renamed 'col1' to 'year' and 'col2' to
'origDat'.  With that said...

The second 'block' of 1's (four consecutive 1's appearing in
DF$origDat[11:14]) is preserved because it is separated by only
1 year (1998 in DF$year) from a larger group of consecutive 1's
(years 1999 through 2004).  Because the first block of 1's is separated
from any other block of 1's by at least 2 years, which I have deemed
to be too large of a gap in data (0's are a surrogate for missing data),
the 1's appearing in DF$origDat[3:6] should be reset to 0.

I modified the script based on David's suggestion of rle (I was previously
unaware of it) to that shown below, and it works for all three example DF's
provided at the top of the script. That is, after running the script with
any of the first 3 DF's provided, the data in DF$finalDat (as compared to
DF$origDat) is reflective of what I'm after.

HOWEVER, the use of nested while loops and if statements strikes me as
antithetical to elegant R scripting.  Second, my script, as currently
constituted, has a significant bug in that the rules I've set forth are not
completely satisfied.  If DF4 is used (uncomment the line: "DF <- DF4") the
blocks of 1's at the beginning and end of DF$origDat are preserved, whereas
the largest continuous block of 1's, in the middle of DF$origDat, is reset
to 0.  Thus, I think I'm in need of a more elegant way of pursuing this
problem...should anyone be so inclined to offer additional thoughts.

The (semi-) working script using rle is:

DF <- data.frame(year=rep(1991:2004, each=2),
                 origDat=c(0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1))

#DF <- data.frame(year=rep(1991:2004, each=2),
#                 origDat=c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1))

#DF <- data.frame(year=rep(1991:2004, each=2),
#                 origDat=c(1,1,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1))

# An example that doesn't work
DF4 <- data.frame(year=rep(1991:2004, each=2),
                  origDat=c(1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1))
#DF <- DF4

DF$inc <- c(1, abs(diff(DF$origDat)))
DF$cumsum <- cumsum(DF$inc)

ex1 <- aggregate(year ~ cumsum, data=DF, function(x) length(unique(x)))
names(ex1) <- c('cumsum','isl')

tmp1a <- merge(DF, ex1, by="cumsum", all.x=TRUE)
tmp1a$isl2 <- (-1*tmp1a$origDat) * tmp1a$isl
tmp1a$isl2[tmp1a$isl2==0] <- tmp1a$isl[tmp1a$isl2==0]
tmp1a$isl2 <- -1 * tmp1a$isl2

DF$grpng <- tmp1a$isl2

runlen <- data.frame(cumsum = seq(1:length(rle(DF$grpng)$lengths)),
 len = rle(DF$grpng)$lengths,
 val = rle(DF$grpng)$values)

i <- 1
while(i <= nrow(runlen)){
  if(runlen[i,'val'] >= 2){
    # As long as a '-2' or smaller doesn't follow, then the current
    # group of data is NOT too 'distant' from other data and should be
    # preserved.  Otherwise, the current grp of 1's should be reset to 0
    j <- i + 1
    while(j <= nrow(runlen)){
      if(runlen[j,'val'] <= -2){
        # If code enters here, then switch the sign of 'val' to
        # effectively inactivate this block of 1's
        runlen[i,'val'] <- -1 * runlen[i,'val']
      }
      #print(paste0("j: ",as.character(j)))
      j <- j + 1
    }
  } else if (runlen[i,'val'] > 0 & runlen[i,'val'] < 2){
    # If the script enters here, then the current group of data
    # doesn't meet the minimum continuous length requirement of
    # 2 or more years (in this example a check of >0 & <2 seems
    # silly, but in the real-world dataset 2 will be replaced with
    # a much larger value).
    runlen[i,'val'] <- -1 * runlen[i,'val']
  }
  #print(paste0("i: ",as.character(i)))
  i <- i + 1
}

runlen$finalDat <- ifelse(runlen$val < 0, 0, 1)
DF <- merge(DF, runlen, by = 'cumsum', all.x = TRUE)
DF
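For comparison with the nested loops above, here is a loop-free sketch built on `rle()` and `inverse.rle()`. It encodes only one possible reading of the two rules, which the thread itself describes as ambiguous: a run of 1's is kept if it spans at least `min_len` years and is either tied for the longest run of 1's, or has an immediately neighbouring run of 1's that is at least as long and less than `max_gap` years away. The helper name `filter_runs()`, its arguments, and the reduction to one value per year are all assumptions made for illustration, not the author's method.

```r
# Hypothetical helper. x: 0/1 vector with one value per year and no NAs.
filter_runs <- function(x, min_len = 2, max_gap = 2) {
  r <- rle(x)
  len <- r$lengths
  n <- length(len)
  is_one <- r$values == 1
  if (!any(is_one)) return(x)

  # Entry k positions away in the run table; NA when off either end.
  shift <- function(v, k) {
    idx <- seq_len(n) + k
    idx[idx < 1 | idx > n] <- NA
    v[idx]
  }
  # TRUE where the neighbouring block of 1's on one side (two runs away,
  # because runs of 0's and 1's alternate) is at least as long and the
  # run of 0's in between is shorter than max_gap.
  near <- function(side) {
    gap <- shift(len, side)
    nb  <- shift(len, 2 * side)
    !is.na(nb) & gap < max_gap & nb >= len
  }

  keep <- is_one & len >= min_len &
    (len == max(len[is_one]) | near(-1) | near(1))
  r$values <- as.numeric(keep)   # dropped runs of 1's become 0
  inverse.rle(r)
}

DF <- data.frame(year = rep(1991:2004, each = 2),
                 origDat = c(0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1))
yrs <- unique(DF$year)
yearly <- DF$origDat[!duplicated(DF$year)]   # assumes a constant value within each year
DF$finalDat <- filter_runs(yearly)[match(DF$year, yrs)]
```

Under this reading each run is compared only with its immediate neighbours, so chains of short gaps are not followed, and for DF4 only the middle block survives. Whether that matches the intended rules would need confirming against the real dataset.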



Re: [R] reshaping a large dataframe in R

2016-11-28 Thread Daniel Nordlund

On 11/28/2016 1:06 AM, jean-philippe wrote:

dear all,

I have a dataframe of 500 rows and 4004 columns that I would like to
reshape to a dataframe of 500500 rows and 4 columns. That is from this
dataframe:

V1 V2 V3 V4 ... V4001 V4002 V4003 V4004

1 2 3 4 ... 4001 4002 4003 4004

1 2 3 4 ... 4001 4002 4003 4004

1 2 3 4 ... 4001 4002 4003 4004

... ... ... ... ... ... ... ... ... ... ... ... ...

1 2 3 4 ... 4001 4002 4003 4004

I would like :


V1 V2 V3 V4

1 2 3 4

1 2 3 4

1 2 3 4

1 2 3 4

... ... ... ... ... ... ... ... ...

4001 4002 4003 4004

4001 4002 4003 4004

4001 4002 4003 4004

... ... ... ... ...

4001 4002 4003 4004

I tried already to use y=matrix(as.matrix(dataGaus[[1]]),500500,4)
(where dataGaus is my dataframe) but it doesn't give the expected
result. I tried also to use reshape but I can't manage to use it to
reproduce the result (and I have been through lot of posts on
StackOverflow and on the net). In python, we can do this with a simple
command numpy.array(dataGaus[[1]]).reshape(-1,4). For some reasons, I am
doing my analysis in R, and I would like to know if there is a function
which does the same thing as the reshape(-1,4) of numpy in Python?

Thanks in advance, best


Jean-Philippe



I don't know about efficiency, but it looks like you could do something 
like this:


y <- t(matrix(t(dataGaus),4))
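As a small check of what this expression does, sketched on a made-up 2 x 8 data frame rather than the real `dataGaus`: transposing before and after `matrix()` amounts to a row-major reshape, the order that numpy's `reshape(-1, 4)` uses. Each original row is cut into consecutive chunks of 4, so all chunks of row 1 come first, then those of row 2. The names `toy`, `y` and `y2` are assumptions for illustration.

```r
# Toy stand-in for dataGaus: the value 10*i + j sits in row i, column j.
toy <- as.data.frame(matrix(c(11:18, 21:28), nrow = 2, byrow = TRUE))

# Expression from the post, applied to the toy data.
y <- t(matrix(t(toy), 4))

# Equivalent formulation: flatten row by row, then refill row by row.
y2 <- matrix(as.vector(t(as.matrix(toy))), ncol = 4, byrow = TRUE)

y
```

With identical rows, as in the original example, this order cannot be told apart from a block-by-block stacking; with real data the two orders differ.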


Maybe someone will come along with something better,

Dan

--
Daniel Nordlund
Port Townsend, WA  USA



Re: [R] Manipulating groups of boolean data subject to group size and distance from other groups

2016-11-28 Thread David Winsemius

> On Nov 28, 2016, at 9:38 AM, Morway, Eric  wrote:
> 
> The example below is a pared-down version of a much larger dataset.  My
> goal is to use the binary data contained in DF$col2 to guide manipulation
> of the binary data itself, subject to the following:
> 
>   - Groups of '1' that are separated from other, larger groups of "1's" in
>   'col2' by 2 or more years should be converted to "0"
>   - Groups of '1' need to be at least 2 consecutive years to be preserved
> 
> So in the example provided below, DF$col2 would be manipulated such that
> its values are overrided to:
> 
> c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1)
> 
> That is, the first group of 1's in positions 2 through 6 are separated from
> other groups of 1's by 2 (or more) years, and the second group of 1's
> (positions 11 & 12) span only a single year and do not meet the criteria of
> being at least 2 years long.
> 
> The example R script below shows a small example I'm working with, called
> "DF".  The code that comes after the first line is my attempt to go through
> some R-gymnastics to append a column to DF called "isl2" that reflects the
> number of consecutive years in the 0/1 groups, where the +/- sign acts as
> (or denotes) the original binary condition: 0 = negative, 1 = positive.
> However, I'm stuck with how to proceed further.  Could someone please help
> me come up with script that modifies DF$col2 shown below to be like that
> shown above?
> 
> DF <- data.frame(col1=rep(1991:2004,
> each=2),col2=c(0,0,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1))

It's not clear from your verbal description why the first group of 1's with
length 4 is discarded while the second group of 1's, also of length 4, is
preserved. There's ambiguity in the rules about "how large" a run must be in
order to be "safe" from removal.

In any case the answer will almost surely involve the use of the rle function,
which, if you have not encountered it, should be your next visit to the help
pages.
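For readers who have not met `rle()` yet, a brief sketch of what it returns, using the 0/1 pattern from the first example reduced to one value per year (that reduction is an assumption made here for brevity). `lengths` holds the size of each run, `values` the value repeated in it, and `inverse.rle()` rebuilds the full vector. The last step applies only the minimum-length rule, not the distance rule.

```r
x <- c(0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1)   # years 1991..2004
r <- rle(x)
r$lengths   # 1 2 2 1 1 2 1 4
r$values    # 0 1 0 1 0 1 0 1

# Runs of 1's that span at least 2 years.
long_ones <- r$values == 1 & r$lengths >= 2

# Zero out the shorter runs of 1's and rebuild the full-length vector.
r$values[r$values == 1 & !long_ones] <- 0
x_min2 <- inverse.rle(r)
```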

-- 
David,
> 
> DF$inc <- c(0, abs(diff(DF$col2)))
> DF$cum <- cumsum(DF$inc)
> 
> ex1 <- aggregate(col1 ~ cum, data=DF, function(x) length(unique(x)))
> names(ex1) <- c('cum','isl')
> 
> tmp1a <- merge(DF, ex1, by="cum", all.x=TRUE)
> tmp1a$isl2 <- (-1*tmp1a$col2) * tmp1a$isl
> tmp1a$isl2[tmp1a$isl2==0] <- tmp1a$isl[tmp1a$isl2==0]
> 
> DF$grpng <- tmp1a$isl2
> 
> At this point I was thinking I could use DF$grpng to sweep through col2 and
> make adjustments, but I didn't know how to proceed.
> 
> For debugging purposes, a slightly different example would go from:
> 
> DF <- data.frame(col1=rep(1991:2004, each=2),col2=c(1,1,1,1,
> 1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1))
> 
> to 'col2' looking like:
> 
> c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1)
> 
> That is, even though the first group of 1's is greater than two consecutive
> years, it is separated from a larger group of 1's by 2 (or more years).



David Winsemius
Alameda, CA, USA



Re: [R] reshaping a large dataframe in R

2016-11-28 Thread David L Carlson
There may be a simpler way of getting there, but this works:

> rows <- 500
> cols <- 4004
> dat <- as.data.frame(t(replicate(rows, 1:cols)))
> dat[c(1:3, 500), c(1:4, 4001:4004)]
    V1 V2 V3 V4 V4001 V4002 V4003 V4004
1    1  2  3  4  4001  4002  4003  4004
2    1  2  3  4  4001  4002  4003  4004
3    1  2  3  4  4001  4002  4003  4004
500  1  2  3  4  4001  4002  4003  4004
> dat2 <- array(as.matrix(dat), dim=c(rows, 4, cols/4))
> dat3 <- as.data.frame(matrix(aperm(dat2, c(1, 3, 2)), rows*cols/4, 4))
> head(dat3)
  V1 V2 V3 V4
1  1  2  3  4
2  1  2  3  4
3  1  2  3  4
4  1  2  3  4
5  1  2  3  4
6  1  2  3  4
> tail(dat3)
         V1   V2   V3   V4
500495 4001 4002 4003 4004
500496 4001 4002 4003 4004
500497 4001 4002 4003 4004
500498 4001 4002 4003 4004
500499 4001 4002 4003 4004
500500 4001 4002 4003 4004
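The same block-by-block order can also be sketched without `aperm()`, by stacking each group of 4 adjacent columns with `rbind()`. This is an alternative formulation on a made-up 2 x 8 matrix, not the code from the post, and the names `toy`, `via_aperm` and `stacked` are assumptions. All rows of columns 1-4 come first, then all rows of columns 5-8.

```r
# Toy matrix: the value 10*i + j sits in row i, column j.
toy <- matrix(c(11:18, 21:28), nrow = 2, byrow = TRUE)
rows <- nrow(toy)
cols <- ncol(toy)

# Approach from the post, applied to the toy matrix.
arr <- array(toy, dim = c(rows, 4, cols / 4))
via_aperm <- matrix(aperm(arr, c(1, 3, 2)), rows * cols / 4, 4)

# Alternative: stack each block of 4 adjacent columns on top of the previous one.
starts <- seq(1, cols, by = 4)
stacked <- do.call(rbind, lapply(starts, function(j) toy[, j:(j + 3), drop = FALSE]))

stacked
```

Because the toy rows differ, the result makes the ordering visible: the second output row comes from the second input row, not from columns 5-8 of the first.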

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of jean-philippe
Sent: Monday, November 28, 2016 3:07 AM
To: r-help@r-project.org
Subject: [R] reshaping a large dataframe in R

dear all,

I have a dataframe of 500 rows and 4004 columns that I would like to 
reshape to a dataframe of 500500 rows and 4 columns. That is from this 
dataframe:

V1 V2 V3 V4 ... V4001 V4002 V4003 V4004

1 2 3 4 ... 4001 4002 4003 4004

1 2 3 4 ... 4001 4002 4003 4004

1 2 3 4 ... 4001 4002 4003 4004

... ... ... ... ... ... ... ... ... ... ... ... ...

1 2 3 4 ... 4001 4002 4003 4004

I would like :


V1 V2 V3 V4

1 2 3 4

1 2 3 4

1 2 3 4

1 2 3 4

... ... ... ... ... ... ... ... ...

4001 4002 4003 4004

4001 4002 4003 4004

4001 4002 4003 4004

... ... ... ... ...

4001 4002 4003 4004

I tried already to use y=matrix(as.matrix(dataGaus[[1]]),500500,4) 
(where dataGaus is my dataframe) but it doesn't give the expected 
result. I tried also to use reshape but I can't manage to use it to 
reproduce the result (and I have been through lot of posts on 
StackOverflow and on the net). In python, we can do this with a simple 
command numpy.array(dataGaus[[1]]).reshape(-1,4). For some reasons, I am 
doing my analysis in R, and I would like to know if there is a function 
which does the same thing as the reshape(-1,4) of numpy in Python?

Thanks in advance, best


Jean-Philippe

-- 
Jean-Philippe Fontaine
PhD Student in Astroparticle Physics,
Gran Sasso Science Institute (GSSI),
Viale Francesco Crispi 7,
67100 L'Aquila, Italy
Mobile: +393487128593, +33615653774



[R] reshaping a large dataframe in R

2016-11-28 Thread jean-philippe

dear all,

I have a dataframe of 500 rows and 4004 columns that I would like to 
reshape to a dataframe of 500500 rows and 4 columns. That is from this 
dataframe:


V1 V2 V3 V4 ... V4001 V4002 V4003 V4004

1 2 3 4 ... 4001 4002 4003 4004

1 2 3 4 ... 4001 4002 4003 4004

1 2 3 4 ... 4001 4002 4003 4004

... ... ... ... ... ... ... ... ... ... ... ... ...

1 2 3 4 ... 4001 4002 4003 4004

I would like :


V1 V2 V3 V4

1 2 3 4

1 2 3 4

1 2 3 4

1 2 3 4

... ... ... ... ... ... ... ... ...

4001 4002 4003 4004

4001 4002 4003 4004

4001 4002 4003 4004

... ... ... ... ...

4001 4002 4003 4004

I already tried y=matrix(as.matrix(dataGaus[[1]]),500500,4)
(where dataGaus is my dataframe), but it doesn't give the expected
result. I also tried reshape, but I can't manage to use it to
reproduce the result (and I have been through a lot of posts on
StackOverflow and on the net). In Python, we can do this with the simple
command numpy.array(dataGaus[[1]]).reshape(-1,4). For some reasons, I am
doing my analysis in R, and I would like to know if there is a function
which does the same thing as the reshape(-1,4) of numpy in Python?


Thanks in advance, best


Jean-Philippe

--
Jean-Philippe Fontaine
PhD Student in Astroparticle Physics,
Gran Sasso Science Institute (GSSI),
Viale Francesco Crispi 7,
67100 L'Aquila, Italy
Mobile: +393487128593, +33615653774



[R] Manipulating groups of boolean data subject to group size and distance from other groups

2016-11-28 Thread Morway, Eric
The example below is a pared-down version of a much larger dataset.  My
goal is to use the binary data contained in DF$col2 to guide manipulation
of the binary data itself, subject to the following:

   - Groups of '1' that are separated from other, larger groups of "1's" in
   'col2' by 2 or more years should be converted to "0"
   - Groups of '1' need to be at least 2 consecutive years to be preserved

So in the example provided below, DF$col2 would be manipulated such that
its values are overridden to:

c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1)

That is, the first group of 1's in positions 3 through 6 is separated from
other groups of 1's by 2 (or more) years, and the second group of 1's
(positions 11 & 12) spans only a single year and does not meet the criterion
of being at least 2 years long.

The example R script below shows a small example I'm working with, called
"DF".  The code that comes after the first line is my attempt to go through
some R-gymnastics to append a column to DF called "isl2" that reflects the
number of consecutive years in the 0/1 groups, where the +/- sign acts as
(or denotes) the original binary condition: 0 = negative, 1 = positive.
However, I'm stuck with how to proceed further.  Could someone please help
me come up with script that modifies DF$col2 shown below to be like that
shown above?

DF <- data.frame(col1=rep(1991:2004,
each=2),col2=c(0,0,1,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,1,1,1,1,1,1))

DF$inc <- c(0, abs(diff(DF$col2)))
DF$cum <- cumsum(DF$inc)

ex1 <- aggregate(col1 ~ cum, data=DF, function(x) length(unique(x)))
names(ex1) <- c('cum','isl')

tmp1a <- merge(DF, ex1, by="cum", all.x=TRUE)
tmp1a$isl2 <- (-1*tmp1a$col2) * tmp1a$isl
tmp1a$isl2[tmp1a$isl2==0] <- tmp1a$isl[tmp1a$isl2==0]

DF$grpng <- tmp1a$isl2

At this point I was thinking I could use DF$grpng to sweep through col2 and
make adjustments, but I didn't know how to proceed.

For debugging purposes, a slightly different example would go from:

DF <- data.frame(col1=rep(1991:2004, each=2),col2=c(1,1,1,1,
1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1))

to 'col2' looking like:

c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1)

That is, even though the first group of 1's spans more than two consecutive
years, it is separated from a larger group of 1's by 2 (or more) years.



Re: [R] Unable to Install POT Package for R 3.1.0

2016-11-28 Thread Jeff Newmiller
You probably need to upgrade your R software to the current version. Many CRAN 
mirrors don't keep binary package repositories for old versions of R online, 
and the Posting Guide warns that old versions of R are effectively off-topic on 
the mailing lists.

You could also try downloading the zip file and installing it, but you are on 
your own then since there are many possible hard-to-diagnose problems with 
doing that. 
-- 
Sent from my phone. Please excuse my brevity.

On November 23, 2016 4:09:48 AM PST, Preetam Pal  wrote:
>Hi, I am trying to install the package POT for R version 3.1.0
>(Spring Dance), using:
>
>install.packages("POT", repos="http://R-Forge.R-project.org")
>
>But I am getting the following error:
>
>package ‘POT’ is available as a source package but not as a binary
>Warning in install.packages :  package ‘POT’ is not available (for R
>version 3.1.0)
>
>Can anyone suggest how I can get it working please?
>I need it for Peaks-Over-Threshold analysis under extreme value theory.
>I am trying to make use of functions mentioned in this link (link2).
>Thanks.
>
>Regards,
>Preetam
>