Re: [R] Help batch saving elements of a list into unique files

2016-03-22 Thread Jim Lemon
Okay. Got some lunch, I can think about this with both halves of the brain.

drop_token1<-function(x) {
 return(paste(x[2:length(x)],sep="",collapse="."))
}
affnames<-c("X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL",
 "X1.Classical.10.11.1_.HuEx.1_0.st.v2..CEL")
affnames.split<-strsplit(affnames,"[.]")
lapply(affnames.split,drop_token1)
[[1]]
[1] "Classical.10.11.1_.HuEx.1_0.st.v2..CEL"

[[2]]
[1] "Classical.10.11.1_.HuEx.1_0.st.v2..CEL"

This is what I get with a toy example. So I think that:

for(affdf in 1:length(out)) {
names(out[[affdf]])<-lapply(strsplit(names(out[[affdf]]),"[.]"),drop_token1)
write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
}

should work.
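
If the goal is just to drop everything up to and including the first
period, a one-regex alternative (untested on your real data, so treat it
as a sketch) avoids the strsplit()/paste() round trip entirely:

for(affdf in 1:length(out)) {
 # sub() strips the leading "X0."-style prefix in one step
 names(out[[affdf]])<-sub("^[^.]*\\.","",names(out[[affdf]]))
 write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
}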

Jim


On Wed, Mar 23, 2016 at 11:16 AM, Christian T Stackhouse (Campus)
 wrote:
> I re-ran it and this is what I got: 11.1_.HuEx.1_0.st.v2..CEL
> Should be: 10.11.1_.HuEx.1_0.st.v2..CEL
>
> Christian T. Stackhouse | Graduate Student
> GBS Neuroscience Theme
> Department of Neurosurgery
> Department of Radiation Oncology
> UAB | The University of Alabama at Birmingham
> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
> 35233
> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
> ctsta...@gmail.com
>
> uab.edu
> Knowledge that will change your world
>
>
> 
> From: Jim Lemon 
> Sent: Tuesday, March 22, 2016 6:24 PM
> To: Christian T Stackhouse (Campus)
> Cc: r-help@r-project.org
> Subject: Re: [R] Help batch saving elements of a list into unique files
>
> Transcription. I forgot the "collapse" argument when I wrote the email:
>
> drop_token1<-function(x) {
>  return(paste(x[2:length(x)],sep="",collapse="."))
> }
>
> Jim
>
>
> On Wed, Mar 23, 2016 at 10:14 AM, Christian T Stackhouse (Campus)
>  wrote:
>> Very close! The header now looks like this: c("10", "11", "1_", "HuEx", 
>> "1_0", "st", "v2", "", "CEL")
>>  For some reason, it's not concatenating.
>>
>> Best,
>>
>> Christian T. Stackhouse | Graduate Student
>> GBS Neuroscience Theme
>> Department of Neurosurgery
>> Department of Radiation Oncology
>> UAB | The University of Alabama at Birmingham
>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>> 35233
>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>> ctsta...@gmail.com
>>
>> uab.edu
>> Knowledge that will change your world
>>
>>
>> 
>> From: Jim Lemon 
>> Sent: Tuesday, March 22, 2016 6:02 PM
>> To: Christian T Stackhouse (Campus)
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>
>> I think it's the "unlist". I can only test this with one set of made
>> up names at a time.
>>
>> names(out[[affdf]])<-
>>  lapply(strsplit(names(out[[affdf]]),"[.]"),drop_token1)
>>
>> Jim
>>
>>
>> On Wed, Mar 23, 2016 at 9:57 AM, Christian T Stackhouse (Campus)
>>  wrote:
>>> This is what I ran:
>>>
 drop_token1<-function(x) {
>>> +   return(paste(x[2:length(x)],sep="."))
>>> + }
 for(affdf in 1:length(out)) {
>>> +   
>>> names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)
>>> +   write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
>>> + }
>>> Error in names(out[[affdf]]) <- lapply(unlist(strsplit(names(out[[affdf]]), 
>>>  :
>>>   'names' attribute [1148] must be the same length as the vector [118]

>>>
>>> This is what the header was before:
>>>
>>> X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>>>
>>> There was no output due to the error.
>>>
>>> Christian T. Stackhouse | Graduate Student
>>> GBS Neuroscience Theme
>>> Department of Neurosurgery
>>> Department of Radiation Oncology
>>> UAB | The University of Alabama at Birmingham
>>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>>> 35233
>>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>>> ctsta...@gmail.com
>>>
>>> uab.edu
>>> Knowledge that will change your world
>>>
>>>
>>> 
>>> From: Jim Lemon 
>>> Sent: Tuesday, March 22, 2016 5:46 PM
>>> To: Christian T Stackhouse (Campus)
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>>
>>> Sorry, should be:
>>>
>>> names(out[[affdf]])<-
>>>  lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)
>>>
>>> Jim
>>>
>>>
>>> On Wed, Mar 23, 2016 at 9:43 AM, Christian T Stackhouse (Campus)
>>>  wrote:
 Thank you, Jim. I got this error returned:

 Error in strsplit(names(out[[affdf]])) :
   argument "split" is missing, with no default

 Christian T. Stackhouse | Graduate Student
 GBS Neuroscience Theme
 Department of Neurosurgery
 Department of Radiation Oncology
 UAB | The University of Alabama at Birmingham
 Hazelrig-Salter Radiation Oncology 

Re: [R] Help batch saving elements of a list into unique files

2016-03-22 Thread Jim Lemon
Transcription. I forgot the "collapse" argument when I wrote the email:

drop_token1<-function(x) {
 return(paste(x[2:length(x)],sep="",collapse="."))
}

Jim


On Wed, Mar 23, 2016 at 10:14 AM, Christian T Stackhouse (Campus)
 wrote:
> Very close! The header now looks like this: c("10", "11", "1_", "HuEx", 
> "1_0", "st", "v2", "", "CEL")
>  For some reason, it's not concatenating.
>
> Best,
>
> Christian T. Stackhouse | Graduate Student
> GBS Neuroscience Theme
> Department of Neurosurgery
> Department of Radiation Oncology
> UAB | The University of Alabama at Birmingham
> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
> 35233
> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
> ctsta...@gmail.com
>
> uab.edu
> Knowledge that will change your world
>
>
> 
> From: Jim Lemon 
> Sent: Tuesday, March 22, 2016 6:02 PM
> To: Christian T Stackhouse (Campus)
> Cc: r-help@r-project.org
> Subject: Re: [R] Help batch saving elements of a list into unique files
>
> I think it's the "unlist". I can only test this with one set of made
> up names at a time.
>
> names(out[[affdf]])<-
>  lapply(strsplit(names(out[[affdf]]),"[.]"),drop_token1)
>
> Jim
>
>
> On Wed, Mar 23, 2016 at 9:57 AM, Christian T Stackhouse (Campus)
>  wrote:
>> This is what I ran:
>>
>>> drop_token1<-function(x) {
>> +   return(paste(x[2:length(x)],sep="."))
>> + }
>>> for(affdf in 1:length(out)) {
>> +   
>> names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)
>> +   write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
>> + }
>> Error in names(out[[affdf]]) <- lapply(unlist(strsplit(names(out[[affdf]]),  
>> :
>>   'names' attribute [1148] must be the same length as the vector [118]
>>>
>>
>> This is what the header was before:
>>
>> X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>>
>> There was no output due to the error.
>>
>> Christian T. Stackhouse | Graduate Student
>> GBS Neuroscience Theme
>> Department of Neurosurgery
>> Department of Radiation Oncology
>> UAB | The University of Alabama at Birmingham
>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>> 35233
>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>> ctsta...@gmail.com
>>
>> uab.edu
>> Knowledge that will change your world
>>
>>
>> 
>> From: Jim Lemon 
>> Sent: Tuesday, March 22, 2016 5:46 PM
>> To: Christian T Stackhouse (Campus)
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>
>> Sorry, should be:
>>
>> names(out[[affdf]])<-
>>  lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)
>>
>> Jim
>>
>>
>> On Wed, Mar 23, 2016 at 9:43 AM, Christian T Stackhouse (Campus)
>>  wrote:
>>> Thank you, Jim. I got this error returned:
>>>
>>> Error in strsplit(names(out[[affdf]])) :
>>>   argument "split" is missing, with no default
>>>
>>> Christian T. Stackhouse | Graduate Student
>>> GBS Neuroscience Theme
>>> Department of Neurosurgery
>>> Department of Radiation Oncology
>>> UAB | The University of Alabama at Birmingham
>>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>>> 35233
>>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>>> ctsta...@gmail.com
>>>
>>> uab.edu
>>> Knowledge that will change your world
>>>
>>>
>>> 
>>> From: Jim Lemon 
>>> Sent: Tuesday, March 22, 2016 5:39 PM
>>> To: Christian T Stackhouse (Campus)
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>>
>>> Okay, I just snipped off the first token in the header labels assuming
>>> that there would be no more periods. Try this:
>>>
>>> drop_token1<-function(x) {
>>>  return(paste(x[2:length(x)],sep="."))
>>> }
>>> for(affdf in 1:length(out)) {
>>> names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]))),drop_token1)
>>> write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
>>> }
>>>
>>> Jim
>>>
>>> On Wed, Mar 23, 2016 at 9:13 AM, Christian T Stackhouse (Campus)
>>>  wrote:
 Jim,

 It worked! It wrote out the files, but unfortunately, it didn't work for 
 the file headers. I should have mentioned this is what the headers look 
 like: X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
 After running your script, that header changes to: 10. I'd just like to 
 remove the "X0." prefix or in the case of file 189 the "X188." prefix 
 leaving: Classical.10.11.1_.HuEx.1_0.st.v2..CEL

 Thank you so much for your help! I'll try playing with it myself, but if 
 you have any further insights they would be greatly appreciated!

 Best,

 Christian T. Stackhouse | Graduate Student
 GBS Neuroscience Theme

Re: [R] Help batch saving elements of a list into unique files

2016-03-22 Thread Jim Lemon
I think it's the "unlist". I can only test this with one set of made
up names at a time.

names(out[[affdf]])<-
 lapply(strsplit(names(out[[affdf]]),"[.]"),drop_token1)

Jim


On Wed, Mar 23, 2016 at 9:57 AM, Christian T Stackhouse (Campus)
 wrote:
> This is what I ran:
>
>> drop_token1<-function(x) {
> +   return(paste(x[2:length(x)],sep="."))
> + }
>> for(affdf in 1:length(out)) {
> +   
> names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)
> +   write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
> + }
> Error in names(out[[affdf]]) <- lapply(unlist(strsplit(names(out[[affdf]]),  :
>   'names' attribute [1148] must be the same length as the vector [118]
>>
>
> This is what the header was before:
>
> X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>
> There was no output due to the error.
>
> Christian T. Stackhouse | Graduate Student
> GBS Neuroscience Theme
> Department of Neurosurgery
> Department of Radiation Oncology
> UAB | The University of Alabama at Birmingham
> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
> 35233
> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
> ctsta...@gmail.com
>
> uab.edu
> Knowledge that will change your world
>
>
> 
> From: Jim Lemon 
> Sent: Tuesday, March 22, 2016 5:46 PM
> To: Christian T Stackhouse (Campus)
> Cc: r-help@r-project.org
> Subject: Re: [R] Help batch saving elements of a list into unique files
>
> Sorry, should be:
>
> names(out[[affdf]])<-
>  lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)
>
> Jim
>
>
> On Wed, Mar 23, 2016 at 9:43 AM, Christian T Stackhouse (Campus)
>  wrote:
>> Thank you, Jim. I got this error returned:
>>
>> Error in strsplit(names(out[[affdf]])) :
>>   argument "split" is missing, with no default
>>
>> Christian T. Stackhouse | Graduate Student
>> GBS Neuroscience Theme
>> Department of Neurosurgery
>> Department of Radiation Oncology
>> UAB | The University of Alabama at Birmingham
>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>> 35233
>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>> ctsta...@gmail.com
>>
>> uab.edu
>> Knowledge that will change your world
>>
>>
>> 
>> From: Jim Lemon 
>> Sent: Tuesday, March 22, 2016 5:39 PM
>> To: Christian T Stackhouse (Campus)
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>
>> Okay, I just snipped off the first token in the header labels assuming
>> that there would be no more periods. Try this:
>>
>> drop_token1<-function(x) {
>>  return(paste(x[2:length(x)],sep="."))
>> }
>> for(affdf in 1:length(out)) {
>> names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]))),drop_token1)
>> write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
>> }
>>
>> Jim
>>
>> On Wed, Mar 23, 2016 at 9:13 AM, Christian T Stackhouse (Campus)
>>  wrote:
>>> Jim,
>>>
>>> It worked! It wrote out the files, but unfortunately, it didn't work for 
>>> the file headers. I should have mentioned this is what the headers look 
>>> like: X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>>> After running your script, that header changes to: 10. I'd just like to 
>>> remove the "X0." prefix or in the case of file 189 the "X188." prefix 
>>> leaving: Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>>>
>>> Thank you so much for your help! I'll try playing with it myself, but if 
>>> you have any further insights they would be greatly appreciated!
>>>
>>> Best,
>>>
>>> Christian T. Stackhouse | Graduate Student
>>> GBS Neuroscience Theme
>>> Department of Neurosurgery
>>> Department of Radiation Oncology
>>> UAB | The University of Alabama at Birmingham
>>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>>> 35233
>>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>>> ctsta...@gmail.com
>>>
>>> uab.edu
>>> Knowledge that will change your world
>>>
>>>
>>> 
>>> From: Jim Lemon 
>>> Sent: Tuesday, March 22, 2016 4:48 PM
>>> To: Christian T Stackhouse (Campus)
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>>
>>> Hi Christian,
>>> This untested script might get you going (assuming you want a CSV format):
>>>
>>> for(affdf in 1:length(out)) {
>>>  names(out[[affdf]])<-lapply(strsplit(names(out[[affdf]]),"[.]"),"[",2)
>>>  write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
>>> }
>>>
>>> Jim
>>>
>>>
>>> On Wed, Mar 23, 2016 at 6:32 AM, Christian T Stackhouse (Campus)
>>>  wrote:
 Hello!


 The overall goal I have is taking a large data frame and splitting it into 
 several smaller data frames (preserving column headers) which 

Re: [R] Help batch saving elements of a list into unique files

2016-03-22 Thread Jim Lemon
Sorry, should be:

names(out[[affdf]])<-
 lapply(unlist(strsplit(names(out[[affdf]]),"[.]")),drop_token1)

Jim


On Wed, Mar 23, 2016 at 9:43 AM, Christian T Stackhouse (Campus)
 wrote:
> Thank you, Jim. I got this error returned:
>
> Error in strsplit(names(out[[affdf]])) :
>   argument "split" is missing, with no default
>
> Christian T. Stackhouse | Graduate Student
> GBS Neuroscience Theme
> Department of Neurosurgery
> Department of Radiation Oncology
> UAB | The University of Alabama at Birmingham
> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
> 35233
> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
> ctsta...@gmail.com
>
> uab.edu
> Knowledge that will change your world
>
>
> 
> From: Jim Lemon 
> Sent: Tuesday, March 22, 2016 5:39 PM
> To: Christian T Stackhouse (Campus)
> Cc: r-help@r-project.org
> Subject: Re: [R] Help batch saving elements of a list into unique files
>
> Okay, I just snipped off the first token in the header labels assuming
> that there would be no more periods. Try this:
>
> drop_token1<-function(x) {
>  return(paste(x[2:length(x)],sep="."))
> }
> for(affdf in 1:length(out)) {
> names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]))),drop_token1)
> write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
> }
>
> Jim
>
> On Wed, Mar 23, 2016 at 9:13 AM, Christian T Stackhouse (Campus)
>  wrote:
>> Jim,
>>
>> It worked! It wrote out the files, but unfortunately, it didn't work for the 
>> file headers. I should have mentioned this is what the headers look like: 
>> X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>> After running your script, that header changes to: 10. I'd just like to 
>> remove the "X0." prefix or in the case of file 189 the "X188." prefix 
>> leaving: Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>>
>> Thank you so much for your help! I'll try playing with it myself, but if you 
>> have any further insights they would be greatly appreciated!
>>
>> Best,
>>
>> Christian T. Stackhouse | Graduate Student
>> GBS Neuroscience Theme
>> Department of Neurosurgery
>> Department of Radiation Oncology
>> UAB | The University of Alabama at Birmingham
>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>> 35233
>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>> ctsta...@gmail.com
>>
>> uab.edu
>> Knowledge that will change your world
>>
>>
>> 
>> From: Jim Lemon 
>> Sent: Tuesday, March 22, 2016 4:48 PM
>> To: Christian T Stackhouse (Campus)
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Help batch saving elements of a list into unique files
>>
>> Hi Christian,
>> This untested script might get you going (assuming you want a CSV format):
>>
>> for(affdf in 1:length(out)) {
>>  names(out[[affdf]])<-lapply(strsplit(names(out[[affdf]]),"[.]"),"[",2)
>>  write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
>> }
>>
>> Jim
>>
>>
>> On Wed, Mar 23, 2016 at 6:32 AM, Christian T Stackhouse (Campus)
>>  wrote:
>>> Hello!
>>>
>>>
>>> The overall goal I have is taking a large data frame and splitting it into 
>>> several smaller data frames (preserving column headers) which I can save as 
>>> txt files to feed into my APACHE ANY23 server for conversion into RDF.
>>>
>>>
>>> This is what I call to split up the original file:
>>>
>>>
>>> out <- split(affymetrix, (seq(nrow(affymetrix))-1) %/% 140)
>>>
>>>
>>> I have a list (out) of length 187 for which each element is a dataframe. I 
>>> want to iteratively save each data frame as a separate tab file with a 
>>> naming structure such as: affymetrix1.txt, affymetrix2.txt, ... 
>>> affymetrix187.txt
>>>
>>>
>>> Before that, I need to modify the headers to remove a prefix "X0. , X1., 
>>> ... X187." that was introduced during my original splitting. I need to 
>>> remove all characters before and including the first "."
>>>
>>>
>>> If anyone has a better way of doing this, please let me know. Otherwise, 
>>> help with how to perform batch editing of the headers and batch saving of 
>>> the files would be greatly appreciated!
>>>
>>>
>>> Best,
>>>
>>> Christian T. Stackhouse | Graduate Student
>>> GBS Neuroscience Theme
>>> Department of Neurosurgery
>>> Department of Radiation Oncology
>>> UAB | The University of Alabama at Birmingham
>>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>>> 35233
>>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>>> ctsta...@gmail.com
>>>
>>> uab.edu
>>> Knowledge that will change your world
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 

Re: [R] Help batch saving elements of a list into unique files

2016-03-22 Thread Jim Lemon
Okay, I just snipped off the first token in the header labels assuming
that there would be no more periods. Try this:

drop_token1<-function(x) {
 return(paste(x[2:length(x)],sep="."))
}
for(affdf in 1:length(out)) {
names(out[[affdf]])<-lapply(unlist(strsplit(names(out[[affdf]]))),drop_token1)
write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
}

Jim

On Wed, Mar 23, 2016 at 9:13 AM, Christian T Stackhouse (Campus)
 wrote:
> Jim,
>
> It worked! It wrote out the files, but unfortunately, it didn't work for the 
> file headers. I should have mentioned this is what the headers look like: 
> X0.Classical.10.11.1_.HuEx.1_0.st.v2..CEL
> After running your script, that header changes to: 10. I'd just like to 
> remove the "X0." prefix or in the case of file 189 the "X188." prefix 
> leaving: Classical.10.11.1_.HuEx.1_0.st.v2..CEL
>
> Thank you so much for your help! I'll try playing with it myself, but if you 
> have any further insights they would be greatly appreciated!
>
> Best,
>
> Christian T. Stackhouse | Graduate Student
> GBS Neuroscience Theme
> Department of Neurosurgery
> Department of Radiation Oncology
> UAB | The University of Alabama at Birmingham
> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
> 35233
> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
> ctsta...@gmail.com
>
> uab.edu
> Knowledge that will change your world
>
>
> 
> From: Jim Lemon 
> Sent: Tuesday, March 22, 2016 4:48 PM
> To: Christian T Stackhouse (Campus)
> Cc: r-help@r-project.org
> Subject: Re: [R] Help batch saving elements of a list into unique files
>
> Hi Christian,
> This untested script might get you going (assuming you want a CSV format):
>
> for(affdf in 1:length(out)) {
>  names(out[[affdf]])<-lapply(strsplit(names(out[[affdf]]),"[.]"),"[",2)
>  write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
> }
>
> Jim
>
>
> On Wed, Mar 23, 2016 at 6:32 AM, Christian T Stackhouse (Campus)
>  wrote:
>> Hello!
>>
>>
>> The overall goal I have is taking a large data frame and splitting it into 
>> several smaller data frames (preserving column headers) which I can save as 
>> txt files to feed into my APACHE ANY23 server for conversion into RDF.
>>
>>
>> This is what I call to split up the original file:
>>
>>
>> out <- split(affymetrix, (seq(nrow(affymetrix))-1) %/% 140)
>>
>>
>> I have a list (out) of length 187 for which each element is a dataframe. I 
>> want to iteratively save each data frame as a separate tab file with a 
>> naming structure such as: affymetrix1.txt, affymetrix2.txt, ... 
>> affymetrix187.txt
>>
>>
>> Before that, I need to modify the headers to remove a prefix "X0. , X1., ... 
>> X187." that was introduced during my original splitting. I need to remove 
>> all characters before and including the first "."
>>
>>
>> If anyone has a better way of doing this, please let me know. Otherwise, 
>> help with how to perform batch editing of the headers and batch saving of 
>> the files would be greatly appreciated!
>>
>>
>> Best,
>>
>> Christian T. Stackhouse | Graduate Student
>> GBS Neuroscience Theme
>> Department of Neurosurgery
>> Department of Radiation Oncology
>> UAB | The University of Alabama at Birmingham
>> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
>> 35233
>> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
>> ctsta...@gmail.com
>>
>> uab.edu
>> Knowledge that will change your world
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help batch saving elements of a list into unique files

2016-03-22 Thread Jim Lemon
Hi Christian,
This untested script might get you going (assuming you want a CSV format):

for(affdf in 1:length(out)) {
 names(out[[affdf]])<-lapply(strsplit(names(out[[affdf]]),"[.]"),"[",2)
 write.csv(out[[affdf]],file=paste("affymetrix",affdf,".txt",sep=""))
}

Jim


On Wed, Mar 23, 2016 at 6:32 AM, Christian T Stackhouse (Campus)
 wrote:
> Hello!
>
>
> The overall goal I have is taking a large data frame and splitting it into 
> several smaller data frames (preserving column headers) which I can save as 
> txt files to feed into my APACHE ANY23 server for conversion into RDF.
>
>
> This is what I call to split up the original file:
>
>
> out <- split(affymetrix, (seq(nrow(affymetrix))-1) %/% 140)
>
>
> I have a list (out) of length 187 for which each element is a dataframe. I 
> want to iteratively save each data frame as a separate tab file with a naming 
> structure such as: affymetrix1.txt, affymetrix2.txt, ... affymetrix187.txt
>
>
> Before that, I need to modify the headers to remove a prefix "X0. , X1., ... 
> X187." that was introduced during my original splitting. I need to remove all 
> characters before and including the first "."
>
>
> If anyone has a better way of doing this, please let me know. Otherwise, help 
> with how to perform batch editing of the headers and batch saving of the 
> files would be greatly appreciated!
>
>
> Best,
>
> Christian T. Stackhouse | Graduate Student
> GBS Neuroscience Theme
> Department of Neurosurgery
> Department of Radiation Oncology
> UAB | The University of Alabama at Birmingham
> Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
> 35233
> M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | 
> ctsta...@gmail.com
>
> uab.edu
> Knowledge that will change your world
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] arranging axis within plotting area

2016-03-22 Thread Jim Lemon
Hi Eliza,
I think you only need to change the margins and the placement of the
right axis label:

colours <- c("black", "grey")
par(mar=c(5,4,4,4))
barplot(prop,ylab = "Numbers", cex.lab = 1.5, cex.main = 1.4,
 beside=TRUE, col=colours,ylim=c(0,250))
axis(side=3,xlim=c(0,45), at=c(6,12,18,24,30,36,42),
 labels=c("Gd","Mu","Bg","Gk","Mn","Rw","Kh"))
axis(side=3,xlim=c(0,45), at=c(6,12,18,24,30,36,42),
 labels=c("Gd","Mu","Bg","Gk","Mn","Rw","Kh"))
par(new=TRUE)
barplot(ELE, pch=15,  xlab="", ylab="",
 axes=FALSE, type="b", col="red",yaxt = "n",
 ylim = rev(c(0,4500)))
mtext("Cell Density",side=4,col="red",line=3)
axis(4, ylim=c(0,7000), col="red",col.axis="red",
 las=1,cex.lab=0.5,cex.axis=0.7)
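
A stripped-down sketch of the same idea (toy data of my own, just to show
how mar[4] and line=3 interact):

par(mar=c(5,4,4,4))
plot(1:10,type="b",axes=FALSE,xlab="",ylab="")
box()
axis(4,col="red",col.axis="red",las=1)
mtext("Cell Density",side=4,col="red",line=3)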

Jim


On Wed, Mar 23, 2016 at 4:17 AM, Eliza Botto  wrote:
> Dear useRs,
> I have defined two matrices "prop" and "ELE" in the following manner
>> dput(prop)
> structure(c(122.4667, 87.1500875, 94.3647755102041, 84.8471625, 
> 95.2767755102041, 84.15558125, 121.8467, 90.75970625, 
> 98.2028979591837, 87.1500875, 88.2953043478261, 72.81219375, 88.234, 
> 85.73326875, 82.4549743589744, 82.6041125, 96.11239, 62.77575625, 
> 86.9222790697674, 64.74370625, 57.2601860465116, 58.98126875, 
> 91.78836, 84.8471625, 92.170347826087, 84.15558125, 91.398085106383, 
> 91.79875, 108.423025641026, 72.81219375), .Dim = c(2L, 15L), .Dimnames = 
> list(c("ori", "sat"), c("Ba", "Gd", "Ko", "Mu", "Mz", "Bg", "Do", "Gk", "Ka", 
> "Mn", "Na", "Rw", "Rb", "Kh", "Sk")))
>> dput(ELE)
> c(995.4, 813.5, 614, 2291, 702, 1038, 2571, 461, 700, 171, 2500, 1615, 587, 
> 1209, 1981)
> Then I gave the following commands to make a plot
> colours <- c("black", "grey")
> barplot(prop,ylab = "Numbers", cex.lab = 1.5, cex.main = 1.4, beside=TRUE, 
> col=colours,ylim=c(0,250))
> axis(side=3,xlim=c(0,45), at=c(6,12,18,24,30,36,42), 
> labels=c("Gd","Mu","Bg","Gk","Mn","Rw","Kh"))
> axis(side=3,xlim=c(0,45), at=c(6,12,18,24,30,36,42), 
> labels=c("Gd","Mu","Bg","Gk","Mn","Rw","Kh"))
> par(new=TRUE)
> barplot(ELE, pch=15,  xlab="", ylab="",
> axes=FALSE, type="b", col="red",yaxt = "n",ylim = rev(c(0,4500)))
> mtext("Cell Density",side=4,col="red",line=1)
> axis(4, ylim=c(0,7000), 
> col="red",col.axis="red",las=1,cex.lab=0.5,cex.axis=0.7)
> The problem is that the second Y-axis is located somewhat outside the plot 
> and is not visible. I kindly request your help with it.
> Thanks in advance,
> Eliza
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bug (?) with lapply / clusterMap / clusterApply etc

2016-03-22 Thread jacob


Hello I have encountered a bug(?) with the parallel package. When run  
from within a function, the parLapply function appears to be copying  
the entire parent environment (environment of interior of function)  
into all child nodes in the cluster, one node at a time - which is  
very very slow - and the copied contents are not even accessible  
within the child nodes even though they are apparent in the memory  
footprint. This happens when parLapply is run from within a function.  
I may be misusing the terms "parent" and "node" here...


The below code demonstrates the issue. The same parallel command is  
used twice within the function, once before creating a large object,  
and once afterwards. Both commands should take a nearly identical  
amount of time. Initially the parallel code takes less than 1/100th of  
a second, but in the second iteration requires hundreds of times  
longer...


Example Code:

 library(parallel)

 #create a cluster of nodes
 if(!"clus1" %in% ls()) clus1=makeCluster(10)

 #function used to demonstrate bug
 rows_fn1=function(x,clus){

  #first set of parallel code
  print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

  #create large vector
  x=rnorm(10^7)

  #second set
  print(system.time(parLapply(clus,1:5,function(z){y=rnorm(5000);return(mean(y))})))

 }

 #demonstrate bug - watch task manager and see windows slowly
 #copy the vector to each node in the cluster
 rows_fn1(1:5000,clus1)

Although the child nodes bloat proportionally to the size of x in the  
parent environment, x is not available in the child nodes. The code  
above can be tweaked to add more variables (x1,x2,x3 ...) and the  
child nodes will bloat to the same degree.


I am working on Windows Server 2012, I am using 64bit R version 3.2.1.  
I upgraded to 3.2.4revised and observed the same bug.


I have googled for this issue and have not encountered any other  
individuals having a similar problem.


I have attempted to reboot my machine without effect (aside from the obvious).

Any suggestions would be greatly appreciated!
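
One direction I have not yet tried (an assumption on my part, not a
verified fix): detaching the worker function from the enclosing frame, so
that parLapply() does not serialize the function's large local
environment along with it, e.g.

 library(parallel)
 rows_fn2=function(x,clus){
  worker=function(z){y=rnorm(5000);return(mean(y))}
  environment(worker)=globalenv() #drop the reference to this function's frame
  x=rnorm(10^7)                   #large local object, no longer shipped to the nodes
  print(system.time(parLapply(clus,1:5,worker)))
 }
 rows_fn2(1:5000,clus1)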

With regards,

Jacob L Strunk
Forest Biometrician (PhD), Statistician (MSc)
and Data Munger

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] splm: fixed effects for time-invariant variables

2016-03-22 Thread Kwan Nok Chan
Dear users:

Has anyone tried using splm to estimate a fixed effects model with one or
more time-invariant variables?

I may have missed something in the manual, but there isn't a clear way to
switch to estimators (e.g. first difference) suited for data like that in
splm.

My colleagues and I are very interested to include splm in the analysis
because spatial models in other mainstream programs do not support temporal
variance.

Best regards,

K Chan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] arranging axis within plotting area

2016-03-22 Thread Eliza Botto
Dear useRs,
I have defined two matrices "prop" and "ELE" in the following manner
> dput(prop)
structure(c(122.4667, 87.1500875, 94.3647755102041, 84.8471625, 
95.2767755102041, 84.15558125, 121.8467, 90.75970625, 98.2028979591837, 
87.1500875, 88.2953043478261, 72.81219375, 88.234, 85.73326875, 
82.4549743589744, 82.6041125, 96.11239, 62.77575625, 86.9222790697674, 
64.74370625, 57.2601860465116, 58.98126875, 91.78836, 84.8471625, 
92.170347826087, 84.15558125, 91.398085106383, 91.79875, 108.423025641026, 
72.81219375), .Dim = c(2L, 15L), .Dimnames = list(c("ori", "sat"), c("Ba", 
"Gd", "Ko", "Mu", "Mz", "Bg", "Do", "Gk", "Ka", "Mn", "Na", "Rw", "Rb", "Kh", 
"Sk")))
> dput(ELE)
c(995.4, 813.5, 614, 2291, 702, 1038, 2571, 461, 700, 171, 2500, 1615, 587, 
1209, 1981)
Then I gave the following commands to make a plot
colours <- c("black", "grey")
barplot(prop,ylab = "Numbers", cex.lab = 1.5, cex.main = 1.4, beside=TRUE, 
col=colours,ylim=c(0,250))
axis(side=3,xlim=c(0,45), at=c(6,12,18,24,30,36,42), 
labels=c("Gd","Mu","Bg","Gk","Mn","Rw","Kh"))
axis(side=3,xlim=c(0,45), at=c(6,12,18,24,30,36,42), 
labels=c("Gd","Mu","Bg","Gk","Mn","Rw","Kh"))
par(new=TRUE)
barplot(ELE, pch=15,  xlab="", ylab="", 
axes=FALSE, type="b", col="red",yaxt = "n",ylim = rev(c(0,4500)))
mtext("Cell Density",side=4,col="red",line=1) 
axis(4, ylim=c(0,7000), col="red",col.axis="red",las=1,cex.lab=0.5,cex.axis=0.7)
The problem is that the second Y-axis is located somewhat outside the plot and 
is not visible. I kindly request your help with it.
Thanks in advance,
Eliza 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help batch saving elements of a list into unique files

2016-03-22 Thread Christian T Stackhouse (Campus)
Hello!


The overall goal I have is taking a large data frame and splitting it into 
several smaller data frames (preserving column headers) which I can save as txt 
files to feed into my APACHE ANY23 server for conversion into RDF.


This is what I call to split up the original file:


out <- split(affymetrix, (seq(nrow(affymetrix))-1) %/% 140)


I have a list (out) of length 187 for which each element is a dataframe. I want 
to iteratively save each data frame as a separate tab file with a naming 
structure such as: affymetrix1.txt, affymetrix2.txt, ... affymetrix187.txt


Before that, I need to modify the headers to remove a prefix "X0. , X1., ... 
X187." that was introduced during my original splitting. I need to remove all 
characters before and including the first "."


If anyone has a better way of doing this, please let me know. Otherwise, help 
with how to perform batch editing of the headers and batch saving of the files 
would be greatly appreciated!


Best,

Christian T. Stackhouse | Graduate Student
GBS Neuroscience Theme
Department of Neurosurgery
Department of Radiation Oncology
UAB | The University of Alabama at Birmingham
Hazelrig-Salter Radiation Oncology Center | 1700 6th Ave S | Birmingham, AL 
35233
M: 919.724.6890 | ctsta...@uab.edu | cstackho...@uabmc.edu | ctsta...@gmail.com

uab.edu
Knowledge that will change your world


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory usage in prcomp

2016-03-22 Thread Roy Mendelssohn - NOAA Federal

> On Mar 22, 2016, at 10:00 AM, Martin Maechler  
> wrote:
> 
>> Roy Mendelssohn <- NOAA Federal >
>>on Tue, 22 Mar 2016 07:42:10 -0700 writes:
> 
>> Hi All:
>> I am running prcomp on a very large array, roughly [50, 3650].  The 
>> array itself is 16GB.  I am running on a Unix machine and am running “top” 
>> at the same time and am quite surprised to see that the application memory 
>> usage is 76GB.  I have the “tol” set very high  (.8) so that it should only 
>> pull out a few components.  I am surprised at this memory usage because 
>> prcomp uses the SVD if I am not mistaken, and when I take guesses at the 
>> size of the SVD matrices they shouldn’t be that large.   While I can fit 
>> this  in, for a variety of reasons I would like to reduce the memory 
>> footprint.  Some questions:
> 
>> 1.  I am running with “center=FALSE” and “scale=TRUE”.  Would I save memory 
>> if I scaled the data first myself, saved the result, cleared out the 
>> workspace, read the scaled data back in and did the prcomp call?  Basically 
>> are the intermediate calculations for scaling kept in memory after use.
> 
>> 2. I don’t know how prcomp memory usage compares to a direct call to “svd” 
>> which allows me to explicitly set how many  singular vectors to compute (I 
>> only need like the first five at most).  prcomp is convenient because it 
>> does a lot of the other work for me
> 
> For your example, where p := ncol(x)  is 3650  but you only want
> the first 5 PCs, it would be *considerably* more efficient to
> use svd(..., nv = 5) directly.
> 
> So I would take  stats:::prcomp.default  and modify it
> correspondingly.
> 
> This seems such a useful idea in general that I consider
> updating the function in R with a new optional 'rank.'  argument which
> you'd set to 5 in your case.
> 
> Scrutinizing R's underlying svd() code however, I now see that
> there are typically still two other [n x p] matrices created (one
> in R's La.svd(), one in C code) ... which I think should be
> unnecessary in this case... but that would really be another
> topic (for R-devel , not R-help).
> 
> Martin
> 


Thanks.  It is easy enough to recode using svd(), and I think I will.  It gives 
me a little more control over what the algorithm does.

-Roy



**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R studio kniter

2016-03-22 Thread Duncan Murdoch

On 22/03/2016 1:08 PM, Jan Kacaba wrote:

Hello, is it possible to run knitr from a script instead of by clicking on
the Compile PDF button?

Say I have "texfile.rnw" and "myscript.R". I would like to knit texfile.rnw
by running the script "myscript.R".
In "myscript.R" I would write something like this:
knit("texfile.rnw")


That exact command works.  It will produce a .tex file; if you want to 
continue on to produce a PDF, you need


knit2pdf("texfile.rnw")

instead.
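
For completeness, the whole of "myscript.R" can be as small as this
(assuming knitr is installed and a LaTeX toolchain is on the path):

library(knitr)
knit2pdf("texfile.rnw")  # knits to texfile.tex, then runs it through LaTeX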

Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R studio kniter

2016-03-22 Thread Jan Kacaba
Hello, is it possible to run knitr from a script instead of by clicking on
the Compile PDF button?

Say I have "texfile.rnw" and "myscript.R". I would like to knit texfile.rnw
by running the script "myscript.R".
In "myscript.R" I would write something like this:
knit("texfile.rnw")

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory usage in prcomp

2016-03-22 Thread Martin Maechler
> Roy Mendelssohn <- NOAA Federal >
> on Tue, 22 Mar 2016 07:42:10 -0700 writes:

> Hi All:
> I am running prcomp on a very large array, roughly [50, 3650].  The 
array itself is 16GB.  I am running on a Unix machine and am running “top” at 
the same time and am quite surprised to see that the application memory usage 
is 76GB.  I have the “tol” set very high  (.8) so that it should only pull out 
a few components.  I am surprised at this memory usage because prcomp uses the 
SVD if I am not mistaken, and when I take guesses at the size of the SVD 
matrices they shouldn’t be that large.   While I can fit this  in, for a 
variety of reasons I would like to reduce the memory footprint.  Some questions:

> 1.  I am running with “center=FALSE” and “scale=TRUE”.  Would I save 
memory if I scaled the data first myself, saved the result, cleared out the 
workspace, read the scaled data back in and did the prcomp call?  Basically are 
the intermediate calculations for scaling kept in memory after use.

> 2. I don’t know how prcomp memory usage compares to a direct call to 
“svd” which allows me to explicitly set how many singular vectors to compute 
(I only need like the first five at most).  prcomp is convenient because it 
does a lot of the other work for me

For your example, where p := ncol(x)  is 3650  but you only want
the first 5 PCs, it would be *considerably* more efficient to
use svd(..., nv = 5) directly.

So I would take  stats:::prcomp.default  and modify it
correspondingly.
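
In rough outline (a sketch only, shown on a small stand-in matrix rather
than your 16GB array), that modification amounts to something like:

x  <- matrix(rnorm(1000 * 50), nrow = 1000)  # stand-in for the real data
xs <- scale(x, center = FALSE, scale = TRUE) # the options you used in prcomp()
k  <- 5
s  <- svd(xs, nu = 0, nv = k)           # only the first k right singular vectors
scores <- xs %*% s$v                    # the first k principal components
sdev   <- s$d[1:k] / sqrt(nrow(xs) - 1) # the scaling prcomp() uses for sdev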

This seems such a useful idea in general that I consider
updating the function in R with a new optional 'rank.'  argument which
you'd set to 5 in your case.

Scrutinizing R's underlying svd() code however, I now see that
there are typically still two other [n x p] matrices created (one
in R's La.svd(), one in C code) ... which I think should be
unnecessary in this case... but that would really be another
topic (for R-devel , not R-help).

Martin

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Memory usage in prcomp

2016-03-22 Thread Roy Mendelssohn - NOAA Federal
Hi All:

I am running prcomp on a very large array, roughly [50, 3650].  The array 
itself is 16GB.  I am running on a Unix machine and am running “top” at the 
same time and am quite surprised to see that the application memory usage is 
76GB.  I have the “tol” set very high  (.8) so that it should only pull out a 
few components.  I am surprised at this memory usage because prcomp uses the 
SVD if I am not mistaken, and when I take guesses at the size of the SVD 
matrices they shouldn’t be that large.   While I can fit this  in, for a 
variety of reasons I would like to reduce the memory footprint.  Some questions:

1.  I am running with “center=FALSE” and “scale=TRUE”.  Would I save memory if 
I scaled the data first myself, saved the result, cleared out the workspace, 
read the scaled data back in and did the prcomp call?  Basically are the 
intermediate calculations for scaling kept in memory after use.

2. I don’t know how prcomp memory usage compares to a direct call to “svd” 
which allows me to explicitly set how many  singular vectors to compute (I only 
need like the first five at most).  prcomp is convenient because it does a lot 
of the other work for me


**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Roy Mendelssohn - NOAA Federal
Thanks all.  This is interesting, and for what I am doing worthwhile and 
helpful.  I have to be careful in each operation whether a copy is made or not, 
 and knowing this allows me to test on small examples what any command will do 
before I use it.
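
For example, a minimal check along the lines of Martin's demonstration
below (assuming an R build with memory profiling enabled):

x <- array(1:24, c(2, 3, 4))
tracemem(x)        # note the address; any copy would print a tracemem line
dim(x) <- c(6, 4)  # the reshape in question
tracemem(x)        # same address and no tracemem output => no copy was made
untracemem(x)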

Thanks again, I appreciate all the help.  I will have a related question, but 
will put it under a different heading.

-Roy
> On Mar 22, 2016, at 2:55 AM, Dénes Tóth  wrote:
> 
> 
> Hi Martin,
> 
> 
> On 03/22/2016 10:20 AM, Martin Maechler wrote:
>>> >Dénes Tóth
>>> > on Fri, 18 Mar 2016 22:56:23 +0100 writes:
>> > Hi Roy,
>> > R (usually) makes a copy if the dimensionality of an array is modified,
>> > even if you use this syntax:
>> 
>> > x <- array(1:24, c(2, 3, 4))
>> > dim(x) <- c(6, 4)
>> 
>> > See also ?tracemem, ?data.table::address, ?pryr::address and other 
>> tools
>> > to trace if an internal copy is done.
>> 
>> Well, without using strange (;-) packages,  indeed standard R's
>> tracemem(), notably the help page is a good pointer.
>> 
>> According to the help page memory tracing is enabled in the
>> default R binaries for Windows and OS X.
>> For Linux (where I, as R developer, compile R myself anyway),
>> one needs to configure with --enable-memory-profiling .
>> 
>> Now, let's try:
>> 
>>> x <- array(rnorm(47), dim = c(1000,50, 40))
>>> tracemem(x)
>>[1] "<0x7f79a498a010>"
>>> dim(x) <- c(1000* 50, 40)
>>> x[5] <- pi
>>> tracemem(x)
>>[1] "<0x7f79a498a010>"
>>>
>> 
>> So,*BOTH*   the re-dimensioning*AND*   the  sub-assignment did
>> *NOT*  make a copy.
> 
> This is interesting. First I wanted to demonstrate to Roy that recent R 
> versions are smart enough not to make any copy during reshaping an array. 
> Then I put together an example (similar to yours) and realized that after 
> several reshapes, R starts to copy the array. So I had to modify my 
> suggestion... And now, I realized that this was an RStudio-issue. At least on 
> Linux, a standard R terminal behaves as you described, however, RStudio 
> (version 0.99.862, which is not the very latest) tends to create copies 
> (quite randomly, at least to me). If I have time I will test this more 
> thoroughly and file a report to RStudio if it turns out to be a bug.
> 
> Denes
> 
>> 
>> Indeed, R has become much smarter  in these things in recent
>> years ... not thanks to me, but very much thanks to
>> Luke Tierney (from R-core), and also thanks to contributions from "outside",
>> notably Tomas Kalibera.
>> 
>> And hence:*NO*  such strange workarounds are needed in this specific case:
>> 
>> > Workaround: use data.table::setattr or bit::setattr to modify the
>> > dimensions in place (i.e., without making a copy). Risk: if you modify
>> > an object by reference, all other objects which point to the same 
>> memory
>> > address will be modified silently, too.
>> 
>> Martin Maechler, ETH Zurich  (and R-core)
>> 
>> > HTH,
>> > Denes
>> 
>> (generally, your contributions help indeed, Denes, thank you!)
>> 
>> 
>> > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>> >> Hi All:
>> >>
>> >> I am working with a very large array.  if noLat is the number of 
>> latitudes, noLon the number of longitudes and noTime the number of  time 
>> periods, the array is of the form:
>> >>
>> >> myData[noLat, no Lon, noTime].
>> >>
>> >> It is read in this way because that is how it is stored in a (series) 
>> of netcdf files.  For the analysis I need to do, I need instead the array:
>> >>
>> >> myData[noLat*noLon, noTime].  Normally this would be easy:
>> >>
>> >> myData<- array(myData,dim=c(noLat*noLon,noTime))
>> >>
>> >> My question is how does this command work in R - does it make a copy 
>> of the existing array, with different indices for the dimensions, or does it 
>> just redo the indices and leave the given array as is?  The reason for this 
>> question is my array is 30GB in memory, and I don’t have enough space to 
>> have a copy of the array in memory.  If the latter I will have to figure out 
>> a work around to bring in only part of the data at a time and put it into 
>> the proper locations.
>> >>
>> >> Thanks,
>> >>
>> >> -Roy

**
"The contents of this message do not reflect any position of the U.S. 
Government or NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.


Re: [R] Is there dpois equivalent for zero-inflated Poisson?

2016-03-22 Thread Thierry Onkelinx
dpois(0, lambda) == e^(-lambda)

The wikipedia formula is
ifelse(x == 0, zero + dpois(0, lambda) * (1-zero), dpois(x, lambda) *
(1-zero))

or

ifelse(x == 0, zero + dpois(x, lambda) * (1-zero), dpois(x, lambda) *
(1-zero))

so we can move the dpois() out of the ifelse()

ifelse(x == 0, zero, 0)  + dpois(x, lambda) * (1-zero)
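
A quick numeric check of that last step, with arbitrary values:

lambda <- 3; zero <- 0.2; x <- 0:10
wiki <- ifelse(x == 0, zero + dpois(0, lambda) * (1 - zero),
               dpois(x, lambda) * (1 - zero))
moved <- ifelse(x == 0, zero, 0) + dpois(x, lambda) * (1 - zero)
all.equal(wiki, moved)  # TRUE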



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-03-22 13:50 GMT+01:00 Matti Viljamaa :

> And why is the first term of ifelse(x == 0, zero, 0) + dpois(x, lambda) /
> (1 - zero)
>
> ifelse(x == 0, zero, 0)
>
> rather than something corresponding to
>
> zero+(1-zero)e^{-lambda}
>
> https://en.wikipedia.org/wiki/Zero-inflated_model#Zero-inflated_Poisson
>
> On 22 Mar 2016, at 14:25, Matti Viljamaa  wrote:
>
> Could you clarify what are the parameters and why it’s formulated that way?
>
> -Matti
>
> On 22 Mar 2016, at 14:17, Thierry Onkelinx 
> wrote:
>
> Dear Matti,
>
> What about this?
>
> dzeroinflpois <- function(x, lambda, zero){
>   ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - zero)
> }
> plot(x, dzeroinflpois(x, lambda = 10, zero = 0.2), type = "l")
>
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2016-03-22 13:04 GMT+01:00 Matti Viljamaa :
>
>> I’m doing some optimisation that I first did with normal Poisson (only
>> parameter theta was estimated), but now I’m doing the same with a
>> zero-inflated Poisson model which
>> gives me two estimated parameters theta and p (p is also pi in some
>> notation).
>>
>> My question is, is there something equivalent to dpois that would use
>> both of the parameters (or is the p parameter possibly unnecessary)?
>>
>> I’m calculating the “fit” of the Poisson model
>>
>> i.e. like
>>
>> x = c(0,1,2,3,4,5,6)
>> y = c(3062,587,284,103,33,4,2)
>> fit1 <- sum(y)*dpois(x, est_theta)
>>
>> and then comparing fit1 to the real observations.
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> 
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is there dpois equivalent for zero-inflated Poisson?

2016-03-22 Thread Matti Viljamaa
And why is the first term of ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - 
zero)

ifelse(x == 0, zero, 0)

rather than something corresponding to

zero+(1-zero)e^{-lambda}

https://en.wikipedia.org/wiki/Zero-inflated_model#Zero-inflated_Poisson

> On 22 Mar 2016, at 14:25, Matti Viljamaa  wrote:
> 
> Could you clarify what are the parameters and why it’s formulated that way?
> 
> -Matti
> 
>> On 22 Mar 2016, at 14:17, Thierry Onkelinx > > wrote:
>> 
>> Dear Matti,
>> 
>> What about this?
>> 
>> dzeroinflpois <- function(x, lambda, zero){
>>   ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - zero)
>> }
>> plot(x, dzeroinflpois(x, lambda = 10, zero = 0.2), type = "l")
>> 
>> 
>> 
>> ir. Thierry Onkelinx
>> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
>> Forest 
>> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance 
>> Kliniekstraat 25
>> 1070 Anderlecht
>> Belgium
>> 
>> To call in the statistician after the experiment is done may be no more than 
>> asking him to perform a post-mortem examination: he may be able to say what 
>> the experiment died of. ~ Sir Ronald Aylmer Fisher
>> The plural of anecdote is not data. ~ Roger Brinner 
>> The combination of some data and an aching desire for an answer does not 
>> ensure that a reasonable answer can be extracted from a given body of data. 
>> ~ John Tukey
>> 
>> 2016-03-22 13:04 GMT+01:00 Matti Viljamaa > >:
>> I’m doing some optimisation that I first did with normal Poisson (only 
>> parameter theta was estimated), but now I’m doing the same with a 
>> zero-inflated Poisson model which
>> gives me two estimated parameters theta and p (p is also pi in some 
>> notation).
>> 
>> My question is, is there something equivalent to dpois that would use both 
>> of the parameters (or is the p parameter possibly unnecessary)?
>> 
>> I’m calculating the “fit” of the Poisson model
>> 
>> i.e. like
>> 
>> x = c(0,1,2,3,4,5,6)
>> y = c(3062,587,284,103,33,4,2)
>> fit1 <- sum(y)*dpois(x, est_theta)
>> 
>> and then comparing fit1 to the real observations.
>> [[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org  mailing list -- To 
>> UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help 
>> 
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
>> 
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is there dpois equivalent for zero-inflated Poisson?

2016-03-22 Thread Thierry Onkelinx
zero = proportion of the zero-inflation part
lambda = expected value of the Poisson part

There was a typo in the distribution. It should multiply by (1 - zero)
instead of divide by it.

dzeroinflpois <- function(x, lambda, zero){
  ifelse(x == 0, zero, 0) + dpois(x, lambda) * (1 - zero)
}
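
A usage sketch against your counts (est_theta and est_p below are
placeholders for whatever your optimisation returned, not values I have
checked):

x <- c(0, 1, 2, 3, 4, 5, 6)
y <- c(3062, 587, 284, 103, 33, 4, 2)
est_theta <- 0.5  # placeholder Poisson mean
est_p     <- 0.7  # placeholder zero-inflation proportion
fit2 <- sum(y) * dzeroinflpois(x, lambda = est_theta, zero = est_p)
cbind(observed = y, expected = round(fit2, 1))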

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-03-22 13:25 GMT+01:00 Matti Viljamaa :

> Could you clarify what are the parameters and why it’s formulated that way?
>
> -Matti
>
> On 22 Mar 2016, at 14:17, Thierry Onkelinx 
> wrote:
>
> Dear Matti,
>
> What about this?
>
> dzeroinflpois <- function(x, lambda, zero){
>   ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - zero)
> }
> plot(x, dzeroinflpois(x, lambda = 10, zero = 0.2), type = "l")
>
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2016-03-22 13:04 GMT+01:00 Matti Viljamaa :
>
>> I’m doing some optimisation that I first did with normal Poisson (only
>> parameter theta was estimated), but now I’m doing the same with a
>> zero-inflated Poisson model which
>> gives me two estimated parameters theta and p (p is also pi in some
>> notation).
>>
>> My question is, is there something equivalent to dpois that would use
>> both of the parameters (or is the p parameter possibly unnecessary)?
>>
>> I’m calculating the “fit” of the Poisson model
>>
>> i.e. like
>>
>> x = c(0,1,2,3,4,5,6)
>> y = c(3062,587,284,103,33,4,2)
>> fit1 <- sum(y)*dpois(x, est_theta)
>>
>> and then comparing fit1 to the real observations.
>
>
>
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is there dpois equivalent for zero-inflated Poisson?

2016-03-22 Thread Matti Viljamaa
Could you clarify what the parameters are and why it’s formulated that way?

-Matti

> On 22 Mar 2016, at 14:17, Thierry Onkelinx  wrote:
> 
> Dear Matti,
> 
> What about this?
> 
> dzeroinflpois <- function(x, lambda, zero){
>   ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - zero)
> }
> plot(x, dzeroinflpois(x, lambda = 10, zero = 0.2), type = "l")
> 
> 
> 
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
> Forest 
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance 
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> 
> To call in the statistician after the experiment is done may be no more than 
> asking him to perform a post-mortem examination: he may be able to say what 
> the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner 
> The combination of some data and an aching desire for an answer does not 
> ensure that a reasonable answer can be extracted from a given body of data. ~ 
> John Tukey
> 
> 2016-03-22 13:04 GMT+01:00 Matti Viljamaa :
> I’m doing some optimisation that I first did with normal Poisson (only 
> parameter theta was estimated), but now I’m doing the same with a 
> zero-inflated Poisson model which
> gives me two estimated parameters theta and p (p is also pi in some notation).
> 
> My question is, is there something equivalent to dpois that would use both of 
> the parameters (or is the p parameter possibly unnecessary)?
> 
> I’m calculating the “fit” of the Poisson model
> 
> i.e. like
> 
> x = c(0,1,2,3,4,5,6)
> y = c(3062,587,284,103,33,4,2)
> fit1 <- sum(y)*dpois(x, est_theta)
> 
> and then comparing fit1 to the real observations.
> 



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Is there dpois equivalent for zero-inflated Poisson?

2016-03-22 Thread Thierry Onkelinx
Dear Matti,

What about this?

dzeroinflpois <- function(x, lambda, zero){
  ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - zero)
}
plot(x, dzeroinflpois(x, lambda = 10, zero = 0.2), type = "l")



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2016-03-22 13:04 GMT+01:00 Matti Viljamaa :

> I’m doing some optimisation that I first did with normal Poisson (only
> parameter theta was estimated), but now I’m doing the same with a
> zero-inflated Poisson model which
> gives me two estimated parameters theta and p (p is also pi in some
> notation).
>
> My question is, is there something equivalent to dpois that would use both
> of the parameters (or is the p parameter possibly unnecessary)?
>
> I’m calculating the “fit” of the Poisson model
>
> i.e. like
>
> x = c(0,1,2,3,4,5,6)
> y = c(3062,587,284,103,33,4,2)
> fit1 <- sum(y)*dpois(x, est_theta)
>
> and then comparing fit1 to the real observations.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Is there dpois equivalent for zero-inflated Poisson?

2016-03-22 Thread Matti Viljamaa
I’m doing some optimisation that I first did with an ordinary Poisson model (only
the parameter theta was estimated), but now I’m doing the same with a
zero-inflated Poisson model, which gives me two estimated parameters, theta and p
(p is also written as pi in some notation).

My question is, is there something equivalent to dpois that would use both of 
the parameters (or is the p parameter possibly unnecessary)?

I’m calculating the “fit” of the Poisson model

i.e. like

x = c(0,1,2,3,4,5,6)
y = c(3062,587,284,103,33,4,2)
fit1 <- sum(y)*dpois(x, est_theta)

and then comparing fit1 to the real observations.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] regex - extracting src url

2016-03-22 Thread Martin Morgan



On 03/22/2016 12:44 AM, Omar André Gonzáles Díaz wrote:

Hi, I have a data frame with a column of HTML, like this:

<IMG SRC="https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?" BORDER="0" HEIGHT="1" WIDTH="1" ALT="Advertisement">


I need to get this:


https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?


I've got this so far:


https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?\" BORDER=\"0\" HEIGHT=\"1\" WIDTH=\"1\" ALT=\"Advertisement


This is the code I’ve used:

carreras_normal$Impression.Tag..image. <-
gsub("","\\1",carreras_normal$Impression.Tag..image.,
   ignore.case = T)



But I still need to get rid of the trailing part after the final "?", i.e.:

https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?\" BORDER=\"0\" HEIGHT=\"1\" WIDTH=\"1\" ALT=\"Advertisement


Thank you for your help.


You're querying an xml string, so use xpath, e.g., via the XML library

> as.character(xmlParse(y)[["//IMG/@SRC"]])
[1] "https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?"


`xmlParse()` translates the character string into  an XML document. `[[` 
subsets the document to extract a single element. "//IMG/@SRC" follows 
the xpath specification (this section 
https://www.w3.org/TR/xpath-31/#abbrev of the specification provides a 
quick guide) to find, starting from the 'root' of the document, a node, 
at any depth, labeled IMG containing an attribute labeled SRC.


A variation, if there were several IMG tags to be extracted, would be

  xpathSApply(xmlParse(y), "//IMG/@SRC", as.character)
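
A self-contained sketch is below; the IMG markup is a minimal, well-formed
stand-in (the original HTML was stripped in transit), and only the SRC value
matters here:

library(XML)
# minimal stand-in for the ad tag in the original post
y <- '<IMG SRC="https://ad.doubleclick.net/ddm/trackimp/N344006.1960500FACEBOOKAD/B9589414.130145906;dc_trk_aid=303019819;dc_trk_cid=69763238;ord=[timestamp];dc_lat=;dc_rdid=;tag_for_child_directed_treatment=?" BORDER="0" HEIGHT="1" WIDTH="1" ALT="Advertisement"/>'
doc <- xmlParse(y, asText = TRUE)
# single match: subset the parsed document with an XPath expression
as.character(doc[["//IMG/@SRC"]])
# several matches: apply over all nodes that satisfy the XPath expression
xpathSApply(doc, "//IMG/@SRC", as.character)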



Omar Gonzáles.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Dénes Tóth


Hi Martin,


On 03/22/2016 10:20 AM, Martin Maechler wrote:

>Dénes Tóth
> on Fri, 18 Mar 2016 22:56:23 +0100 writes:

 > Hi Roy,
 > R (usually) makes a copy if the dimensionality of an array is modified,
 > even if you use this syntax:

 > x <- array(1:24, c(2, 3, 4))
 > dim(x) <- c(6, 4)

 > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
 > to trace if an internal copy is done.

Well, without using strange (;-) packages,  indeed standard R's
tracemem(), notably the help page is a good pointer.

According to the help page memory tracing is enabled in the
default R binaries for Windows and OS X.
For Linux (where I, as R developer, compile R myself anyway),
one needs to configure with --enable-memory-profiling .

Now, let's try:

> x <- array(rnorm(47), dim = c(1000,50, 40))
> tracemem(x)
[1] "<0x7f79a498a010>"
> dim(x) <- c(1000* 50, 40)
> x[5] <- pi
> tracemem(x)
[1] "<0x7f79a498a010>"
>

So, *BOTH* the re-dimensioning *AND* the sub-assignment did
*NOT* make a copy.


This is interesting. First I wanted to demonstrate to Roy that recent R
versions are smart enough not to make any copy when reshaping an
array. Then I put together an example (similar to yours) and realized
that after several reshapes, R starts to copy the array. So I had to
modify my suggestion... And now I realize that this was an
RStudio issue. At least on Linux, a standard R terminal behaves as you
described; RStudio (version 0.99.862, which is not the very latest),
however, tends to create copies (seemingly at random, at least to me). If I
have time I will test this more thoroughly and file a report to RStudio
if it turns out to be a bug.


Denes



Indeed, R has become much smarter  in these things in recent
years ... not thanks to me, but very much thanks to
Luke Tierney (from R-core), and also thanks to contributions from "outside",
notably Tomas Kalibera.

And hence: *NO* such strange workarounds are needed in this specific case:

 > Workaround: use data.table::setattr or bit::setattr to modify the
 > dimensions in place (i.e., without making a copy). Risk: if you modify
 > an object by reference, all other objects which point to the same memory
 > address will be modified silently, too.

Martin Maechler, ETH Zurich  (and R-core)

 > HTH,
 > Denes

(generally, your contributions help indeed, Denes, thank you!)


 > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
 >> Hi All:
 >>
 >> I am working with a very large array.  if noLat is the number of 
latitudes, noLon the number of longitudes and noTime the number of  time periods, the 
array is of the form:
 >>
 >> myData[noLat, no Lon, noTime].
 >>
 >> It is read in this way because that is how it is stored in a (series) 
of netcdf files.  For the analysis I need to do, I need instead the array:
 >>
 >> myData[noLat*noLon, noTime].  Normally this would be easy:
 >>
 >> myData<- array(myData,dim=c(noLat*noLon,noTime))
 >>
 >> My question is how does this command work in R - does it make a copy of 
the existing array, with different indices for the dimensions, or does it just redo 
the indices and leave the given array as is?  The reason for this question is my 
array is 30GB in memory, and I don’t have enough space to have a copy of the array in 
memory.  If the latter I will have to figure out a work around to bring in only part 
of the data at a time and put it into the proper locations.
 >>
 >> Thanks,
 >>
 >> -Roy



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Martin Maechler
> Dénes Tóth 
> on Fri, 18 Mar 2016 22:56:23 +0100 writes:

> Hi Roy,
> R (usually) makes a copy if the dimensionality of an array is modified, 
> even if you use this syntax:

> x <- array(1:24, c(2, 3, 4))
> dim(x) <- c(6, 4)

> See also ?tracemem, ?data.table::address, ?pryr::address and other tools 
> to trace if an internal copy is done.

Well, without using strange (;-) packages,  indeed standard R's
tracemem(), notably the help page is a good pointer.

According to the help page memory tracing is enabled in the
default R binaries for Windows and OS X.
For Linux (where I, as R developer, compile R myself anyway),
one needs to configure with --enable-memory-profiling .

Now, let's try:

   > x <- array(rnorm(47), dim = c(1000,50, 40))
   > tracemem(x)
   [1] "<0x7f79a498a010>"
   > dim(x) <- c(1000* 50, 40)
   > x[5] <- pi
   > tracemem(x)
   [1] "<0x7f79a498a010>"
   > 

So, *BOTH*  the re-dimensioning  *AND*  the  sub-assignment did
*NOT* make a copy.

Indeed, R has become much smarter  in these things in recent
years ... not thanks to me, but very much thanks to
Luke Tierney (from R-core), and also thanks to contributions from "outside",
notably Tomas Kalibera.

And hence: *NO* such strange workarounds are needed in this specific case: 

> Workaround: use data.table::setattr or bit::setattr to modify the 
> dimensions in place (i.e., without making a copy). Risk: if you modify 
> an object by reference, all other objects which point to the same memory 
> address will be modified silently, too.
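
For the record, a minimal sketch of that risk (assuming the data.table package
is installed): x and y share the same memory, so setting the dim attribute by
reference on x silently changes y as well.

library(data.table)
x <- array(1:24, c(2, 3, 4))
y <- x                        # no copy made here: y points at the same memory as x
setattr(x, "dim", c(6L, 4L))  # modify the dimensions in place, without a copy
dim(y)                        # now 6 x 4 as well, although y was never touched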

Martin Maechler, ETH Zurich  (and R-core)

> HTH,
> Denes

(generally, your contributions help indeed, Denes, thank you!)


> On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
>> Hi All:
>> 
>> I am working with a very large array.  if noLat is the number of 
latitudes, noLon the number of longitudes and noTime the number of  time 
periods, the array is of the form:
>> 
>> myData[noLat, no Lon, noTime].
>> 
>> It is read in this way because that is how it is stored in a (series) of 
netcdf files.  For the analysis I need to do, I need instead the array:
>> 
>> myData[noLat*noLon, noTime].  Normally this would be easy:
>> 
>> myData<- array(myData,dim=c(noLat*noLon,noTime))
>> 
>> My question is how does this command work in R - does it make a copy of 
the existing array, with different indices for the dimensions, or does it just 
redo the indices and leave the given array as is?  The reason for this question 
is my array is 30GB in memory, and I don’t have enough space to have a copy of 
the array in memory.  If the latter I will have to figure out a work around to 
bring in only part of the data at a time and put it into the proper locations.
>> 
>> Thanks,
>> 
>> -Roy
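
For completeness, a small sketch with toy dimensions (noLat, noLon and noTime
below are hypothetical stand-ins) showing that the re-dimensioned array keeps
R's column-major layout, so no data are rearranged: element [i, j, k] of the
original ends up in row (j - 1) * noLat + i of the reshaped matrix.

noLat <- 3; noLon <- 4; noTime <- 5
myData <- array(rnorm(noLat * noLon * noTime), dim = c(noLat, noLon, noTime))
orig <- myData                           # keep a copy only for the comparison below
dim(myData) <- c(noLat * noLon, noTime)  # re-dimension; the data are not rearranged
i <- 2; j <- 3; k <- 4
identical(orig[i, j, k], myData[(j - 1) * noLat + i, k])  # TRUE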

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.