date:20160511

[R] To compare and filter text (mining data)

2016-05-11 Thread Marcelo Laia

Hi, I have an experiment like this:

Trat  Rep  Peak  CAS
1     1    1     123-92-2
1     1    2     109-21-7
1     1    3     2867-05-2
1     1    ...   ...
1     1    33    99-86-5
1     2    1     562-74-3
1     2    2     123-92-2
1     2    3     109-21-7
1     2    ...   ...
1     2    45    2867-05-2
...
14    3    18    2867-05-2

Trat = Treatment - range from 1 to 14
Rep = Biological Replicate - range from 1 to 3
Peak = Peak from GC/MS chromatogram - range from 1 to n (n>1)
CAS = oil CAS Number [1]

I would like to compare all 14 treatments (3 replicates each) and print only
the Trat, Rep and Peak that have an exclusive CAS, and the CAS number, of
course. In fact, I would like to know whether there are CAS numbers that are
exclusive to a specific treatment.

Is it possible to do it inside R?

Could you share some code, a paper or a tutorial to do that? Or point me to
an R package/library?

Thank you very much!

1. https://www.cas.org/content/chemical-substances/faqs

-- 
Marcelo

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
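
One possible approach to the question above, as a minimal base-R sketch: count
in how many treatments each CAS number occurs and keep the rows whose CAS
occurs in exactly one. The toy data frame `dat` and its values are made up for
illustration; only the column names come from the post.

```r
# Hypothetical toy data in the layout described in the post (values made up).
dat <- data.frame(
  Trat = c(1, 1, 1, 2, 2, 3),
  Rep  = c(1, 1, 2, 1, 2, 1),
  Peak = c(1, 2, 1, 1, 2, 1),
  CAS  = c("123-92-2", "109-21-7", "123-92-2",
           "123-92-2", "99-86-5", "562-74-3"),
  stringsAsFactors = FALSE
)

# For each CAS number, count the distinct treatments it occurs in.
n_trat <- tapply(dat$Trat, dat$CAS, function(tr) length(unique(tr)))

# Here a CAS counts as "exclusive" if it occurs in exactly one treatment.
exclusive_cas <- names(n_trat)[n_trat == 1]

# Keep only the rows (Trat, Rep, Peak, CAS) that carry an exclusive CAS.
exclusive <- dat[dat$CAS %in% exclusive_cas, ]
print(exclusive)
```

Whether "exclusive" should also require the CAS to appear in all three
replicates of that treatment is a design choice the post leaves open; the
sketch does not enforce it.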

[R-es] Applying a function repeatedly

2016-05-11 Thread Jorge I Velez

Hello everyone,

I would like to apply a function f(x) a total of k times, recursively.
In pseudocode it would be something like

If k = 1, compute f(x);
If k = 2, compute f(f(x));
If k = 3, compute f(f(f(x))).

In the end I would like to have a function g whose arguments are x and the
value of k. Thus,

g(x, k = 2)

would return f(f(x)).

Any help and/or suggestions would be more than welcome.

Many thanks,
Jorge Velez.-

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es
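
A minimal sketch of such a g, using a plain loop. Passing f as a third
argument is an assumption made here so the example is self-contained; the
post only names x and k. The function `f` below is a made-up example.

```r
# Apply f to x a total of k times, so that g(x, k = 2, f) is f(f(x)).
# With k = 0 the input is returned unchanged.
g <- function(x, k, f) {
  stopifnot(k >= 0)
  for (i in seq_len(k)) x <- f(x)
  x
}

# Hypothetical example function, for illustration only.
f <- function(x) x^2 + 1

res <- g(2, k = 2, f = f)   # f(f(2)) = f(5) = 26
print(res)
```

A loop avoids deep recursion for large k; `Reduce()` over a list of k copies
of f would be another way to express the same composition.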

Re: [R] R simulation help pls

2016-05-11 Thread Michael Friendly


On 4/06/16 11:54 AM, tan sj wrote:


> Hi, I am a student from Malaysia and I am new to R programming. I am now
> trying to conduct a robustness study of two-sample tests under several
> combinations of factors such as sample sizes, standard deviation ratio and
> distribution.
>
> But I am stuck on how to use a for loop or an apply function to conduct the
> simulation. Also, how can I run the test across all combinations of the
> factors?

Look for the SimDesign package.  Makes this easy to do.  No loops, no 
pain.  There are some good examples on the wiki for this.


https://github.com/philchalmers/SimDesign
https://github.com/philchalmers/SimDesign/wiki
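
For comparison, the same kind of study can be sketched in base R without
SimDesign: build the grid of factor combinations with expand.grid() and run
the test repeatedly for each row. The factor levels, the number of
replications and the helper names `draw` and `reject_rate` are all
assumptions for illustration, not taken from the thread.

```r
set.seed(1)

# Hypothetical design: every combination of the factors under study.
design <- expand.grid(n = c(10, 30),
                      sd_ratio = c(1, 2),
                      dist = c("normal", "exponential"),
                      stringsAsFactors = FALSE)

# Draw one sample of size n with mean 0 and standard deviation s.
draw <- function(n, s, dist) {
  if (dist == "normal") rnorm(n, sd = s) else (rexp(n) - 1) * s
}

# Estimated rejection rate of the pooled two-sample t-test for one design row.
reject_rate <- function(n, sd_ratio, dist, n_rep = 200, alpha = 0.05) {
  p <- replicate(n_rep,
                 t.test(draw(n, 1, dist), draw(n, sd_ratio, dist),
                        var.equal = TRUE)$p.value)
  mean(p < alpha)
}

# One call per row of the design; no explicit for loop needed.
design$reject <- mapply(reject_rate, design$n, design$sd_ratio, design$dist)
print(design)
```

Both groups have mean 0, so the `reject` column estimates the type I error
rate under each combination.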


[R] R_DirtyImage and Rprof

2016-05-11 Thread Benjamin Tyner


Hello,

I have some code which was running in interactive mode with profiling
enabled via Rprof(..., line.profiling = TRUE). Near the end of my script, it
opens a pipe(..., open = "w") to a perl script, and at that point execution
gets stuck using 100% CPU.


(The perl script itself never showed up in pstree, as far as I can tell).

I did a "tail -f" on the file being written to by Rprof, and it was 
reporting "sys.save.image" over and over, and in fact an ".RData" file 
appeared when I had not asked for one, and I was able to load it later.


This got me curious, as nowhere in my code do I directly use that 
function. Looking through the source code for R, it appears that 
"sys.save.image" is called whenever an R_DirtyImage condition is triggered.


This was using R version 3.2.2 under RHEL. My efforts to create a 
reproducible example of this behavior have thus far been unsuccessful.


My questions: is there any documentation for R_DirtyImage, and how 
plausible is it that the R_DirtyImage condition was triggered by 
something Rprof did? The reason for my conjecture is that 
sys.save.image() calls closeAllConnections(), which I imagine might have 
interfered with the pipe that was open for writing, thus causing the 
stuck execution at that point.


If so, any advice for avoiding the R_DirtyImage condition while profiling?

If not, any conjectures for what might actually be going on? For what 
it's worth, I have observed a similar situation when using Rprof + 
system() instead of pipe(); for example:


   https://stat.ethz.ch/pipermail/r-help/2015-August/431286.html

Regards
Ben


Re: [R] physical constraint with gam

2016-05-11 Thread Dominik Schneider

Hi again,
I'm looking for some clarification on 2 things.
1. On that last note, I realize that s(x1,x2) would be the other obvious
interaction to compare with - and I see that you recommend te(x1,x2) if
they are not on the same scale.
2. If s(x1,by=x1) gives you a "parameter" value similar to a GLM when you
plot s(x1):x1, why does my extract_splines function return the same yhat as
predict(mdl,type='response')? Shouldn't each of the terms be
multiplied by the variable value before applying
rowSums()+attr(sterms,'constant')?
Thanks again
Dominik
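
On point 2, a small check may help. As far as the mgcv documentation for
predict(type='terms') goes, each column is that term's full contribution to
the linear predictor, so for s(fsca,by=fsca) the multiplication by fsca has
already been applied and rowSums() plus the constant reproduces the fitted
values. The simulated data below are hypothetical; only the variable names
follow the thread.

```r
library(mgcv)

set.seed(1)
# Hypothetical simulated data; names follow the thread's example.
dat <- data.frame(fsca = runif(200, 0.01, 0.99))
dat$snowdepth <- 6 * dat$fsca^2 + rnorm(200, sd = 0.3)

# Ordinary smooth: subject to the sum-to-zero constraint, so its values
# average to (numerically) zero over the data and must change sign.
m1 <- gam(snowdepth ~ s(fsca), data = dat)
t1 <- predict(m1, type = "terms")
smooth_mean <- mean(t1[, "s(fsca)"])

# Varying-coefficient form: the term column is the whole contribution
# f(fsca) * fsca, so no further multiplication is needed.
m2 <- gam(snowdepth ~ s(fsca, by = fsca), data = dat)
t2 <- predict(m2, type = "terms")
yhat <- rowSums(t2) + attr(t2, "constant")
same <- all.equal(unname(yhat),
                  unname(as.numeric(predict(m2, type = "link"))))
print(c(smooth_mean = smooth_mean))
print(same)
```

To look at the coefficient function f(fsca) itself rather than its
contribution, the term column would have to be divided by fsca, not
multiplied.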

On Wed, May 11, 2016 at 10:11 AM, Dominik Schneider <
dominik.schnei...@colorado.edu> wrote:

> Hi Simon, Thanks for this explanation.
> To make sure I understand, another way of explaining the y axis in my
> original example is that it is the contribution to snowdepth relative to
> the other variables (the example only had fsca, but my actual case has a
> couple others). i.e. a negative s(fsca) of -0.5 simply means snowdepth 0.5
> units below the intercept+s(x_i), where s(x_i) could also be negative in
> the case where total snowdepth is less than the intercept value.
>
> The use of by=fsca is really useful for interpreting the marginal impact
> of the different variables. With my actual data, the term s(fsca):fsca is
> never negative, which is much more intuitive. Is it appropriate to compare
> magnitudes of e.g. s(x1):x1 / mean(x1) and s(x2):x2 / mean(x2), where
> mean(x_i) is the mean of the actual data?
>
> Lastly, how would these two differ: s(x1,by=x2); or
> s(x1,by=x1)*s(x2,by=x2) since interactions are surely present and i'm not
> sure if a linear combination is enough.
>
> Thanks!
> Dominik
>
>
> On Wed, May 11, 2016 at 3:11 AM, Simon Wood  wrote:
>
>> The spline having a positive value is not the same as a glm coefficient
>> having a positive value. When you plot a smooth, say s(x), that is
>> equivalent to plotting the line 'beta * x' in a GLM. It is not equivalent
>> to plotting 'beta'. The smooths in a gam are (usually) subject to
>> `sum-to-zero' identifiability constraints to avoid confounding via the
>> intercept, so they are bound to be negative over some part of the covariate
>> range. For example, if I have a model y ~ s(x) + s(z), I can't estimate the
>> mean level for s(x) and the mean level for s(z) as they are completely
>> confounded, and confounded with the model intercept term.
>>
>> I suppose that if you want to interpret the smooths as glm parameters
>> varying with the covariate they relate to then you can do, by setting the
>> model up as a varying coefficient model, using the `by' argument to 's'...
>>
>> gam(snowdepth~s(fsca,by=fsca),data=dat)
>>
>>
>> this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' . s(fsca,by=fsca)
>> is not confounded with the intercept, so no constraint is needed or
>> applied, and you can now interpret the smooth like a local GLM coefficient.
>>
>> best,
>> Simon
>>
>>
>>
>>
>> On 11/05/16 01:30, Dominik Schneider wrote:
>>
>>> Hi,
>>> Just getting into using GAM using the mgcv package. I've generated some
>>> models and extracted the splines for each of the variables and started
>>> visualizing them. I'm noticing that one of my variables is physically
>>> unrealistic.
>>>
>>> In the example below, my interpretation of the following plot is that the
>>> y-axis is basically the equivalent of a "parameter" value of a GLM; in GAM
>>> this value can change as the functional relationship changes between x and
>>> y. In my case, I am predicting snowdepth based on the fractional snow
>>> covered area. In no case will snowdepth realistically decrease for a unit
>>> increase in fsca so my question is: *Is there a way to constrain the spline
>>> to positive values?*
>>>
>>> Thanks
>>> Dominik
>>>
>>> library(mgcv)
>>> library(dplyr)
>>> library(ggplot2)
>>> extract_splines=function(mdl){
>>>   sterms=predict(mdl,type='terms')
>>>   datplot=cbind(sterms,mdl$model) %>% tbl_df
>>>   datplot$intercept=attr(sterms,'constant')
>>>   datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
>>>   return(datplot)
>>> }
>>> dat=data_frame(snowdepth=runif(100,min=0.001,max=6.7),fsca=runif(100,0.01,.99))
>>> mdl=gam(snowdepth~s(fsca),data=dat)
>>> termdF=extract_splines(mdl)
>>> ggplot(termdF)+
>>>   geom_line(aes(x=fsca,y=`s(fsca)`))
>>>
>>>
>>
>>
>> --
>> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
>> +44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190
>>
>>
>


Re: [R] break string at specified possitions

2016-05-11 Thread Daniel Nordlund


On 5/11/2016 2:23 PM, Jan Kacaba wrote:

Here is my attempt at a function which computes margins from positions.

require("stringr")
require("dplyr")

ends<-seq(10,100,8)  # end margins
test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Aliquam in lorem sit amet leo accumsan lacinia."

sekoj=function(ends){
  l_ends<-length(ends)
  begs=vector(mode="integer",l_ends)
  begs[1]=1
  for (i in 2:(l_ends)){
    begs[i]<-ends[i-1]+1
  }
  margs<-rbind(begs,ends)
  margs<-cbind(margs,c(ends[l_ends]+1,-1))
  #rownames(margs)<-c("beg","end")
  return(margs)
}
margins<-sekoj(ends)
str_sub(test_string,margins[1,],margins[2,]) %>% print

Code to run in browser:
http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV


I think you can simplify this. Just create a function (I'll call it begs)
to compute the beginning positions.


begs <- function(x) c(0,x[-length(x)])+1

Then use that function in your call to str_sub

str_sub(test_string,begs(ends),ends) %>% print


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
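
As a quick check of the begs helper against the positions from the original
question, here is a small sketch. It uses base substring() rather than
str_sub so that it runs without extra packages, and the alphabet string `s`
is a made-up stand-in for the real text.

```r
# Start positions: 1, then one past each end except the last.
begs <- function(x) c(0, x[-length(x)]) + 1

pos <- c(5, 10, 19)           # end positions from the original question
starts <- begs(pos)           # 1 6 11

s <- "abcdefghijklmnopqrstuvwxyz"   # hypothetical test string
pieces <- substring(s, starts, pos)
print(pieces)                 # "abcde" "fghij" "klmnopqrs"
```

Note that, unlike the margins matrix in the longer version, this drops
whatever follows the last end position.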


Re: [R] break string at specified possitions

2016-05-11 Thread Jim Lemon

Hi again,
Sorry, that should be:

chop_string<-function(x,ends) {
 starts<-c(1,ends[-length(ends)]+1)
 return(substring(x,starts,ends))
}

Jim
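
A short usage sketch of the corrected chop_string. Appending nchar(x) to the
end positions, so that the remainder of the string is kept as a final piece,
is an addition made here for illustration and is not part of the thread; the
alphabet string is likewise made up.

```r
chop_string <- function(x, ends) {
  # Each piece starts one character after the previous piece ends.
  starts <- c(1, ends[-length(ends)] + 1)
  substring(x, starts, ends)
}

x <- "abcdefghijklmnopqrstuvwxyz"       # hypothetical test string
ends <- c(5, 10, 19)

chopped <- chop_string(x, ends)
# Keep the tail as well by adding the string length as a last end position.
chopped_all <- chop_string(x, c(ends, nchar(x)))
print(chopped_all)   # "abcde" "fghij" "klmnopqrs" "tuvwxyz"
```

With `-1` in place of `+1` the start positions would be 1, 4, 9, so
consecutive pieces would overlap by two characters.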

On Thu, May 12, 2016 at 10:05 AM, Jim Lemon  wrote:
> Hi Jan,
> This might be helpful:
>
> chop_string<-function(x,ends) {
>  starts<-c(1,ends[-length(ends)]-1)
>  return(substring(x,starts,ends))
> }
>
> Jim

Re: [R] break string at specified possitions

2016-05-11 Thread Jim Lemon

Hi Jan,
This might be helpful:

chop_string<-function(x,ends) {
 starts<-c(1,ends[-length(ends)]-1)
 return(substring(x,starts,ends))
}

Jim


On Thu, May 12, 2016 at 7:23 AM, Jan Kacaba  wrote:
> Here is my attempt at a function which computes margins from positions.
>
> require("stringr")
> require("dplyr")
>
> ends<-seq(10,100,8)  # end margins
> test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing
> elit. Aliquam in lorem sit amet leo accumsan lacinia."
>
> sekoj=function(ends){
>   l_ends<-length(ends)
>   begs=vector(mode="integer",l_ends)
>   begs[1]=1
>   for (i in 2:(l_ends)){
>     begs[i]<-ends[i-1]+1
>   }
>   margs<-rbind(begs,ends)
>   margs<-cbind(margs,c(ends[l_ends]+1,-1))
>   #rownames(margs)<-c("beg","end")
>   return(margs)
> }
> margins<-sekoj(ends)
> str_sub(test_string,margins[1,],margins[2,]) %>% print
>
> Code to run in browser:
> http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV

Re: [R] break string at specified possitions

2016-05-11 Thread Jan Kacaba

Here is my attempt at a function which computes margins from positions.

require("stringr")
require("dplyr")

ends<-seq(10,100,8)  # end margins
test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing
elit. Aliquam in lorem sit amet leo accumsan lacinia."

sekoj=function(ends){
  l_ends<-length(ends)
  begs=vector(mode="integer",l_ends)
  begs[1]=1
  for (i in 2:(l_ends)){
    begs[i]<-ends[i-1]+1
  }
  margs<-rbind(begs,ends)
  margs<-cbind(margs,c(ends[l_ends]+1,-1))
  #rownames(margs)<-c("beg","end")
  return(margs)
}
margins<-sekoj(ends)
str_sub(test_string,margins[1,],margins[2,]) %>% print

Code to run in browser:
http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV

2016-05-11 23:12 GMT+02:00 Bert Gunter :
> Dunno -- but you might have a look at Hadley Wickham's 'stringr' package:
> https://cran.r-project.org/web/packages/stringr/stringr.pdf
>
> Cheers,
>
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

Re: [R] break string at specified possitions

2016-05-11 Thread Bert Gunter

Dunno -- but you might have a look at Hadley Wickham's 'stringr' package:
https://cran.r-project.org/web/packages/stringr/stringr.pdf

Cheers,

Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba  wrote:
> Dear R-help
>
> I would like to split a long string at specified precomputed positions.
> 'substring' needs beginnings and ends. Is there a native function which
> accepts just the positions, so I don't have to compute the other argument
> myself?
>
> For example, I have a vector of positions pos<-c(5,10,19). substring
> needs first=c(1,6,11) and last=c(5,10,19). It is no problem
> to write my own function; I'm just asking.
>
> Derek

[R] break string at specified possitions

2016-05-11 Thread Jan Kacaba

Dear R-help

I would like to split a long string at specified precomputed positions.
'substring' needs beginnings and ends. Is there a native function which
accepts just the positions, so I don't have to compute the other argument
myself?

For example, I have a vector of positions pos<-c(5,10,19). substring
needs first=c(1,6,11) and last=c(5,10,19). It is no problem
to write my own function; I'm just asking.

Derek


Re: [R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-11 Thread Henrik Bengtsson

Sounds like it would be helpful to find out exactly which process is
holding on to the file in order to figure out what's going on. From a
quick look, it seems that

  
http://superuser.com/questions/117902/find-out-which-process-is-locking-a-file-or-folder-in-windows

gives some useful info on how to track down the process that locks the file.

/Henrik
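
For reference, the full sink() sequence discussed in this thread, with the
order in which the diversions are undone spelled out. This is only a sketch:
it writes to a tempfile() path (an assumption made here so it runs anywhere)
rather than C:/Temp/all.Rout, and it does not by itself explain the file lock
reported on Windows.

```r
# Hypothetical log location; the thread used "C:/Temp/all.Rout".
log_file <- tempfile(fileext = ".Rout")

zz <- file(log_file, open = "wt")
sink(zz)                     # divert regular output
sink(zz, type = "message")   # divert messages, warnings and errors
try(log("a"))                # the error text goes to the file
cat("done\n")
sink(type = "message")       # restore the message stream first
sink()                       # then restore regular output
close(zz)                    # release R's handle on the file

captured <- readLines(log_file)
print(captured)
unlink(log_file)             # unlink() takes the path, not the connection
```

Calling a bare sink() after sink(zz, type = "message") leaves the message
diversion in place, so R keeps the connection open; that matches the second
program quoted below, which also passes the connection object to unlink().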

On Wed, May 11, 2016 at 9:47 AM,   wrote:
> Duncan,
>
> thanks for the hint.
>
> I have done it correctly in R fashion
>
> ## capture all the output to a file.
> zz <- file("C:/Temp/all.Rout", open = "wt")
> sink(zz)
> sink(zz, type = "message")
> try(log("a"))
> ## back to the console
> sink(type = "message")
> sink()
> unlink("C:/Temp/all.Rout")
>
> But the error persists.
>
> Kind regards
>
> Georg
>
>
>
>
> From:    Duncan Murdoch
> To:      John Sorkin , drjimle...@gmail.com,
> g.maub...@weinwolf.de,
> Cc:      r-help@r-project.org
> Date:    10.05.2016 19:03
> Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file
>
>
>
> On 10/05/2016 11:15 AM, John Sorkin wrote:
>> George,
>> I do not know what operating system you are working with, but when I use
>> sink() under windows, I need to specify a valid path which I don't see in
>> your code. I might, for example specify:
>>
>> sink("c:\myfile.txt")
>
> Note that the backslash should be doubled (so it isn't interpreted as an
> escape for the "m" that follows it), or replaced with a forward slash.
>
> Duncan Murdoch
>
>>   R code goes here
>> sink()
>>
>> with the expectation that I would create a file myfile.txt that would
>> contain the output of my R program.
>>
>> John
>>
>>
>> John David Sorkin M.D., Ph.D.
>> Professor of Medicine
>> Chief, Biostatistics and Informatics
>> University of Maryland School of Medicine Division of Gerontology and
> Geriatric Medicine
>> Baltimore VA Medical Center
>> 10 North Greene Street
>> GRECC (BT/18/GR)
>> Baltimore, MD 21201-1524
>> (Phone) 410-605-7119
>> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
>> >>>  05/10/16 11:10 AM >>>
>> Hi Jim,
>>
>> I tried:
>>
>> sink("all.Rout")
>> try(log("a"))
>> sink()
>>
>> The program executes without warning or error. The file "all.Rout" is
>> created, but nothing is written to it. The file can be opened with
>> notepad.exe right after the program has run.
>>
>> The program
>>
>> zz <- file("all.Rout", open = "wt")
>> sink(zz, type = "message")
>> try(log("a"))
>> sink()
>> close(zz)
>> unlink(zz)
>>
>> creates the file, does not write anything to it, and the file is not
>> accessible with notepad.exe after the program has run in R.
>>
>> Any ideas what happens behind the scenes?
>>
>> Kind regards
>>
>> Georg
>>
>>
>>
>>
>> From: Jim Lemon
>> To: g.maub...@weinwolf.de,
>> Cc: r-help mailing list
>> Date: 10.05.2016 13:16
>> Subject: Re: Re: [R] sink(): Cannot open file
>>
>>
>>
>> Have you tried:
>>
>> sink("all.Rout")
>> try(log("a"))
>> sink()
>>
>> Jim
>>
>> On Tue, May 10, 2016 at 9:05 PM,  wrote:
>> > Hi Jim,
>> >
>> > thanks for your reply.
>> >
>> > ad 1)
>> > "all.Rout" was created in the correct directory. It exists properly with
>> > correct file properties on Windows, e.g. creation date and time and file
>> > size information.
>> >
>> > ad 2)
>> > I cannot access the file with Notepad.exe directly after it was created
>> > by R. The error message is (translated):
>> >
>> > "Cannot access file "all.Rout". The file is opened by another process."
>> >
>> > ad 3)
>> > If I close R completely the file access is released. Then I can read the
>> > file using Notepad.exe. The contents are:
>> >
>> > Error in log("a") : non-numeric argument to mathematical function
>> >
>> > I tried
>> >
>> > close(zz)
>> >
>> > but the error persists.
>> >
>> > To me it looks like R is still accessing the file and not releasing the
>> > connection for other programs. close(zz) should have solved the problem
>> > but unfortunately it doesn't.
>> >
>> > What else could I try?
>> >
>> > Kind regards
>> >
>> > Georg
>> >
>> >
>> >
>> >
>> > From: Jim Lemon
>> > To: g.maub...@weinwolf.de,
>> > Cc: r-help mailing list
>> > Date: 10.05.2016 12:50
>> > Subject: Re: [R] sink(): Cannot open file
>> >
>> >
>> >
>> > Hi Georg,
>> > I don't suppose that you have:
>> >
>> > 1) checked that the file "all.Rout" exists somewhere?
>> >
>> > 2) if so, looked at the file with Notepad, perhaps?
>> >
>> > 3) let us in on the secret by pasting the contents of "all.Rout" into
>> > your message if it is not too big?
>> >
>> > At a guess, trying:
>> >
>> > close(zz)
>> >
>> > might get you there.
>> >
>> > Jim
>> >
>> > On Tue, May 10, 2016 at 5:25 PM,  wrote:
>> >> Hi All,
>> >>
>> >> I would like to route the output to a file using sink(). When using
> the
>> >> example from the

Re: [R] web scraping tables generated in multiple server pages

2016-05-11 Thread boB Rudis

I upgraded ffox to the 46-series and intermittently received the same
error. But by adding a `Sys.sleep(1)` to the final `if`:

  if ((i %% 10) == 0) {
    ref <- remDr$findElements("xpath", ".//a[.='...']")
    ref[[length(ref)]]$clickElement()
    Sys.sleep(1)
  }

I was able to reproduce my original, successful outcome. I think it
has something to do with the page not being fully loaded when the
driver tries to get the page content. Go multithreading! My choice of
1s was arbitrary. Longer == better chance of it working more often.

This 

would probably also be better (waiting for a full page load signal),
but I try to not use [R]Selenium at all if it can be helped.

-Bob
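
As an alternative to a fixed Sys.sleep(1), a generic polling helper can wait
until some condition holds, up to a timeout. The sketch below is plain R and
does not touch RSelenium; the helper name `wait_until` and the stand-in
condition `fake_ready` are hypothetical, and in the scraping loop the
condition would have to be replaced by a real check against the remote driver
(not shown here).

```r
# Poll `ready()` until it returns TRUE or `timeout` seconds have passed.
# Returns TRUE on success and FALSE if the timeout was reached.
wait_until <- function(ready, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(ready())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in condition for illustration: becomes TRUE on the third check.
calls <- 0
fake_ready <- function() {
  calls <<- calls + 1
  calls >= 3
}

ok <- wait_until(fake_ready, timeout = 5, interval = 0.1)
print(ok)
```

This only waits as long as needed, and it fails visibly (FALSE) when the
page never becomes ready instead of silently scraping a half-loaded page.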



On Wed, May 11, 2016 at 2:00 PM, boB Rudis  wrote:
> Hey David,
>
> I'm on a Mac as well but have never had to tweak anything to get
> [R]Selenium to work (but this is one reason I try to avoid solutions
> involving RSelenium as they are pretty fragile IMO).
>
> The site itself has "Página 1 de 69" at the top which is where i got
> the "69" from and I just re-ran the code in a 100% clean env (on a
> completely different Mac) and it worked fine.
>
> I did neglect to put my session info up before (apologies):
>
> Session info
>
>  setting  value
>  version  R version 3.3.0 RC (2016-05-01 r70572)
>  system   x86_64, darwin13.4.0
>  ui       RStudio (0.99.1172)
>  language (EN)
>  collate  en_US.UTF-8
>  tz       America/New_York
>  date     2016-05-11
>
> Packages
>
>  package    * version  date       source
>  assertthat   0.1      2013-12-06 CRAN (R 3.3.0)
>  bitops     * 1.0-6    2013-08-17 CRAN (R 3.3.0)
>  caTools      1.17.1   2014-09-10 CRAN (R 3.3.0)
>  DBI          0.4      2016-05-02 CRAN (R 3.3.0)
>  devtools   * 1.11.1   2016-04-21 CRAN (R 3.3.0)
>  digest       0.6.9    2016-01-08 CRAN (R 3.3.0)
>  dplyr      * 0.4.3    2015-09-01 CRAN (R 3.3.0)
>  httr         1.1.0    2016-01-28 CRAN (R 3.3.0)
>  magrittr     1.5      2014-11-22 CRAN (R 3.3.0)
>  memoise      1.0.0    2016-01-29 CRAN (R 3.3.0)
>  pbapply    * 1.2-1    2016-04-19 CRAN (R 3.3.0)
>  R6           2.1.2    2016-01-26 CRAN (R 3.3.0)
>  Rcpp         0.12.4   2016-03-26 CRAN (R 3.3.0)
>  RCurl      * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>  RJSONIO    * 1.3-0    2014-07-28 CRAN (R 3.3.0)
>  RSelenium  * 1.3.5    2014-10-26 CRAN (R 3.3.0)
>  rvest      * 0.3.1    2015-11-11 CRAN (R 3.3.0)
>  selectr      0.2-3    2014-12-24 CRAN (R 3.3.0)
>  stringi      1.0-1    2015-10-22 CRAN (R 3.3.0)
>  stringr      1.0.0    2015-04-30 CRAN (R 3.3.0)
>  withr        1.0.1    2016-02-04 CRAN (R 3.3.0)
>  XML        * 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>  xml2       * 0.1.2    2015-09-01 CRAN (R 3.3.0)
>
> (and, wow, does that tiny snippet of code end up using a lot of pkgs)
>
> I had actually started with smaller snippets to test. The code got
> uglier due to the way the site paginates (it loads 10-entries worth of
> data on to a single page but requires a server call for the next 10).
>
> I also keep firefox scarily out-of-date (back in the 33's rev) b/c I
> only use it with RSelenium (not a big fan of the browser). Let me
> update to the 46-series and see if I can replicate.
>
> -Bob
>
> On Wed, May 11, 2016 at 1:48 PM, David Winsemius  
> wrote:
>>
>>> On May 10, 2016, at 1:11 PM, boB Rudis  wrote:
>>>
>>> Unfortunately, it's a wretched, vile, SharePoint-based site. That
>>> means it doesn't use traditional encoding methods to do the pagination
>>> and one of the only ways to do this effectively is going to be to use
>>> RSelenium:
>>>
>>>    library(RSelenium)
>>>    library(rvest)
>>>    library(dplyr)
>>>    library(pbapply)
>>>
>>>    URL <- "http://outorgaonerosa.prefeitura.sp.gov.br/relatorios/RelSituacaoGeralProcessos.aspx"
>>>
>>>    checkForServer()
>>>    startServer()
>>>    remDr <- remoteDriver$new()
>>>    remDr$open()
>>
>> Thanks Bob/hrbrmstr;
>>
>> At this point I got an error:
>>
>>>    startServer()
>>>    remDr <- remoteDriver$new()
>>>    remDr$open()
>> [1] "Connecting to remote server"
>> Undefined error in RCurl call.Error in queryRD(paste0(serverURL, 
>> "/session"), "POST", qdata = toJSON(serverOpts)) :
>>
>> Running R 3.0.0 on a Mac (El Cap) in the R.app GUI.
>> $ java -version
>> java version "1.8.0_65"
>> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
>> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
>>
>> I asked myself: What additional information is needed to debug this? But 
>> then I thought I had a responsibility to search for earlier reports of this 
>>

Re: [R] web scraping tables generated in multiple server pages

2016-05-11 Thread boB Rudis

Hey David,

I'm on a Mac as well but have never had to tweak anything to get
[R]Selenium to work (but this is one reason I try to avoid solutions
involving RSelenium as they are pretty fragile IMO).

The site itself has "Página 1 de 69" at the top which is where i got
the "69" from and I just re-ran the code in a 100% clean env (on a
completely different Mac) and it worked fine.

I did neglect to put my session info up before (apologies):

Session info

 setting  value
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0
 ui       RStudio (0.99.1172)
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2016-05-11

Packages

 package    * version  date       source
 assertthat   0.1      2013-12-06 CRAN (R 3.3.0)
 bitops     * 1.0-6    2013-08-17 CRAN (R 3.3.0)
 caTools      1.17.1   2014-09-10 CRAN (R 3.3.0)
 DBI          0.4      2016-05-02 CRAN (R 3.3.0)
 devtools   * 1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest       0.6.9    2016-01-08 CRAN (R 3.3.0)
 dplyr      * 0.4.3    2015-09-01 CRAN (R 3.3.0)
 httr         1.1.0    2016-01-28 CRAN (R 3.3.0)
 magrittr     1.5      2014-11-22 CRAN (R 3.3.0)
 memoise      1.0.0    2016-01-29 CRAN (R 3.3.0)
 pbapply    * 1.2-1    2016-04-19 CRAN (R 3.3.0)
 R6           2.1.2    2016-01-26 CRAN (R 3.3.0)
 Rcpp         0.12.4   2016-03-26 CRAN (R 3.3.0)
 RCurl      * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 RJSONIO    * 1.3-0    2014-07-28 CRAN (R 3.3.0)
 RSelenium  * 1.3.5    2014-10-26 CRAN (R 3.3.0)
 rvest      * 0.3.1    2015-11-11 CRAN (R 3.3.0)
 selectr      0.2-3    2014-12-24 CRAN (R 3.3.0)
 stringi      1.0-1    2015-10-22 CRAN (R 3.3.0)
 stringr      1.0.0    2015-04-30 CRAN (R 3.3.0)
 withr        1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML        * 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 xml2       * 0.1.2    2015-09-01 CRAN (R 3.3.0)

(and, wow, does that tiny snippet of code end up using a lot of pkgs)

I had actually started with smaller snippets to test. The code got
uglier due to the way the site paginates (it loads 10-entries worth of
data on to a single page but requires a server call for the next 10).
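
To make the pagination pattern concrete, here is a small sketch that tabulates, for each page index, the two conditions the loop in this thread tests. `pagination_plan()` and its column names are made up for illustration; the block size of 10 is taken from the code posted earlier.

```r
# Hypothetical helper: for pages 1..n_pages, flag which pages are the first
# of a block of links (already displayed, nothing to click) and after which
# pages the "..." link has to be clicked to load the next block.
pagination_plan <- function(n_pages, block = 10L) {
  i <- seq_len(n_pages)
  data.frame(
    page = i,
    already_shown = i %in% seq(1, n_pages, by = block),
    advance_block_after = (i %% block) == 0
  )
}

plan <- pagination_plan(69)
```

With 69 pages this flags pages 1, 11, ..., 61 as already shown and pages 10, 20, ..., 60 as the points where the next block is requested.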

I also keep firefox scarily out-of-date (back in the 33's rev) b/c I
only use it with RSelenium (not a big fan of the browser). Let me
update to the 46-series and see if I can replicate.

-Bob

On Wed, May 11, 2016 at 1:48 PM, David Winsemius  wrote:
>
>> On May 10, 2016, at 1:11 PM, boB Rudis  wrote:
>>
>> Unfortunately, it's a wretched, vile, SharePoint-based site. That
>> means it doesn't use traditional encoding methods to do the pagination
>> and one of the only ways to do this effectively is going to be to use
>> RSelenium:
>>
>>library(RSelenium)
>>library(rvest)
>>library(dplyr)
>>library(pbapply)
>>
>>URL <- "http://outorgaonerosa.prefeitura.sp.gov.br/relatorios/RelSituacaoGeralProcessos.aspx"
>>
>>checkForServer()
>>startServer()
>>remDr <- remoteDriver$new()
>>remDr$open()
>
> Thanks Bob/hrbrmstr;
>
> At this point I got an error:
>
>>startServer()
>>remDr <- remoteDriver$new()
>>remDr$open()
> [1] "Connecting to remote server"
> Undefined error in RCurl call.Error in queryRD(paste0(serverURL, "/session"), 
> "POST", qdata = toJSON(serverOpts)) :
>
> Running R 3.0.0 on a Mac (El Cap) in the R.app GUI.
> $ java -version
> java version "1.8.0_65"
> Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
>
> I asked myself: What additional information is needed to debug this? But then 
> I thought I had a responsibility to search for earlier reports of this error 
> on a Mac, and there were many. After reading this thread: 
> https://github.com/ropensci/RSelenium/issues/54  I decided to try creating an 
> "alias", mac-speak for a symlink, and put that symlink in my working 
> directory (with no further chmod security efforts). I restarted R and re-ran 
> the code which opened a Firefox browser window and then proceeded to page 
> through many pages. Eventually, however it errors out with this message:
>
>>pblapply(1:69, function(i) {
> +
> +  if (i %in% seq(1, 69, 10)) {
> +pg <- read_html(remDr$getPageSource()[[1]])
> +ret <- html_table(html_nodes(pg, "table")[[3]], header=TRUE)
> +
> +  } else {
> +ref <- remDr$findElements("xpath",
> + sprintf(".//a[contains(@href, 'javascript:__doPostBack') and .='%s']",
> + i))
> +ref[[1]]$clickElement()
> +pg <- read_html(remDr$getPageSource()[[1]])
> +ret <- html_table(html_nodes(pg, "table")[[3]], header=TRUE)
> +
> +  }
> +  if ((i %% 10) == 0) {

Re: [R] web scraping tables generated in multiple server pages

2016-05-11 Thread David Winsemius


> On May 10, 2016, at 1:11 PM, boB Rudis  wrote:
> 
> Unfortunately, it's a wretched, vile, SharePoint-based site. That
> means it doesn't use traditional encoding methods to do the pagination
> and one of the only ways to do this effectively is going to be to use
> RSelenium:
> 
>library(RSelenium)
>library(rvest)
>library(dplyr)
>library(pbapply)
> 
>URL <- "http://outorgaonerosa.prefeitura.sp.gov.br/relatorios/RelSituacaoGeralProcessos.aspx"
> 
>checkForServer()
>startServer()
>remDr <- remoteDriver$new()
>remDr$open()

Thanks Bob/hrbrmstr;

At this point I got an error:

>startServer()
>remDr <- remoteDriver$new()
>remDr$open()
[1] "Connecting to remote server"
Undefined error in RCurl call.Error in queryRD(paste0(serverURL, "/session"), 
"POST", qdata = toJSON(serverOpts)) : 

Running R 3.0.0 on a Mac (El Cap) in the R.app GUI. 
$ java -version
java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

I asked myself: What additional information is needed to debug this? But then I 
thought I had a responsibility to search for earlier reports of this error on a 
Mac, and there were many. After reading this thread: 
https://github.com/ropensci/RSelenium/issues/54  I decided to try creating an 
"alias", mac-speak for a symlink, and put that symlink in my working directory 
(with no further chmod security efforts). I restarted R and re-ran the code 
which opened a Firefox browser window and then proceeded to page through many 
pages. Eventually, however, it errors out with this message:

>pblapply(1:69, function(i) {
+ 
+  if (i %in% seq(1, 69, 10)) {
+pg <- read_html(remDr$getPageSource()[[1]])
+ret <- html_table(html_nodes(pg, "table")[[3]], header=TRUE)
+ 
+  } else {
+ref <- remDr$findElements("xpath",
+ sprintf(".//a[contains(@href, 'javascript:__doPostBack') and .='%s']",
+ i))
+ref[[1]]$clickElement()
+pg <- read_html(remDr$getPageSource()[[1]])
+ret <- html_table(html_nodes(pg, "table")[[3]], header=TRUE)
+ 
+  }
+  if ((i %% 10) == 0) {
+ref <- remDr$findElements("xpath", ".//a[.='...']")
+ref[[length(ref)]]$clickElement()
+  }
+ 
+  ret
+ 
+}) -> tabs
   |+++   | 22% ~54s
Error in html_nodes(pg, "table")[[3]] : subscript out of bounds
> 
>final_dat <- bind_rows(tabs)
Error in bind_rows(tabs) : object 'tabs' not found


There doesn't seem to be any trace of objects from all the downloading efforts 
that I could find. When I changed both instances of '69' to '30' it no longer 
errors out. Is there supposed to be an initial step of finding out how many 
pages are actually there before setting the two iteration limits? I'm wondering 
if that code could be modified to return some intermediate values that would be 
amenable to further assembly efforts in the event of errors?
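
One way to keep intermediate values is to trap errors per page, so that the pages already read survive a later failure. The sketch below uses only base R; `collect_safely()` and `fake_page()` are hypothetical names, and `fake_page()` merely stands in for the real per-page scraping function.

```r
# Hypothetical helper (not part of pbapply or RSelenium): apply `fun` to each
# element of `idx`, keeping whatever succeeded if some iterations fail.
collect_safely <- function(idx, fun) {
  results <- vector("list", length(idx))
  errors <- character(length(idx))
  for (k in seq_along(idx)) {
    out <- tryCatch(fun(idx[[k]]), error = function(e) e)
    if (inherits(out, "error")) {
      errors[k] <- conditionMessage(out)  # remember why this page failed
    } else {
      results[[k]] <- out                 # keep the successful result
    }
  }
  list(results = results, errors = errors)
}

# Stand-in for the per-page scraper: fails on page 3, as a flaky page might.
fake_page <- function(i) {
  if (i == 3) stop("subscript out of bounds")
  data.frame(page = i)
}

res <- collect_safely(1:5, fake_page)
ok <- res$results[res$errors == ""]  # results that can still be combined
```

In the real loop a failed click may also leave the browser on an unexpected page, so the error messages would need checking before trusting the later results.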

Sincerely;
David.


>    remDr$navigate(URL)
>
>    pblapply(1:69, function(i) {
>
>      if (i %in% seq(1, 69, 10)) {
>
>        # the first item on the page is not a link but we can just grab the page
>
>        pg <- read_html(remDr$getPageSource()[[1]])
>        ret <- html_table(html_nodes(pg, "table")[[3]], header=TRUE)
>
>      } else {
>
>        # we can get the rest of them by the link text directly
>
>        ref <- remDr$findElements("xpath",
>          sprintf(".//a[contains(@href, 'javascript:__doPostBack') and .='%s']",
>                  i))
>        ref[[1]]$clickElement()
>        pg <- read_html(remDr$getPageSource()[[1]])
>        ret <- html_table(html_nodes(pg, "table")[[3]], header=TRUE)
>
>      }
>
>      # we have to move to the next actual page of data after every 10 links
>
>      if ((i %% 10) == 0) {
>        ref <- remDr$findElements("xpath", ".//a[.='...']")
>        ref[[length(ref)]]$clickElement()
>      }
>
>      ret
>
>    }) -> tabs
>
>    final_dat <- bind_rows(tabs)
>    final_dat <- final_dat[, c(1, 2, 5, 7, 8, 13, 14)] # the cols you want
>    final_dat <- final_dat[complete.cases(final_dat),] # take care of NAs
>
>    remDr$quit()
> 
> 
> Probably good ref code to have around, but you can grab the data & code
> here: https://gist.github.com/hrbrmstr/ec35ebb32c3cf0aba95f7bad28df1e98
> 
> (anything to help a fellow parent out :-)
> 
> -Bob
> 
> On Tue, May 10, 2016 at 2:45 PM, Michael Friendly  wrote:
>> This is my first attempt to try R web scraping tools, for a project my
>> daughter is working on.  It concerns a data base of projects in Sao
>> Paulo, Brazil, listed at
>> http://outorgaonerosa.prefeitura.sp.gov.br/relatorios/RelSituacaoGeralProcessos.aspx,
>> but spread out over 69 pages accessed through a javascript menu at the
>> bottom of the page.
>> 
>> Each web page contains 3 HTML tables, of which only the last contains
>> the relevant data.  In this, only a subset of

Re: [R] physical constraint with gam

2016-05-11 Thread Dominik Schneider

Hi Simon, Thanks for this explanation.
To make sure I understand, another way of explaining the y axis in my
original example is that it is the contribution to snowdepth relative to
the other variables (the example only had fsca, but my actual case has a
couple others). i.e. a negative s(fsca) of -0.5 simply means snowdepth 0.5
units below the intercept+s(x_i), where s(x_i) could also be negative in
the case where total snowdepth is less than the intercept value.
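
A small sketch of that reading, on simulated data (the numbers are invented, only the variable names follow the thread). It checks that the fitted value is the intercept plus the term contributions, and that the centred smooth averages to zero over the data, which is why it must dip below zero somewhere.

```r
library(mgcv)

set.seed(1)  # simulated data, for illustration only
dat <- data.frame(fsca = runif(200, 0.01, 0.99))
dat$snowdepth <- 1 + 4 * dat$fsca^2 + rnorm(200, sd = 0.3)

mdl <- gam(snowdepth ~ s(fsca), data = dat)
terms_mat <- predict(mdl, type = "terms")
intercept <- attr(terms_mat, "constant")

# fitted value = intercept + sum of the smooth contributions
yhat <- as.numeric(intercept + rowSums(terms_mat))
max_diff <- max(abs(yhat - as.numeric(fitted(mdl))))

# the centred smooth averages to (numerically) zero over the observed data,
# so it is negative wherever the fit lies below the intercept
smooth_mean <- mean(terms_mat[, "s(fsca)"])
```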

The use of by=fsca is really useful for interpreting the marginal impact of
the different variables. With my actual data, the term s(fsca):fsca is
never negative, which is much more intuitive. Is it appropriate to compare
magnitudes of e.g. s(x1):x1 / mean(x1) and s(x2):x2 / mean(x2), where
mean(x_i) is the mean of the actual data for each variable?
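
For reference, a sketch of the by=fsca setup on simulated data (again invented numbers). The term column returned by predict() is f(fsca) * fsca, so dividing by fsca recovers f(fsca), the "local coefficient"; whether ratios against mean(x_i) are a fair comparison is a separate question this does not settle.

```r
library(mgcv)

set.seed(2)  # simulated data, for illustration only
dat <- data.frame(fsca = runif(200, 0.01, 0.99))
dat$snowdepth <- 3 * dat$fsca + rnorm(200, sd = 0.3)  # depth rises with fsca

# varying-coefficient form: snowdepth_i = intercept + f(fsca_i) * fsca_i + e_i
mdl_by <- gam(snowdepth ~ s(fsca, by = fsca), data = dat)
tm <- predict(mdl_by, type = "terms")

# contribution of the term, and the local coefficient f(fsca) behind it
contribution <- tm[, "s(fsca):fsca"]
local_coef <- contribution / dat$fsca
```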

Lastly, how would these two differ: s(x1,by=x2); or s(x1,by=x1)*s(x2,by=x2)?
Interactions are surely present, and I'm not sure a linear
combination is enough.

Thanks!
Dominik


On Wed, May 11, 2016 at 3:11 AM, Simon Wood  wrote:

> The spline having a positive value is not the same as a glm coefficient
> having a positive value. When you plot a smooth, say s(x), that is
> equivalent to plotting the line 'beta * x' in a GLM. It is not equivalent
> to plotting 'beta'. The smooths in a gam are (usually) subject to
> `sum-to-zero' identifiability constraints to avoid confounding via the
> intercept, so they are bound to be negative over some part of the covariate
> range. For example, if I have a model y ~ s(x) + s(z), I can't estimate the
> mean level for s(x) and the mean level for s(z) as they are completely
> confounded, and confounded with the model intercept term.
>
> I suppose that if you want to interpret the smooths as glm parameters
> varying with the covariate they relate to then you can do, by setting the
> model up as a varying coefficient model, using the `by' argument to 's'...
>
> gam(snowdepth~s(fsca,by=fsca),data=dat)
>
>
> this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' . s(fsca,by=fsca)
> is not confounded with the intercept, so no constraint is needed or
> applied, and you can now interpret the smooth like a local GLM coefficient.
>
> best,
> Simon
>
>
>
>
> On 11/05/16 01:30, Dominik Schneider wrote:
>
>> Hi,
>> Just getting into using GAM using the mgcv package. I've generated some
>> models and extracted the splines for each of the variables and started
>> visualizing them. I'm noticing that one of my variables is physically
>> unrealistic.
>>
>> In the example below, my interpretation of the following plot is that the
>> y-axis is basically the equivalent of a "parameter" value of a GLM; in GAM
>> this value can change as the functional relationship changes between x and
>> y. In my case, I am predicting snowdepth based on the fractional snow
>> covered area. In no case will snowdepth realistically decrease for a unit
>> increase in fsca so my question is: *Is there a way to constrain the
>> spline
>> to positive values? *
>>
>> Thanks
>> Dominik
>>
>> library(mgcv)
>> library(dplyr)
>> library(ggplot2)
>> extract_splines=function(mdl){
>>sterms=predict(mdl,type='terms')
>>datplot=cbind(sterms,mdl$model) %>% tbl_df
>>datplot$intercept=attr(sterms,'constant')
>>datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
>>return(datplot)
>> }
>> dat=data_frame(snowdepth=runif(100,min =
>> 0.001,max=6.7),fsca=runif(100,0.01,.99))
>> mdl=gam(snowdepth~s(fsca),data=dat)
>> termdF=extract_splines(mdl)
>> ggplot(termdF)+
>>geom_line(aes(x=fsca,y=`s(fsca)`))
>>
>>
>
>
> --
> Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
> +44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190
>
>


[R] Antwort: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-11 Thread G . Maubach

Duncan,

thanks for the hint.

I have done it correctly in R fashion

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")
sink()
unlink("C:/Temp/all.Rout")

But the error persists.
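
For comparison, a sketch of the same sequence with the connection closed explicitly at the end. That the still-open connection `zz` is what keeps the file locked on Windows is an assumption here, not something confirmed in this thread; `tempfile()` stands in for "C:/Temp/all.Rout".

```r
out_file <- tempfile(fileext = ".Rout")  # stand-in for "C:/Temp/all.Rout"

zz <- file(out_file, open = "wt")
sink(zz)                    # divert ordinary output
sink(zz, type = "message")  # divert messages, warnings and errors
try(log("a"))
sink(type = "message")      # restore messages first
sink()                      # then restore ordinary output
close(zz)                   # release the file handle held by R

captured <- readLines(out_file)
```

Note that calling unlink() on the file afterwards deletes it, so it should be left out if the file is meant to be opened in another program.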

Kind regards

Georg




From:    Duncan Murdoch
To:      John Sorkin, drjimle...@gmail.com, g.maub...@weinwolf.de
Cc:      r-help@r-project.org
Date:    10.05.2016 19:03
Subject: Re: [R] Antwort: Re: Re: sink(): Cannot open file



On 10/05/2016 11:15 AM, John Sorkin wrote:
> George,
> I do not know what operating system you are working with, but when I use 
sink() under windows, I need to specify a valid path which I don't see in 
your code. I might, for example specify:
>
> sink("c:\myfile.txt")

Note that the backslash should be doubled (so it isn't interpreted as an 
escape for the "m" that follows it), or replaced with a forward slash.

Duncan Murdoch

>   R code goes here
> sink()
>
> with the expectation that I would create a file myfile.txt that would 
contain the output of my R program.
> 
> John
>
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and 
Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> >>>  05/10/16 11:10 AM >>>
> Hi Jim,
>
> I tried:
>
> sink("all.Rout")
> try(log("a"))
> sink()
>
> The program executes without warning or error. The file "all.Rout" is
> being created. Nothing is written to it. The file is accessible
> by notepad.exe right after the execution of the program.
>
> The program
>
> zz <- file("all.Rout", open = "wt")
> sink(zz, type = "message")
> try(log("a"))
> sink()
> close(zz)
> unlink(zz)
>
> creates the file, does not write anything to it, and the file is not
> accessible with notepad.exe after the program has run in R.
>
> Any ideas what happens behind the scenes?
>
> Kind regards
>
> Georg
>
>
>
>
> Von: Jim Lemon 
> An: g.maub...@weinwolf.de,
> Kopie: r-help mailing list 
> Datum: 10.05.2016 13:16
> Betreff: Re: Re: [R] sink(): Cannot open file
>
>
>
> Have you tried:
>
> sink("all.Rout")
> try(log("a"))
> sink()
>
> Jim
>
> On Tue, May 10, 2016 at 9:05 PM,  wrote:
> > Hi Jim,
> >
> > thanks for your reply.
> >
> > ad 1)
> > "all.Rout" was created in the correct directory. It exists properly 
with
> > correct file properties on Windows, e.g. creation date and time and 
file
> > size information.
> >
> > ad 2)
> > I can not access the file with Notepad.exe directly after it was 
created
> > by R. The error message is (translated):
> >
> > "Cannot access file "all.Rout". The file is opened by another 
process."
> >
> > ad 3)
> > If I close R completely the file access is released. Then I can read 
the
> > file using Notepad.exe. The contents is:
> >
> > Error in log("a") : non-numeric argument to mathematical function
> >
> > I tried
> >
> > close(zz)
> >
> > but the error persists.
> >
> > To me it looks like R is still accessing the file and not releasing 
the
> > connection for other programs. close(zz) should have solved the 
problem
> > but unfortunately it doesn't.
> >
> > What else could I try?
> >
> > Kind regards
> >
> > Georg
> >
> >
> >
> >
> > Von: Jim Lemon 
> > An: g.maub...@weinwolf.de,
> > Kopie: r-help mailing list 
> > Datum: 10.05.2016 12:50
> > Betreff: Re: [R] sink(): Cannot open file
> >
> >
> >
> > Hi Georg,
> > I don't suppose that you have:
> >
> > 1) checked that the file "all.Rout" exists somewhere?
> >
> > 2) if so, looked at the file with Notepad, perhaps?
> >
> > 3) let us in on the secret by pasting the contents of "all.Rout" into
> > your message if it is not too big?
> >
> > At a guess, trying:
> >
> > close(zz)
> >
> > might get you there.
> >
> > Jim
> >
> > On Tue, May 10, 2016 at 5:25 PM,  wrote:
> >> Hi All,
> >>
> >> I would like to route the output to a file using sink(). When using 
the
> >> example from the ?sink documentation:
> >>
> >> sink("sink-examp.txt")
> >> i <- 1:10
> >> outer(i, i, "*")
> >> sink()
> >> unlink("sink-examp.txt")
> >>
> >> ## capture all the output to a file.
> >> zz <- file("all.Rout", open = "wt")
> >> sink(zz)
> >> sink(zz, type = "message")
> >> try(log("a"))
> >> ## back to the console
> >> sink(type = "message")
> >> sink()
> >> file.show("all.Rout")
> >>
> >> I can not open the file in Windows Explorer. The error message is:
> >>
> >> "Cannot open file. File is in use by another process."
> >>
> >> How can I close the file in a manner that I can open it right after 
it
> > was
> >> created?
> >>
> >> Kind

[R] Antwort: Re: Re: Antwort: Re: Re: sink(): Cannot open file

2016-05-11 Thread G . Maubach

Hi Sarah,

yes, I followed your suggestion.

If I do exactly what is in the example of the documentation:

sink("C:/Temp/sink-examp.txt")
i <- 1:10
outer(i, i, "*")
sink()
unlink("C:/Temp/sink-examp.txt")

it does not write anything, i.e. no file is created in "C:/Temp/". The 
script is executed without an error or warning message.
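
One thing worth checking in that example is the final unlink() call, which deletes the file it names. The sketch below, with `tempfile()` standing in for "C:/Temp/sink-examp.txt", records whether the file exists before and after that call; it is an illustration of what unlink() does, not a diagnosis of the Windows setup.

```r
path <- tempfile(fileext = ".txt")  # stand-in for "C:/Temp/sink-examp.txt"

sink(path)
i <- 1:10
print(outer(i, i, "*"))  # explicit print(), so it also works under source()
sink()

exists_before <- file.exists(path)  # the file is there once the sink is closed
n_lines <- length(readLines(path))  # and it contains the printed matrix
unlink(path)                        # this is the step that deletes it again
exists_after <- file.exists(path)
```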

If I run

## capture all the output to a file.
zz <- file("C:/Temp/all.Rout", open = "wt")
sink(zz)
sink(zz, type = "message")
try(log("a"))
## back to the console
sink(type = "message")  # I think this was your suggestion
sink()
unlink("C:/Temp/all.Rout")

the script is executed without an error or warning message and the file is 
created in "C:/Temp/", but if I try to open it right after the script 
is done I get this message:

DE: "Auf das Dokument "C:\Temp\all.Rout" kann nicht zugegriffen werden, da 
es von einer anderen Anwendung verwendet wird."
EN: "Cannot access the document "C:\Temp\all.Rout" because it is being used 
by another application."

What am I doing wrong?

Kind regards

Georg




From:    Sarah Goslee
To:      g.maub...@weinwolf.de
Date:    10.05.2016 18:46
Subject: Re: Re: [R] Antwort: Re: Re: sink(): Cannot open file



On Tue, May 10, 2016 at 12:34 PM,   wrote:
> sink(type = "message")


But did you do that ^^ as I suggested?


If you start a message sink with
sink(zz, type="message")
as you did, you need to explicitly close that stream. Just using
sink()
doesn't do it.
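
A quick way to see this is sink.number(type = "message"), which reports the connection currently used for messages and is 2 when nothing is diverted. A minimal sketch, using a temporary file:

```r
zz <- file(tempfile(fileext = ".Rout"), open = "wt")
sink(zz)
sink(zz, type = "message")

sink()  # restores ordinary output only
still_diverted <- sink.number(type = "message") != 2

sink(type = "message")  # now the message stream is restored as well
restored <- sink.number(type = "message") == 2

close(zz)  # finally release the file connection itself
```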


Re: [R] Quantiles on multiply imputed survey data - mitools

2016-05-11 Thread Anne Bichteler

Thanks so SO much.

Brennan

www.toxstrategies.com




From:  Anthony Damico 
Date:  Wednesday, May 11, 2016 at 11:17 AM
To:  Anne Bichteler 
Cc:  "r-help@r-project.org" 
Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools


hi, you want   se=T

M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')), 
quantiles = c(.5),se=T))
MIcombine(M_quantile)



Multiple imputation results:
  with(des_mult, svyquantile(make.formula(get("var_name")), quantiles = 
c(0.5),

se = T))
  MIcombine.default(M_quantile)
   results   se
LBXTCD 12.7978 6.917285








On Wed, May 11, 2016 at 12:09 PM, Anne Bichteler 
 wrote:

Thanks for looking. No, for the quantiles it fails to instantiate the 
collection of designs correctly, whether hard-coding the variable name or using 
make.formula. 'with' passes make.formula correctly when calculating the mean, 
e.g. this works:

MIcombine(with(des, svymean(make.formula(get('var_name')))))

# Here's a reproducible example.

DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
  SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
  WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 
65085, 21765, 197775, 49931),
  LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9, 15.3, 
13.8, 84.5))

DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
  SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
  WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 
65085, 21765, 197775, 49931),
  LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4, 10.5, 
84.5))

var_name <- "LBXTCD"

# Individually svyquantile (and svymean) work:
des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=Df1_red, nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm = FALSE)

des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=Df2_red, nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm = FALSE)

Imputed_list <- c()
Imputed_list[[1]] <- DF1
Imputed_list[[2]] <- DF2

# svymean works (so the svydesign object is fine?) but svyquantile doesn't:
des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=imputationList(Imputed_list), nest=TRUE)
M_mean <- with(des_mult, svymean(make.formula(get('var_name'))))
summary(M_mean)
M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')), 
quantiles = c(.5)))
summary(M_quantile)


Thanks again,

Brennan

www.toxstrategies.com 


From:  Anthony Damico 
Date:  Tuesday, May 10, 2016 at 10:37 PM
To:  Anne Bichteler 
Cc:  "r-help@r-project.org" 
Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools


is the `with` not passing make.formula( get( 'var_name' ) ) through to 
svyquantile for some reason?  does this work?

MIcombine( with(des, svyquantile(~LBXTCD, .5)))



if that's not it, could you make a minimal reproducible example that includes 
the data download?  code to download and import nhanes here

https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey





On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler
 wrote:

Hello, and thank you for considering this question:

The svystat object created with multiply imputed NHANES data files is failing 
on calling survey::svyquantile. I'm wondering if I'm diagnosing the issue 
correctly, whether the behavior is expected, and whether y'all might have any 
ideas for workarounds.

I'm following T. Lumley's general method outlined here:
http://faculty.washington.edu/tlumley/old-survey/svymi.html 
,
 but with data files I've imputed myself on the 2001/2002 biennial. Each file 
has 1081 observations and no missing values.

### Create the survey design object with list of imputed data files 
ImputedList0102.
des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=imputationList(ImputedList0102), nest=TRUE)


### Blood analyte of interest
var_name <- "LBXTCD" # analyte in blood serum

### All is well calculating the mean:
M <- with(des, svymean(make.formula(get('var_name'))))
summary(M)
Result <- MIcombine(M)
Result$coefficients
# LBXTCD
# 17.41635


### but svystat object fails to calculate a 50th percentile:
### it fails when hard-coding the name rather than using make.formula;
### it fails regardless of number of files or choices in handling ties or 
interval type.
### There are 16 ties in each data file.
M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles = c(.5)))
summary(M1)

# Length Class  Mode
#[1,] 1  -none- numeric
#[2,] 1  -none- numeric
#[3,] 1

Re: [R] Quantiles on multiply imputed survey data - mitools

2016-05-11 Thread Anthony Damico

hi, you want   se=T

M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')),
quantiles = c(.5),se=T))
MIcombine(M_quantile)



Multiple imputation results:
      with(des_mult, svyquantile(make.formula(get("var_name")), quantiles = c(0.5), se = T))
      MIcombine.default(M_quantile)
       results       se
LBXTCD 12.7978 6.917285
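
As background on why a standard error is needed from each imputation: pooling under Rubin's rules combines the per-imputation variances with the spread of the estimates. The sketch below is a hand-rolled scalar version, not the mitools implementation; `combine_mi()` is a hypothetical name, the two medians are the single-file values reported in this thread, and the standard errors are placeholders made up for the example.

```r
# Hand-rolled Rubin's rules for a scalar estimate; a sketch, not mitools code.
combine_mi <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                 # pooled point estimate
  ubar <- mean(se^2)                # average within-imputation variance
  b <- var(est)                     # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  list(estimate = qbar, se = sqrt(total))
}

# Medians from the two single-file designs in this thread;
# the standard errors are hypothetical placeholders.
pooled <- combine_mi(est = c(13.5554, 14.06154), se = c(6.5, 6.8))
```

Without a per-imputation variance there is nothing to put in `ubar`, which is consistent with the combine step needing se=T on svyquantile.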






On Wed, May 11, 2016 at 12:09 PM, Anne Bichteler <
abichte...@toxstrategies.com> wrote:

> Thanks for looking. No, for the quantiles it fails to instantiate the
> collection of designs correctly, whether hard-coding the variable name or
> using make.formula. 'with' passes make.formula correctly when calculating
> the mean, e.g. this works:
>
> MIcombine(with(des, svymean(make.formula(get('var_name')))))
>
> # Here's a reproducible example.
>
> DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
>   SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
>   WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959,
> 65085, 21765, 197775, 49931),
>   LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9,
> 15.3, 13.8, 84.5))
>
> DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2),
>   SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
>   WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959,
> 65085, 21765, 197775, 49931),
>   LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4,
> 10.5, 84.5))
>
> var_name <- "LBXTCD"
>
> # Individually svyquantile (and svymean) work:
> des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=Df1_red, nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm =
> FALSE)
>
> des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=Df2_red, nest=TRUE)
> svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm =
> FALSE)
>
> Imputed_list <- c()
> Imputed_list[[1]] <- DF1
> Imputed_list[[2]] <- DF2
>
> # svymean works (so the svydesign object is fine?) but svyquantile doesn't:
> des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(Imputed_list), nest=TRUE)
> M_mean <- with(des_mult, svymean(make.formula(get('var_name'))))
> summary(M_mean)
> M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')),
> quantiles = c(.5)))
> summary(M_quantile)
>
>
> Thanks again,
>
> Brennan
>
> www.toxstrategies.com
>
>
> From:  Anthony Damico 
> Date:  Tuesday, May 10, 2016 at 10:37 PM
> To:  Anne Bichteler 
> Cc:  "r-help@r-project.org" 
> Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools
>
>
> is the `with` not passing make.formula( get( 'var_name' ) ) through to
> svyquantile for some reason?  does this work?
>
> MIcombine( with(des, svyquantile(~LBXTCD, .5)))
>
>
>
> if that's not it, could you make a minimal reproducible example that
> includes the data download?  code to download and import nhanes here
>
>
> https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey
>
>
>
>
>
> On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler
>  wrote:
>
> Hello, and thank you for considering this question:
>
> The svystat object created with multiply imputed NHANES data files is
> failing on calling survey::svyquantile. I'm wondering if I'm diagnosing the
> issue correctly, whether the behavior is expected, and whether y'all might
> have any ideas for workarounds.
>
> I'm following T. Lumley's general method outlined here:
> http://faculty.washington.edu/tlumley/old-survey/svymi.html <
> http://faculty.washington.edu/tlumley/old-survey/svymi.html>, but with
> data files I've imputed myself on the 2001/2002 biennial. Each file has
> 1081 observations and no missing values.
>
> ### Create the survey design object with list of imputed data files
> ImputedList0102.
> des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR,
> data=imputationList(ImputedList0102), nest=TRUE)
>
>
> ### Blood analyte of interest
> var_name <- "LBXTCD" # analyte in blood serum
>
> ### All is well calculating the mean:
> M <- with(des, svymean(make.formula(get('var_name'))))
> summary(M)
> Result <- MIcombine(M)
> Result$coefficients
> # LBXTCD
> # 17.41635
>
>
> ### but svystat object fails to calculate a 50th percentile:
> ### it fails when hard-coding the name rather than using make.formula;
> ### it fails regardless of number of files or choices in handling ties or
> interval type.
> ### There are 16 ties in each data file.
> M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles =
> c(.5)))
> summary(M1)
>
> # Length Class  Mode
> #[1,] 1  -none- numeric
> #[2,] 1  -none- numeric
> #[3,] 1  -none- numeric
>
>
> ### The quantile is successfully calculated on one file at a time,
> however, and is different for each file.
> ### (had thought

Re: [R] Quantiles on multiply imputed survey data - mitools

2016-05-11 Thread Anne Bichteler

Thanks for looking. No, for the quantiles it fails to instantiate the 
collection of designs correctly, whether hard-coding the variable name or using 
make.formula. 'with' passes make.formula correctly when calculating the mean, 
e.g. this works:

MIcombine(with(des, svymean(make.formula(get('var_name')))))

# Here's a reproducible example.

DF1 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2), 
  SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
  WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 
65085, 21765, 197775, 49931),
  LBXTCD = c(20.4, 29.7, 8.8, 18.0, 22.2, 10.4, 43.9, 15.3, 
13.8, 84.5))

DF2 <- data.frame(SDMVPSU = c(1,1,1,1,1,2,2,2,2,2), 
  SDMVSTRA = c(22, 20, 24, 18, 20, 22, 20, 24, 18, 20),
  WTSPO2YR = c(252605, 82199, 24946, 147236, 3679, 294959, 
65085, 21765, 197775, 49931),
  LBXTCD = c(21.9, 29.7, 9.2, 5.9, 32.8, 8.9, 43.9, 7.4, 10.5, 
84.5))

var_name <- "LBXTCD"

# Individually svyquantile (and svymean) work:
des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=DF1, nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single1, c(.5), na.rm = FALSE)

des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=DF2, nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single2, c(.5), na.rm = FALSE)

Imputed_list <- c()
Imputed_list[[1]] <- DF1
Imputed_list[[2]] <- DF2

# svymean works (so the svydesign object is fine?) but svyquantile doesn't:
des_mult <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=imputationList(Imputed_list), nest=TRUE)
M_mean <- with(des_mult, svymean(make.formula(get('var_name'))))
summary(M_mean)
M_quantile <- with(des_mult, svyquantile(make.formula(get('var_name')), 
quantiles = c(.5)))
summary(M_quantile)


Thanks again,

Brennan

www.toxstrategies.com


From:  Anthony Damico 
Date:  Tuesday, May 10, 2016 at 10:37 PM
To:  Anne Bichteler 
Cc:  "r-help@r-project.org" 
Subject:  Re: [R] Quantiles on multiply imputed survey data - mitools


is the `with` not passing make.formula( get( 'var_name' ) ) through to 
svyquantile for some reason?  does this work?

MIcombine( with(des, svyquantile(~LBXTCD, .5)))



if that's not it, could you make a minimal reproducible example that includes 
the data download?  code to download and import nhanes here

https://github.com/ajdamico/asdfree/tree/master/National%20Health%20and%20Nutrition%20Examination%20Survey





On Tue, May 10, 2016 at 4:33 PM, Anne Bichteler 
 wrote:

Hello, and thank you for considering this question:

The svystat object created with multiply imputed NHANES data files is failing 
on calling survey::svyquantile. I'm wondering if I'm diagnosing the issue 
correctly, whether the behavior is expected, and whether y'all might have any 
ideas for workarounds.

I'm following T. Lumley's general method outlined here: 
http://faculty.washington.edu/tlumley/old-survey/svymi.html 
, but with data 
files I've imputed myself on the 2001/2002 biennial. Each file has 1081 
observations and no missing values.

### Create the survey design object with list of imputed data files 
ImputedList0102.
des <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=imputationList(ImputedList0102), nest=TRUE)


### Blood analyte of interest
var_name <- "LBXTCD" # analyte in blood serum

### All is well calculating the mean:
M <- with(des, svymean(make.formula(get('var_name'))))
summary(M)
Result <- MIcombine(M)
Result$coefficients
# LBXTCD
# 17.41635


### but svystat object fails to calculate a 50th percentile:
### it fails when hard-coding the name rather than using make.formula;
### it fails regardless of number of files or choices in handling ties or
### interval type.
### There are 16 ties in each data file.
M1 <- with(des, svyquantile(make.formula(get('var_name')), quantiles = c(.5)))
summary(M1)

# Length Class  Mode
#[1,] 1  -none- numeric
#[2,] 1  -none- numeric
#[3,] 1  -none- numeric


### The quantile is successfully calculated on one file at a time, however,
### and is different for each file (had thought perhaps there was a
### lack-of-variance issue). The quantile calculated on each file is the same
### regardless of interval.type.
des_single1 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=ImputedList0102[[1]], nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single1, c(.5))
# 0.5
# LBXTCD 13.5554


des_single2 <- svydesign(id=~SDMVPSU, strat=~SDMVSTRA, weight=~WTSPO2YR, 
data=ImputedList0102[[2]], nest=TRUE)
svyquantile(make.formula(get('var_name')), des_single2, c(.5))
# 0.5
# LBXTCD 14.06154

# The number of observations exceeding the 50th percentile differs for each
# file, which I can't claim to understand.

# I

Re: [R] how to manipulate ... in the argument list

2016-05-11 Thread Vito M. R. Muggeo


Hi Witold,
use do.call()

list.args<-list(...)

#modify 'list.args' (add/delete/modify)

do.call(image, list.args)
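
A slightly fuller sketch of the same idea, in case it helps. The helper name
build_image_args and the way breaks are respaced are assumptions made up for
illustration, not anything from the original function. The helper only builds
the argument list, so it can be inspected without drawing; col is passed by
name because the second positional argument of image() is y, not col.

```r
# Hypothetical helper: collect x, col and "...", adjust 'breaks' if supplied,
# and return the argument list that would be handed to image().
build_image_args <- function(x, col, ...) {
  list.args <- list(...)
  if (!is.null(list.args$breaks)) {
    # Example manipulation (an assumption): image() needs one more break
    # than colours, so respace the breaks over the same range.
    list.args$breaks <- seq(min(list.args$breaks), max(list.args$breaks),
                            length.out = length(col) + 1)
  }
  # 'breaks' now appears exactly once, inside list.args
  c(list(x = x, col = col), list.args)
}

# Wrapper in the spirit of the question: forward everything via do.call()
image.2 <- function(x, col, ...) {
  do.call(image, build_image_args(x, col, ...))
}

# Inspect the arguments without plotting (volcano ships with R)
args <- build_image_args(volcano, heat.colors(4),
                         breaks = c(90, 200), main = "demo")
str(args$breaks)
```

Because the modified breaks live inside the list, there is no second copy
left in "..." to trigger the "matched by multiple actual arguments" error.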

best,
vito


On 11/05/2016 10.45, Witold E Wolski wrote:

Hi,

I am looking for documentation describing how to manipulate the
"..." . Searching R-intro.html gives too many irrelevant hits for
"..."

What I want to do is something like this :


image.2 <- function(x, col , ...){
  # function is manipulating colors (adding a few)
  # since it changes colors it needs to update breaks if defined.

   breaks <- list(...)$breaks

  if( !is.null( list(...)$breaks ) ){
 #manipulate breaks

image(x, col, breaks = breaks ,...)

   }else{
  image(x,col ,...)
   }
}

but in order to get it working I will need to remove breaks from ...
since otherwise I am getting multiple defined argument for breaks.

So how to manipulate the "..." argument? Or should I use a different pattern

best





--
==
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240
fax: 091 485726
http://dssm.unipa.it/vmuggeo
Associate Editor, Statistical Modelling


Re: [R] factor variables in logistic regression

2016-05-11 Thread Kevin E. Thorpe


On 05/11/2016 08:00 AM, ch.elahe via R-help wrote:

Hi all,

I have a plot for TSTMean vs. SNRMean and both of these variables are factors. 
How can I use Logistic Regression for factor variables?
Currently I use model=lm(TSTMean~SNRMean,data=df) but when I check
summary(model) I get this error: Error in quantile.default(resid) : factors
are not allowed

thanks for any help,
Elahe



First of all, lm() is for linear regression, not logistic regression. 
For logistic regression you need to use glm() and make sure you set the 
correct family (see ?glm). I don't recall if glm() accepts a factor 
outcome but if not, you would need to re-code it to 0/1.
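
For what it's worth, glm() with family = binomial does accept a two-level
factor as the outcome: the first level is treated as "failure" and the other
level as "success". A minimal sketch on simulated data follows; the level
names and probabilities are made up, and it assumes the outcome really has
only two levels, which the original post does not say.

```r
set.seed(42)

# Simulated stand-in for the poster's data (assumption): a 3-level factor
# predictor and a 2-level factor outcome.
n <- 120
df <- data.frame(
  SNRMean = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                   levels = c("low", "mid", "high"))
)
p_pass <- c(low = 0.2, mid = 0.5, high = 0.8)[as.character(df$SNRMean)]
df$TSTMean <- factor(ifelse(runif(n) < p_pass, "pass", "fail"),
                     levels = c("fail", "pass"))  # first level = "failure"

# Logistic regression: factor outcome, factor predictor (dummy coded)
model <- glm(TSTMean ~ SNRMean, family = binomial, data = df)
summary(model)
```

The coefficients are log-odds of "pass" relative to the reference level
"low". If the outcome has more than two levels, every level other than the
first counts as success, which is rarely what is wanted.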


Kevin

--
Kevin E. Thorpe
Head of Biostatistics,  Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's Hospital
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.tho...@utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016


Re: [R] web scraping tables generated in multiple server pages / Best of R-help

2016-05-11 Thread Michael Friendly

On 5/10/2016 4:11 PM, boB Rudis wrote:
> Unfortunately, it's a wretched, vile, SharePoint-based site. That
> means it doesn't use traditional encoding methods to do the pagination
> and one of the only ways to do this effectively is going to be to use
> RSelenium:
>
R-help is not Stack Exchange, where people get "reputation" points for
good answers, and R-help often sees a lot of unhelpful and sometimes unkind
answers. So, when someone is exceptionally helpful, it is worthwhile
acknowledging it in public, as I do now, with my "Best of R-help" award to
Bob Rudis.

Not only did he point me to RSelenium, but he wrote a complete solution
to the problem, and gave me the generated data on a github link.
It was slick, and I learned a lot from it.

best,
-Michael

-- 
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web: http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA


[R-es] [Grupo de Usuarios de R de Madrid]: Next meeting tomorrow, Thursday 12 May...

2016-05-11 Thread Carlos Ortega

Hello,

In case you are interested and able to attend:

http://madrid.r-es.org/35-jueves-12-de-mayo-2016/

Thanks,
Carlos Ortega
www.qualityexcellence.es


[R] factor variables in logistic regression

2016-05-11 Thread ch.elahe via R-help

Hi all,

I have a plot for TSTMean vs. SNRMean and both of these variables are factors. 
How can I use Logistic Regression for factor variables?
Currently I use model=lm(TSTMean~SNRMean,data=df) but when I check
summary(model) I get this error: Error in quantile.default(resid) : factors
are not allowed
 
thanks for any help,
Elahe


Re: [R] Ensure parameter is a string when passed within an lapply & called function runs a 'substitute' on it

2016-05-11 Thread Andrew Clancy

Thanks David - my earlier response to Bert contains the resolution.
partialPlot was commented out deliberately, as it is the target function
whose behaviour I was replicating in testFunc. The original behaviour, i.e.
printing 'X1', was correct, and the do.call fix yields this same response
when testFunc is called within lapply. As I'm replicating partialPlot, no
changes can be made to testFunc (e.g. your removal of 'substitute');
otherwise I'd need to patch the randomForest package and its partialPlot
function. The correct patch would be to change the eval to use the parent
environment; the substitute should remain.

See the resolution here (jcheng beat r-help to it this time!)
https://groups.google.com/forum/?utm_medium=email_source=footer#!topic/shiny-discuss/cIZJzQmw8tQ
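
For the archive, a minimal sketch of why do.call helps here. It is a
cut-down version of testFunc (only the x.var handling is kept, which is an
assumption about what matters), not the code from the linked thread.
do.call() evaluates the argument list before building the call, so
substitute() inside the function sees the string itself rather than the
symbol 'var'.

```r
# Cut-down testFunc: same substitute()/deparse() logic as in the thread
testFunc <- function(x.var) {
  x.var <- substitute(x.var)
  if (is.character(x.var)) {
    x.var
  } else if (is.name(x.var)) {
    deparse(x.var)
  } else {
    eval(x.var)
  }
}

vars <- c("X1", "X2")

# Direct call: substitute() captures the symbol 'var', so we get "var"
direct <- lapply(vars, function(var) testFunc(x.var = var))

# do.call: the list is evaluated first, so the function receives "X1", "X2"
viaDoCall <- lapply(vars, function(var) do.call(testFunc, list(x.var = var)))

unlist(direct)
unlist(viaDoCall)
```

The same wrapping should carry over to partialPlot, since the called
function itself is left untouched.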


On 11 May 2016 at 08:48, David Winsemius  wrote:

>
> > On May 9, 2016, at 2:39 PM, Andrew Clancy  wrote:
> >
> > Hi,
> >
> > I’m trying to solve what looks like the same issue as stack overflow
> article, but within an lapply:
> >
> http://stackoverflow.com/questions/18939254/cant-use-a-variable-as-an-argument-but-can-use-its-value
>
>
> It would be helpful if you could articulate the issue.
>
> >
> > I’ve replicated the issue with partialPlot below in ‘testFunc’. The
> lines up to the final print can’t change (including the substitute). In the
> first call it prints out ‘X1’ correctly, in the second it prints out ‘var’.
> I’ve tried eval, quote etc as the article suggests. Any ideas?
> >
> > numObs  <- 10
> > numVars <- 6
> > dataSet<- data.frame(replicate(numVars,rnorm(numObs)))
> > # partialPlot(x = model, pred.data = dataSet, x.var = 'X1', plot = F)
>
> I'm assuming that the comment character is actually something that was
> inserted in the process of stripping the HTML from this posting.
>
> It throws an error when removed:
>
> Error in partialPlot(x = model, pred.data = dataSet, x.var = "X1", plot =
> F) :
>   object 'model' not found
>
> >
>
> > testFunc <- function(x, pred.data, x.var, plot=F) {
> >   x.var <- substitute(x.var)
>
> Try changing to eval(x.var)
>
> >   # print(paste('is.character(x.var)', is.character(x.var), 
> > 'is.name(x.var)',
> is.name(x.var)))
>
> >   xname <- if (is.character(x.var)) x.var else {
> > if (is.name(x.var)) deparse(x.var) else {
> >   eval(x.var)
> > }
> >   }
> >   print(xname)
> >   # print(head(pred.data[,xname]))
> > }
> >
> > vars <- names(dataSet)[[1]]
> > testFunc(x = model, pred.data = dataSet, x.var = local(vars), plot = F)
>
> Returns:
> [1] "is.character(x.var) TRUE is.name(x.var) FALSE"
> [1] "X1"
> [1]  0.8704543 -0.4421564 -0.6725336 -1.3096399 -1.0531335 -0.4979650
>
>
> >
> > lapply(vars, function(var) {
> >   # print(paste('var', var))
> >   testFunc(x = model, pred.data = dataSet, x.var = var, plot = F)
> > })
>
> Returns:
> [1] "var X1"
> [1] "is.character(x.var) TRUE is.name(x.var) FALSE"
> [1] "X1"
> [1]  0.8704543 -0.4421564 -0.6725336 -1.3096399 -1.0531335 -0.4979650
> [[1]]
> [1]  0.8704543 -0.4421564 -0.6725336 -1.3096399 -1.0531335 -0.4979650
>
>
> >
> >   [[alternative HTML version deleted]]
>
> Please learn to post in plain text for this mailing list.
>
> --
>
> David Winsemius
> Alameda, CA, USA
>
>


Re: [R] how to manipulate ... in the argument list

2016-05-11 Thread Duncan Murdoch


On 11/05/2016 4:45 AM, Witold E Wolski wrote:

Hi,

I am looking for documentation describing how to manipulate the
"..." . Searching R-intro.html gives too many irrelevant hits for
"..."

What I want to do is something like this :


image.2 <- function(x, col , ...){
  # function is manipulating colors (adding a few)
  # since it changes colors it needs to update breaks if defined.

   breaks <- list(...)$breaks

  if( !is.null( list(...)$breaks ) ){
 #manipulate breaks

image(x, col, breaks = breaks ,...)

   }else{
  image(x,col ,...)
   }
}

but in order to get it working I will need to remove breaks from ...
since otherwise I am getting multiple defined argument for breaks.


If breaks is an argument that image.2 uses, you should just list it 
explicitly, and it won't become part of ... .


However, if you really want to do what you describe, you can do it using 
do.call.  Replace


image(x, col, breaks = breaks, ...)

with

dots <- list(...)
dots$breaks <- NULL
do.call(image, c(list(x, col, breaks = breaks), dots))



So how to manipulate the "..." argument? Or should I use a different pattern


I'd recommend a different pattern, i.e. include breaks as an argument, 
and possibly use missing(breaks) to determine when it has not been used.
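
A sketch of that recommended pattern, under a couple of assumptions: the
respacing of breaks is only a placeholder for whatever manipulation is really
needed, and the invisible return value is added just so the result can be
checked. col is passed by name, since the second positional argument of
image() is y.

```r
# 'breaks' is a formal argument, so it never ends up inside "..."
image.2 <- function(x, col, breaks, ...) {
  if (missing(breaks)) {
    # No breaks supplied: let image() choose them
    image(x, col = col, ...)
    return(invisible(NULL))
  }
  # Placeholder manipulation (an assumption): image() needs one more
  # break than colours, so respace over the supplied range.
  breaks <- seq(min(breaks), max(breaks), length.out = length(col) + 1)
  image(x, col = col, breaks = breaks, ...)
  invisible(breaks)
}
```

For instance, image.2(volcano, heat.colors(4)) draws with default breaks,
while image.2(volcano, heat.colors(4), breaks = c(90, 200), main = "demo")
uses five equally spaced breaks and still passes main through "...".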


Duncan Murdoch


Re: [R] Creating data frame of predicted and actual values in R for plotting

2016-05-11 Thread Muhammad Bilal

I have achieved this use case by writing the following commands:

all_predictions <- data.frame(pid = testPFI$project_id, actual_delay = 
testPFI$project_delay,lm_pred, tree_pred, best_tree_pred, rf_pred)

str(all_predictions)

all_pred <- sqldf("SELECT pid, actual_delay, ROUND(lm_pred,2) lm_pred,
   ROUND(tree_pred,2) tree_pred,
   ROUND(best_tree_pred,2) train_pred,
   ROUND(rf_pred,2) rf_pred
 FROM all_predictions
  ORDER BY actual_delay")
all_pred

#Plotting all the predictions on the graph
ggplot(all_pred, aes(x=pid)) + geom_line(aes(y=actual_delay), colour="blue") +
  geom_line(aes(y=lm_pred), colour="red", size=1)  +
  geom_line(aes(y=tree_pred), colour="green", size=1)  +
  geom_line(aes(y=train_pred), colour="yellow", size=1)  +
  geom_line(aes(y=rf_pred), colour="black", size=1)
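
In case sqldf is not available, the rounding and ordering step can probably
be done in base R as well. This is only a sketch on simulated stand-in data
(the values and the reduced set of prediction columns are assumptions); unlike
the SQL above, it keeps the original column names rather than renaming
best_tree_pred to train_pred.

```r
set.seed(1)

# Simulated stand-in for all_predictions (assumption: two models only)
n <- 6
all_predictions <- data.frame(
  pid          = seq_len(n),
  actual_delay = c(-323, 0, -60, 0, 0, -91),
  lm_pred      = rnorm(n, mean = -50, sd = 40),
  rf_pred      = rnorm(n, mean = -50, sd = 40)
)

# Round every prediction column to 2 decimals, then sort by actual delay
pred_cols <- setdiff(names(all_predictions), c("pid", "actual_delay"))
all_pred <- all_predictions
all_pred[pred_cols] <- round(all_pred[pred_cols], 2)
all_pred <- all_pred[order(all_pred$actual_delay), ]
all_pred
```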

So I am done.

Many Thanks and

Kind Regards
--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bi...@live.uwe.ac.uk



From: Muhammad Bilal
Sent: 11 May 2016 01:06:32
To: r-help@r-project.org
Subject: Re: [R] Creating data frame of predicted and actual values in R for
plotting

Pls don't mind the typo in predict() functions for some of the models.

Sent from my iPhone

> On 11 May 2016, at 12:47 am, Muhammad Bilal  
> wrote:
>
> Hi All,
>
>
> I have the following dataset:
>
>
>> str(pfi_v3)
> 'data.frame': 714 obs. of  8 variables:
> $ project_id : int  1 2 3 4 5 6 7 8 9 10 ...
> $ project_lat: num  51.4 51.5 52.2 51.5 53.5 ...
> $ project_lon: num  -0.642 -1.85 0.08 0.126 -1.392 ...
> $ sector : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 
> 6 6 6 6 6 6 6 ...
> $ project_duration   : int  1826 3652 121 520 1087 730 730 730 790 522 ...
> $ project_delay  : int  -323 0 -60 0 0 0 0 0 0 -91 ...
> $ capital_value  : num  6.7 5.8 21.8 47.3 47 24.2 40.7 71.9 10.7 70 
> ...
> $ contract_type  : Factor w/ 2 levels "Lumpsum","Turnkey": 2 2 2 2 2 
> 2 2 2 2 2 ...
>
>
> I'm using following commands to create training and test sets:
>
> split <- sample.split(pfi_v3$project_delay, SplitRatio = 0.8)
> trainPFI <- subset(pfi_v3, split == TRUE)
> testPFI <- subset(pfi_v3, split == FALSE)
>
>
> I am using several predictive models to estimate delay in projects.
>
>
> The commands are given as below:
>
>
> 1. Simple linear regression
>
> lm_m <- lm(project_delay ~ project_lon +
>            project_lat +
>            project_duration +
>            sector +
>            contract_type +
>            capital_value,
>            data = trainPFI)
>
> lm_pred <- predict(lm_m2, newdata = testPFI)
>
>
> 2. Regression tree
>
> tree_m <- rpart(project_delay ~ project_lon +
>  project_lat +
>  project_duration +
>  sector +
>  contract_type +
>  capital_value,
>data = trainPFI)
>
> tree_pred <- predict(tree_m2, newdata = testPFI)
>
> 3. Cp optimised regression tree
>
> train_m <- train(project_delay ~ project_lon +
>   project_lat +
>   project_duration +
>   sector +
>   contract_type +
>   capital_value,
> data = trainPFI,
> method="rpart",
> trControl=tr.control, tuneGrid = cp.grid)
>
>
> train_pred <- predict(tr_m, newdata = testPFI)
>
>
> 4. Random Forest
>
> rf_m <- randomForest(project_delay ~ project_lon +
>   project_lat +
>   project_duration +
>   sector +
>   contract_type +
>   capital_value,
> data = trainPFI,
> importance=TRUE,
> ntree = 2000)
>
> rf_pred <- predict(rf_m, newdata = testPFI)
>
> 5. Conditional Forest
> cf_m <- cforest(project_delay ~ project_lon +
>   project_lat +
>   project_duration +
>   sector +
>

Re: [R] how to manipulate ... in the argument list

2016-05-11 Thread Jim Lemon

Hi Witold,
You could try Ben Bolker's "clean.args" function in the plotrix package.

Jim


On Wed, May 11, 2016 at 6:45 PM, Witold E Wolski  wrote:
> Hi,
>
> I am looking for documentation describing how to manipulate the
> "..." . Searching R-intro.html gives too many irrelevant hits for
> "..."
>
> What I want to do is something like this :
>
>
> image.2 <- function(x, col , ...){
>  # function is manipulating colors (adding a few)
>  # since it changes colors it needs to update breaks if defined.
>
>   breaks <- list(...)$breaks
>
>  if( !is.null( list(...)$breaks ) ){
> #manipulate breaks
>
>image(x, col, breaks = breaks ,...)
>
>   }else{
>  image(x,col ,...)
>   }
> }
>
> but in order to get it working I will need to remove breaks from ...
> since otherwise I am getting multiple defined argument for breaks.
>
> So how to manipulate the "..." argument? Or should I use a different pattern
>
> best
>
>
>
> --
> Witold Eryk Wolski
>

Re: [R] physical constraint with gam

2016-05-11 Thread Simon Wood

The spline having a positive value is not the same as a glm coefficient 
having a positive value. When you plot a smooth, say s(x), that is 
equivalent to plotting the line 'beta * x' in a GLM. It is not 
equivalent to plotting 'beta'. The smooths in a gam are (usually) 
subject to `sum-to-zero' identifiability constraints to avoid 
confounding via the intercept, so they are bound to be negative over 
some part of the covariate range. For example, if I have a model y ~ 
s(x) + s(z), I can't estimate the mean level for s(x) and the mean level 
for s(z) as they are completely confounded, and confounded with the 
model intercept term.


I suppose that if you want to interpret the smooths as glm parameters 
varying with the covariate they relate to then you can do, by setting 
the model up as a varying coefficient model, using the `by' argument to 
's'...


gam(snowdepth~s(fsca,by=fsca),data=dat)


this model is `snowdepth_i = f(fsca_i) * fsca_i + e_i' . s(fsca,by=fsca) 
is not confounded with the intercept, so no constraint is needed or 
applied, and you can now interpret the smooth like a local GLM coefficient.
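
A small sketch of the difference, using simulated data (the quadratic
relationship and noise level are assumptions, not the poster's data). It
checks that the ordinary smooth is centred over the observations, and recovers
the local coefficient f(fsca) from the varying coefficient fit by dividing the
fitted term by fsca.

```r
library(mgcv)

set.seed(1)
# Simulated data (assumption): depth increases with fractional snow cover
dat <- data.frame(fsca = runif(200, 0.01, 0.99))
dat$snowdepth <- 3 * dat$fsca^2 + rnorm(200, sd = 0.3)

# Ordinary smooth: subject to the sum-to-zero constraint, so the fitted
# s(fsca) values average to zero over the data and must change sign.
m_smooth <- gam(snowdepth ~ s(fsca), data = dat)
centred <- predict(m_smooth, type = "terms")[, 1]
mean(centred)   # zero up to rounding error

# Varying coefficient form: snowdepth_i = a + f(fsca_i) * fsca_i + e_i
# (gam adds the intercept 'a' by default; the smooth is not centred).
m_vc <- gam(snowdepth ~ s(fsca, by = fsca), data = dat)
term <- predict(m_vc, type = "terms")[, 1]   # f(fsca_i) * fsca_i
local_coef <- term / dat$fsca                # f(fsca_i), a local "slope"
range(local_coef)
```

Plotting local_coef against dat$fsca then shows something that can be read
like a covariate-dependent GLM coefficient.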


best,
Simon




On 11/05/16 01:30, Dominik Schneider wrote:

Hi,
Just getting into using GAM using the mgcv package. I've generated some
models and extracted the splines for each of the variables and started
visualizing them. I'm noticing that one of my variables is physically
unrealistic.

In the example below, my interpretation of the following plot is that the
y-axis is basically the equivalent of a "parameter" value of a GLM; in GAM
this value can change as the functional relationship changes between x and
y. In my case, I am predicting snowdepth based on the fractional snow
covered area. In no case will snowdepth realistically decrease for a unit
increase in fsca so my question is: *Is there a way to constrain the spline
to positive values? *

Thanks
Dominik

library(mgcv)
library(dplyr)
library(ggplot2)
extract_splines=function(mdl){
   sterms=predict(mdl,type='terms')
   datplot=cbind(sterms,mdl$model) %>% tbl_df
   datplot$intercept=attr(sterms,'constant')
   datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
   return(datplot)
}
dat=data_frame(snowdepth=runif(100,min =
0.001,max=6.7),fsca=runif(100,0.01,.99))
mdl=gam(snowdepth~s(fsca),data=dat)
termdF=extract_splines(mdl)
ggplot(termdF)+
   geom_line(aes(x=fsca,y=`s(fsca)`))




--
Simon Wood, School of Mathematics, University of Bristol BS8 1TW UK
+44 (0)117 33 18273 http://www.maths.bris.ac.uk/~sw15190


Re: [R-es] English word dictionary to identify the type of word (verb, adjective, etc.)

2016-05-11 Thread Carlos Ortega

Hello,

At the previous meeting of the "Grupo de Usuarios de R de Madrid", Miguel
Ángel Gómez gave a presentation on sentiment analysis of financial news and
gave several references for what you are looking for; he also shared his
code.

You can see the details here:

http://madrid.r-es.org/34-jueves-14-de-abril-2016/

The video, although the recording quality is poor, can be followed...

Thanks,
Carlos.

On 11 May 2016, 10:50, Toni Massó Jou wrote:

> Hello:
>
> I am analysing text, and I have a need that the library I am using does
> not meet (I use "tm").
>
> Does anyone know of a package, a database, or anything else that can tell
> me whether an English word is a noun, a verb, etc.?
>
> The tm documentation mentions:  http://wordnet.princeton.edu/  and
> it seems to be something like what I am looking for. Has anyone used it?
> Do you know of any alternative?
>
> Many thanks!
>
> Regards, Toni Massó



-- 
Regards,
Carlos Ortega
www.qualityexcellence.es


Re: [R-es] Tables - results - stargazer

2016-05-11 Thread rubenfcasal


It looks good...

I have a home-made package for generating and displaying the results of
simulation studies (in HTML format), and this looks like it could suit me
well.


Thanks for the information.

Regards, Rubén.


On 10/05/2016 at 22:13, Javier Marcuzzi wrote:

This appeared on LinkedIn and is worth sharing. It may be useful to several
of us.

http://jakeruss.com/cheatsheets/stargazer.html#report-t-statistics-or-p-values-instead-of-standard-errors

Javier Rubén Marcuzzi



[R] how to manipulate ... in the argument list

2016-05-11 Thread Witold E Wolski

Hi,

I am looking for documentation describing how to manipulate the
"..." . Searching R-intro.html gives too many irrelevant hits for
"..."

What I want to do is something like this :


image.2 <- function(x, col , ...){
 # function is manipulating colors (adding a few)
 # since it changes colors it needs to update breaks if defined.

  breaks <- list(...)$breaks

 if( !is.null( list(...)$breaks ) ){
#manipulate breaks

   image(x, col, breaks = breaks ,...)

  }else{
 image(x,col ,...)
  }
}

but in order to get it working I will need to remove breaks from ...
since otherwise I am getting multiple defined argument for breaks.

So how to manipulate the "..." argument? Or should I use a different pattern

best



-- 
Witold Eryk Wolski


Re: [R] physical constraint with gam

2016-05-11 Thread David Winsemius


> On May 10, 2016, at 5:30 PM, Dominik Schneider 
>  wrote:
> 
> Hi,
> Just getting into using GAM using the mgcv package. I've generated some
> models and extracted the splines for each of the variables and started
> visualizing them. I'm noticing that one of my variables is physically
> unrealistic.
> 
> In the example below, my interpretation of the following plot is that the
> y-axis is basically the equivalent of a "parameter" value of a GLM; in GAM
> this value can change as the functional relationship changes between x and
> y. In my case, I am predicting snowdepth based on the fractional snow
> covered area. In no case will snowdepth realistically decrease for a unit
> increase in fsca so my question is: *Is there a way to constrain the spline
> to positive values? *
> 

I would think that the mass or volume of snow might not realistically decrease
with an increase in area, but I see no reason why increasing the area might
not be associated with a decrease in mean depth. Depth would be "orthogonal"
to area.



> Thanks
> Dominik
> 
> library(mgcv)
> library(dplyr)
> library(ggplot2)
> extract_splines=function(mdl){
>  sterms=predict(mdl,type='terms')
>  datplot=cbind(sterms,mdl$model) %>% tbl_df
>  datplot$intercept=attr(sterms,'constant')
>  datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
>  return(datplot)
> }
> dat=data_frame(snowdepth=runif(100,min =
> 0.001,max=6.7),fsca=runif(100,0.01,.99))
> mdl=gam(snowdepth~s(fsca),data=dat)
> termdF=extract_splines(mdl)
> ggplot(termdF)+
>  geom_line(aes(x=fsca,y=`s(fsca)`))
> 

David Winsemius
Alameda, CA, USA


Re: [R] Ensure parameter is a string when passed within an lapply & called function runs a 'substitute' on it

2016-05-11 Thread David Winsemius


> On May 9, 2016, at 2:39 PM, Andrew Clancy  wrote:
> 
> Hi, 
> 
> I’m trying to solve what looks like the same issue as stack overflow article, 
> but within an lapply:
> http://stackoverflow.com/questions/18939254/cant-use-a-variable-as-an-argument-but-can-use-its-value


It would be helpful if you could articulate the issue.

> 
> I’ve replicated the issue with partialPlot below in ‘testFunc’. The lines up 
> to the final print can’t change (including the substitute). In the first call 
> it prints out ‘X1’ correctly, in the second it prints out ‘var’. I’ve tried 
> eval, quote etc as the article suggests. Any ideas?
> 
> numObs  <- 10
> numVars <- 6
> dataSet<- data.frame(replicate(numVars,rnorm(numObs)))
> # partialPlot(x = model, pred.data = dataSet, x.var = 'X1', plot = F) 

I'm assuming that the comment character is actually something that was inserted 
in the process of stripping the HTML from this posting.

It throws an error when removed:

Error in partialPlot(x = model, pred.data = dataSet, x.var = "X1", plot = F) : 
  object 'model' not found

> 

> testFunc <- function(x, pred.data, x.var, plot=F) {
>   x.var <- substitute(x.var)

Try changing to eval(x.var)

>   # print(paste('is.character(x.var)', is.character(x.var), 'is.name(x.var)', 
> is.name(x.var)))

>   xname <- if (is.character(x.var)) x.var else {
> if (is.name(x.var)) deparse(x.var) else {
>   eval(x.var)
> }
>   }
>   print(xname)
>   # print(head(pred.data[,xname]))
> }
> 
> vars <- names(dataSet)[[1]]
> testFunc(x = model, pred.data = dataSet, x.var = local(vars), plot = F)

Returns:
[1] "is.character(x.var) TRUE is.name(x.var) FALSE"
[1] "X1"
[1]  0.8704543 -0.4421564 -0.6725336 -1.3096399 -1.0531335 -0.4979650


> 
> lapply(vars, function(var) {
>   # print(paste('var', var))
>   testFunc(x = model, pred.data = dataSet, x.var = var, plot = F)
> })

Returns:
[1] "var X1"
[1] "is.character(x.var) TRUE is.name(x.var) FALSE"
[1] "X1"
[1]  0.8704543 -0.4421564 -0.6725336 -1.3096399 -1.0531335 -0.4979650
[[1]]
[1]  0.8704543 -0.4421564 -0.6725336 -1.3096399 -1.0531335 -0.4979650


> 
>   [[alternative HTML version deleted]]

Please learn to post in plain text for this mailing list.

-- 

David Winsemius
Alameda, CA, USA


[R] physical constraint with gam

2016-05-11 Thread Dominik Schneider

Hi,
Just getting into using GAM using the mgcv package. I've generated some
models and extracted the splines for each of the variables and started
visualizing them. I'm noticing that one of my variables is physically
unrealistic.

In the example below, my interpretation of the following plot is that the
y-axis is basically the equivalent of a "parameter" value of a GLM; in GAM
this value can change as the functional relationship changes between x and
y. In my case, I am predicting snowdepth based on the fractional snow
covered area. In no case will snowdepth realistically decrease for a unit
increase in fsca so my question is: *Is there a way to constrain the spline
to positive values? *

Thanks
Dominik

library(mgcv)
library(dplyr)
library(ggplot2)
extract_splines=function(mdl){
  sterms=predict(mdl,type='terms')
  datplot=cbind(sterms,mdl$model) %>% tbl_df
  datplot$intercept=attr(sterms,'constant')
  datplot$yhat=rowSums(sterms)+attr(sterms,'constant')
  return(datplot)
}
dat=data_frame(snowdepth=runif(100,min =
0.001,max=6.7),fsca=runif(100,0.01,.99))
mdl=gam(snowdepth~s(fsca),data=dat)
termdF=extract_splines(mdl)
ggplot(termdF)+
  geom_line(aes(x=fsca,y=`s(fsca)`))
