Re: [R] Nonlinear logistic regression fitting

2020-07-29 Thread Sebastien Bihorel via R-help
Thanks Duncan,

(Sorry for the repeated email)

People working in my field are frequently (and rightly) accused of butchering
statistical terminology. So I guess I'm guilty as charged.

I will look into the suggested path. One question, though: in your expression of
loglik, is p equal to "a + b*x/(c+x)"?

Thanks


From: Duncan Murdoch 
Sent: Wednesday, July 29, 2020 16:04
To: Sebastien Bihorel ; J C Nash 
; r-help@r-project.org 
Subject: Re: [R] Nonlinear logistic regression fitting

Just a quick note about jargon:  you are using the word "likelihood" in
a way that I (and maybe some others) find confusing. (In fact, I think
you used it two different ways, but maybe I'm just confused.) I would
say that likelihood is the probability of observing the entire data set,
considered as a function of the parameters.  You appear to be using it
(at first) as the probability that a particular observation is equal to
1, and then as the argument to a logit function to give that probability.

What you probably want to do is find the parameters that maximize the
likelihood (in my sense).  The usual practice is to maximize the log of
the likelihood; it tends to be easier to work with.  In your notation
below, the log likelihood would be

   loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) )

When you have a linear logistic regression model, this simplifies a bit,
and there are fast algorithms that are usually stable to optimize it.
With a nonlinear model, you lose some of that, and I'd suggest directly
optimizing it.

Duncan Murdoch
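Duncan's suggestion to maximize the log-likelihood directly can be sketched with base R's optim(). This is a minimal sketch, not code posted in the thread. It re-simulates the data exactly as in Sebastien's example quoted below, where p is computed as exp(likelihood)/(1 + exp(likelihood)), i.e. plogis(a + b*x/(c+x)). Two choices are assumptions of this sketch: it fits log(c) rather than c to keep the EC50 positive, and it evaluates the log-likelihood on the log-probability scale, which is algebraically identical to Duncan's expression but never evaluates log(0).

```r
## Re-simulate the data from the example in this thread
set.seed(12345)
nx <- 10
x <- c(rep(0, 3 * nx), rep(c(10, 30, 100, 500, 1000), each = nx))
rnd <- runif(length(x))
a0 <- qlogis(0.2)          # same as log(0.2/(1-0.2))
b0 <- qlogis(0.7) - a0
c0 <- 30
p0 <- plogis(a0 + b0 * x / (c0 + x))   # P(resp == 1): inverse logit of the Emax term
resp <- ifelse(rnd <= p0, 1, 0)

## Negative log-likelihood; theta = c(a, b, log(c))
negloglik <- function(theta, x, resp) {
  eta <- theta[1] + theta[2] * x / (exp(theta[3]) + x)
  # Same as -sum(resp*log(p) + (1-resp)*log1p(-p)) with p = plogis(eta),
  # but computed on the log scale so that it never evaluates log(0)
  -sum(resp * plogis(eta, log.p = TRUE) +
         (1 - resp) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}

start <- c(0, 1, log(50))   # rough starting values (an assumption)
fit <- optim(start, negloglik, x = x, resp = resp, method = "BFGS")
est <- c(a = fit$par[1], b = fit$par[2], c = exp(fit$par[3]))
print(est)
print(-fit$value)           # maximized log-likelihood
```

With only 80 observations the EC50 may be poorly determined, so it is worth trying a few starting values before trusting the estimates.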

On 29/07/2020 8:56 a.m., Sebastien Bihorel via R-help wrote:
> Thank you, Prof. Nash, for your perspective on the issue.
>
> Here is an example of binary data/response (resp) that were simulated and
> re-estimated assuming a nonlinear effect of the predictor (x) on the
> likelihood of response. For re-estimation, I have used gnlm::bnlr for the
> logistic regression. The accuracy of the parameter estimates is so-so,
> probably due to the low number of data points (8*nx). For illustration, I
> have also included a glm call to an incorrect linear model of x.
>
> #install.packages("gnlm")
> #require(gnlm)
> set.seed(12345)
>
> nx <- 10
> x <- c(
>   rep(0, 3*nx),
>   rep(c(10, 30, 100, 500, 1000), each = nx)
> )
> rnd <- runif(length(x))
> a <- log(0.2/(1-0.2))
> b <- log(0.7/(1-0.7)) - a
> c <- 30
> likelihood <- a + b*x/(c+x)
> p <- exp(likelihood) / (1 + exp(likelihood))
> resp <- ifelse(rnd <= p, 1, 0)
>
> df <- data.frame(
>   x = x,
>   resp = resp,
>   nresp = 1 - resp
> )
>
> head(df)
>
> # glm can only assume a linear effect of x, which is the wrong model
> glm_mod <- glm(
>   resp ~ x,
>   data = df,
>   family = 'binomial'
> )
> glm_mod
>
> # Using the gnlm package, estimate a model with just an intercept, and a
> # model with a predictor effect
> int_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a) )
> emax_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit',
>   mu = ~ p_a + p_b*x/(p_c+x), pmu = c(a, b, c) )
>
> int_mod
> emax_mod
>
> 
> From: J C Nash 
> Sent: Tuesday, July 28, 2020 14:16
> To: Sebastien Bihorel ; 
> r-help@r-project.org 
> Subject: Re: [R] Nonlinear logistic regression fitting
>
> There is a large literature on nonlinear logistic models and similar
> curves. Some of it is referenced in my 2014 book Nonlinear Parameter
> Optimization Using R Tools, which mentions nlxb(), now part of the
> nlsr package. If useful, I could put the Bibtex refs for that somewhere.
>
> nls() is now getting long in the tooth. It has a lot of flexibility and
> great functionality, but it did very poorly on the Hobbs problem that
> rather forced me to develop the codes that are 3/5ths of optim() and
> also led to nlsr etc. The Hobbs problem dated from 1974, and with only
> 12 data points still defeats a majority of nonlinear fit programs.
> nls() poops out because it has no LM (Levenberg-Marquardt) stabilization and
> a rather weak forward difference derivative approximation. nlsr tries to
> generate analytic derivatives, which often help when things are very badly
> scaled.
>
> Another posting suggests an example problem, i.e., some data and a
> model, though you also need the loss function (e.g., maximum likelihood,
> weights, etc.). Do post some data and functions so we can provide more
> focussed advice.
>
> JN
>
> On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote:
>> Hi
>>
>> I need to fit a logistic regression model using a saturable Michaelis-Menten 
>> function of my predictor x. The likelihood could be expressed as:
>>
>> L = intercept + emax * x / (EC50+x)
>>
>> Whi

Re: [R] Nonlinear logistic regression fitting

2020-07-29 Thread Sebastien Bihorel via R-help
Thank you, Prof. Nash, for your perspective on the issue.

Here is an example of binary data/response (resp) that were simulated and
re-estimated assuming a nonlinear effect of the predictor (x) on the
likelihood of response. For re-estimation, I have used gnlm::bnlr for the
logistic regression. The accuracy of the parameter estimates is so-so, probably
due to the low number of data points (8*nx). For illustration, I have also
included a glm call to an incorrect linear model of x.

#install.packages("gnlm")
#require(gnlm)
set.seed(12345)

nx <- 10
x <- c(
  rep(0, 3*nx),
  rep(c(10, 30, 100, 500, 1000), each = nx)
)
rnd <- runif(length(x))
a <- log(0.2/(1-0.2))
b <- log(0.7/(1-0.7)) - a
c <- 30
likelihood <- a + b*x/(c+x)
p <- exp(likelihood) / (1 + exp(likelihood))
resp <- ifelse(rnd <= p, 1, 0)

df <- data.frame(
  x = x,
  resp = resp,
  nresp = 1 - resp
)

head(df)

# glm can only assume linear effect of x, which is the wrong model
glm_mod <- glm(
  resp~x,
  data = df,
  family = 'binomial'
)
glm_mod

# Using the gnlm package, estimate a model with just an intercept, and a
# model with a predictor effect
int_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a) )
emax_mod <- gnlm::bnlr( y = df[,2:3], link = 'logit',
  mu = ~ p_a + p_b*x/(p_c+x), pmu = c(a, b, c) )

int_mod
emax_mod
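The linear glm and the Emax model in the example above can also be compared on a common scale. A minimal sketch, assuming the Emax-logit model is fitted by direct maximum likelihood with optim() rather than gnlm::bnlr (whose printed output is not shown here): because both models use the same Bernoulli likelihood, AIC = 2*(negative log-likelihood) + 2*(number of parameters) is comparable between them, and smaller is better.

```r
set.seed(12345)
nx <- 10
x <- c(rep(0, 3 * nx), rep(c(10, 30, 100, 500, 1000), each = nx))
rnd <- runif(length(x))
a <- log(0.2 / (1 - 0.2))
b <- log(0.7 / (1 - 0.7)) - a
ec50 <- 30
resp <- ifelse(rnd <= plogis(a + b * x / (ec50 + x)), 1, 0)

## Linear-in-x logistic model (the "wrong" model in the example)
glm_mod <- glm(resp ~ x, family = binomial)

## Emax-logit model by direct maximum likelihood; theta = c(a, b, log(ec50))
nll <- function(theta) {
  eta <- theta[1] + theta[2] * x / (exp(theta[3]) + x)
  -sum(resp * plogis(eta, log.p = TRUE) +
         (1 - resp) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}
emax_fit <- optim(c(0, 1, log(50)), nll, method = "BFGS")

## Both AICs use the same Bernoulli log-likelihood, so they are comparable
aic <- c(linear = AIC(glm_mod), emax = 2 * emax_fit$value + 2 * 3)
print(aic)   # smaller is better
```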


From: J C Nash 
Sent: Tuesday, July 28, 2020 14:16
To: Sebastien Bihorel ; 
r-help@r-project.org 
Subject: Re: [R] Nonlinear logistic regression fitting

There is a large literature on nonlinear logistic models and similar
curves. Some of it is referenced in my 2014 book Nonlinear Parameter
Optimization Using R Tools, which mentions nlxb(), now part of the
nlsr package. If useful, I could put the Bibtex refs for that somewhere.

nls() is now getting long in the tooth. It has a lot of flexibility and
great functionality, but it did very poorly on the Hobbs problem that
rather forced me to develop the codes that are 3/5ths of optim() and
also led to nlsr etc. The Hobbs problem dated from 1974, and with only
12 data points still defeats a majority of nonlinear fit programs.
nls() poops out because it has no LM (Levenberg-Marquardt) stabilization and
a rather weak forward difference derivative approximation. nlsr tries to
generate analytic derivatives, which often help when things are very badly
scaled.

Another posting suggests an example problem, i.e., some data and a
model, though you also need the loss function (e.g., maximum likelihood,
weights, etc.). Do post some data and functions so we can provide more
focussed advice.

JN
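The point about analytic derivatives carries over to the maximum-likelihood version of this problem. A minimal sketch, assuming the Emax-logit model is parameterized as (a, b, log(c)) and fitted with base R's optim() (nlsr itself targets least-squares problems, so it is not used here): for a Bernoulli likelihood with a logit link, the gradient of the negative log-likelihood is -sum((resp - p) * d(eta)/d(theta)). The code checks this gradient against central finite differences before passing it to optim() as gr.

```r
set.seed(12345)
nx <- 10
x <- c(rep(0, 3 * nx), rep(c(10, 30, 100, 500, 1000), each = nx))
rnd <- runif(length(x))
a0 <- qlogis(0.2); b0 <- qlogis(0.7) - a0
resp <- ifelse(rnd <= plogis(a0 + b0 * x / (30 + x)), 1, 0)

## Negative log-likelihood, theta = c(a, b, log(c))
nll <- function(theta, x, resp) {
  eta <- theta[1] + theta[2] * x / (exp(theta[3]) + x)
  -sum(resp * plogis(eta, log.p = TRUE) +
         (1 - resp) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}

## Analytic gradient: -sum((resp - p) * d(eta)/d(theta))
nll_grad <- function(theta, x, resp) {
  cc <- exp(theta[3])
  f  <- x / (cc + x)                              # d(eta)/da = 1, d(eta)/db = f
  deta_dlogc <- -theta[2] * x * cc / (cc + x)^2   # chain rule through c = exp(log c)
  r  <- resp - plogis(theta[1] + theta[2] * f)
  -c(sum(r), sum(r * f), sum(r * deta_dlogc))
}

## Check against central finite differences at an arbitrary point
th <- c(-1, 2, log(40))
h  <- 1e-6
num_grad <- sapply(1:3, function(j) {
  e <- replace(numeric(3), j, h)
  (nll(th + e, x, resp) - nll(th - e, x, resp)) / (2 * h)
})
print(max(abs(num_grad - nll_grad(th, x, resp))))   # should be close to 0

fit <- optim(c(0, 1, log(50)), nll, nll_grad, x = x, resp = resp, method = "BFGS")
print(fit$par)
```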

On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote:
> Hi
>
> I need to fit a logistic regression model using a saturable Michaelis-Menten 
> function of my predictor x. The likelihood could be expressed as:
>
> L = intercept + emax * x / (EC50+x)
>
> Which I guess could be expressed as the following R model
>
> ~ emax*x/(ec50+x)
>
> As far as I know (please, correct me if I am wrong), fitting such a model is
> not doable with glm, since the function is not linear.
>
> A Stackoverflow post recommends the bnlr function from the gnlm package
> (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
> I would be grateful for any opinion on this package or for any alternative
> recommendation of package/function.
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Nonlinear logistic regression fitting

2020-07-28 Thread Sebastien Bihorel via R-help
Hi Rui,

Thanks for your input.

In my analysis, the MM model is not intended to fit continuous data but must be 
used within a logistic regression model of binary data. So, while useful in 
itself, the suggested example does not exactly apply.

I appreciate your time


From: Rui Barradas 
Sent: Tuesday, July 28, 2020 12:42
To: Sebastien Bihorel ; 
r-help@r-project.org 
Subject: Re: [R] Nonlinear logistic regression fitting

Hello,

glm might not be the right tool for the MM model but nls is meant to fit
non-linear models.
And, after an on-line search, there is also package drc, function drm.

I will use the data and examples in the links below. (The second proved me
right: it uses nls.)


#install.packages("drc")
library(drc)

#--- data

# substrate
S <- c(0,1,2,5,8,12,30,50)

# reaction rate
v <- c(0,11.1,25.4,44.8,54.5,58.2,72.0,60.1)
kinData <- data.frame(S, v)


#--- package drc fit

# use the two parameter MM model (MM.2)
drm_fit <- drm(v ~ S, data = kinData, fct = MM.2())

#--- nls fit
MMcurve <- formula(v ~ Vmax*S/(Km + S))
nls_fit <- nls(MMcurve, kinData, start = list(Vmax = 50, Km = 2))

coef(drm_fit)
coef(nls_fit)

#--- plot

SconcRange <- seq(0, 50, 0.1)
nls_Line <- predict(nls_fit, list(S = SconcRange))

plot(drm_fit, log = '', pch = 17, col = "red", main = "Fitted MM curve")
lines(SconcRange, nls_Line, col = "blue", lty = "dotted")


[1]
https://davetang.org/muse/2013/05/17/fitting-a-michaelis-mentens-curve-using/
[2]
http://rforbiochemists.blogspot.com/2015/05/plotting-and-fitting-enzymology-data.html


Hope this helps,

Rui Barradas

On 28/07/2020 15:13, Sebastien Bihorel via R-help wrote:
> Hi
>
> I need to fit a logistic regression model using a saturable Michaelis-Menten 
> function of my predictor x. The likelihood could be expressed as:
>
> L = intercept + emax * x / (EC50+x)
>
> Which I guess could be expressed as the following R model
>
> ~ emax*x/(ec50+x)
>
> As far as I know (please, correct me if I am wrong), fitting such a model is
> not doable with glm, since the function is not linear.
>
> A Stackoverflow post recommends the bnlr function from the gnlm package
> (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
> I would be grateful for any opinion on this package or for any alternative
> recommendation of package/function.


Re: [R] Nonlinear logistic regression fitting

2020-07-28 Thread Sebastien Bihorel via R-help
I hardly see how your reply addressed my question or any part of it. It looks
to me as though it was simply assumed that I did not perform any search before
posting.


From: Bert Gunter 
Sent: Tuesday, July 28, 2020 11:30
To: Sebastien Bihorel 
Cc: r-help@r-project.org 
Subject: Re: [R] Nonlinear logistic regression fitting

You said:
"As far as I know (please, correct me if I am wrong), fitting such a model is
not doable with glm, since the function is not linear."

My reply responded to that.

AFAIK, opinions on packages are off topic here. Try
stats.stackexchange.com for that.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jul 28, 2020 at 8:19 AM Sebastien Bihorel
<sebastien.biho...@cognigencorp.com> wrote:
Thank you for your subtle input, Bert... as usual!

This is literally the search I conducted and spent 2 hours on before posting to 
R-help. I was asking for expert opinions, not for search engine FAQ!

Thanks anyway


From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Tuesday, July 28, 2020 11:12
To: Sebastien Bihorel <sebastien.biho...@cognigencorp.com>
Cc: r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

Search!
... for "nonlinear logistic regression" at rseek.org.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jul 28, 2020 at 7:25 AM Sebastien Bihorel via R-help
<r-help@r-project.org> wrote:
Hi

I need to fit a logistic regression model using a saturable Michaelis-Menten 
function of my predictor x. The likelihood could be expressed as:

L = intercept + emax * x / (EC50+x)

Which I guess could be expressed as the following R model

~ emax*x/(ec50+x)

As far as I know (please, correct me if I am wrong), fitting such a model is
not doable with glm, since the function is not linear.

A Stackoverflow post recommends the bnlr function from the gnlm package
(https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
I would be grateful for any opinion on this package or for any alternative
recommendation of package/function.


Re: [R] Nonlinear logistic regression fitting

2020-07-28 Thread Sebastien Bihorel via R-help
Thank you for your subtle input, Bert... as usual!

This is literally the search I conducted and spent 2 hours on before posting to 
R-help. I was asking for expert opinions, not for search engine FAQ!

Thanks anyway


From: Bert Gunter 
Sent: Tuesday, July 28, 2020 11:12
To: Sebastien Bihorel 
Cc: r-help@r-project.org 
Subject: Re: [R] Nonlinear logistic regression fitting

Search!
... for "nonlinear logistic regression" at rseek.org.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Jul 28, 2020 at 7:25 AM Sebastien Bihorel via R-help
<r-help@r-project.org> wrote:
Hi

I need to fit a logistic regression model using a saturable Michaelis-Menten 
function of my predictor x. The likelihood could be expressed as:

L = intercept + emax * x / (EC50+x)

Which I guess could be expressed as the following R model

~ emax*x/(ec50+x)

As far as I know (please, correct me if I am wrong), fitting such a model is
not doable with glm, since the function is not linear.

A Stackoverflow post recommends the bnlr function from the gnlm package
(https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
I would be grateful for any opinion on this package or for any alternative
recommendation of package/function.


[R] Nonlinear logistic regression fitting

2020-07-28 Thread Sebastien Bihorel via R-help
Hi 

I need to fit a logistic regression model using a saturable Michaelis-Menten 
function of my predictor x. The likelihood could be expressed as:

L = intercept + emax * x / (EC50+x)

Which I guess could be expressed as the following R model 

~ emax*x/(ec50+x)

As far as I know (please, correct me if I am wrong), fitting such a model is
not doable with glm, since the function is not linear.

A Stackoverflow post recommends the bnlr function from the gnlm package
(https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
I would be grateful for any opinion on this package or for any alternative
recommendation of package/function.
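On the question of whether glm can be used at all: for a fixed EC50, logit(p) = intercept + emax * x/(EC50 + x) is linear in intercept and emax, so glm() can fit it with the transformed covariate x/(EC50 + x). The remaining parameter can then be chosen by maximizing the resulting profile log-likelihood with optimize(). This is a minimal sketch of that profiling idea, not a method proposed in the thread; the simulated data, true values, and search interval are assumptions. optimize() only finds a local optimum, and the standard errors reported by the final glm() ignore the uncertainty in EC50.

```r
## Simulated binary data (true values are assumptions for illustration)
set.seed(1)
x <- rep(c(0, 10, 30, 100, 500, 1000), times = c(60, 20, 20, 20, 20, 20))
true_int <- -1.4; true_emax <- 2.2; true_ec50 <- 30
resp <- rbinom(length(x), 1, plogis(true_int + true_emax * x / (true_ec50 + x)))

## Profile negative log-likelihood: for fixed log(EC50), fit the linear part with glm
prof_nll <- function(log_ec50) {
  f <- x / (exp(log_ec50) + x)
  -as.numeric(logLik(glm(resp ~ f, family = binomial)))
}

opt <- optimize(prof_nll, interval = log(c(0.1, 1e4)))
ec50_hat <- exp(opt$minimum)
final <- glm(resp ~ I(x / (ec50_hat + x)), family = binomial)
print(c(ec50 = ec50_hat, coef(final)))
```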


Re: [R] Creating file from raw connection

2020-05-29 Thread Sebastien Bihorel via R-help


Thanks Duncan

From: Duncan Murdoch 
Sent: Friday, May 29, 2020 15:36
To: Sebastien Bihorel ; 
r-help@r-project.org 
Subject: Re: [R] Creating file from raw connection 
 
On 29/05/2020 3:00 p.m., Sebastien Bihorel via R-help wrote:
> Hi,
> 
> Let's say I can extract the content of an Excel .xlsx file stored in a 
> database and store it as raw content in an R object. What would be the proper 
> way to "create" a .xlsx file and "transfer" the content of this obj into it? 
> I took the example of an Excel file, but my question would extend to any kind 
> of binary file.
> 
> Thank you in advance for your input

It depends on how the .xlsx was put into the database and then
extracted into R, but if it's just a copy of a file from disk,
writeBin() will write it without changes.

Duncan Murdoch
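The writeBin() suggestion can be sketched as a round trip. A minimal sketch, assuming the database content has already been retrieved into R as a raw vector; the bytes below are a hypothetical stand-in (they only begin with the "PK" signature of the zip container that .xlsx files use), and the file name comes from tempfile().

```r
## Hypothetical raw vector standing in for content pulled from a database
bytes <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00))

path <- tempfile(fileext = ".xlsx")
writeBin(bytes, path)          # writes the raw bytes unchanged, no text conversion

## Read the file back to confirm nothing was altered
back <- readBin(path, what = "raw", n = file.size(path))
print(identical(back, bytes))  # TRUE
```

Because writeBin() writes a raw vector byte for byte, the same call works for any binary format; whether the result opens in Excel depends only on whether the stored bytes were a complete .xlsx file.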


[R] Creating file from raw content

2020-05-29 Thread Sebastien Bihorel via R-help
Hi,

Let's say I can extract the content of an Excel .xlsx file stored in a database 
and store it as raw content in an R object. What would be the proper way to 
"create" a .xlsx file and "transfer" the content of this obj into it? I took 
the example of an Excel file, but my question would extend to any kind of 
binary file.

Thank you in advance for your input

Sebastien

Re: [R] POSIX system oddities

2020-03-29 Thread Sebastien Bihorel via R-help
Duh !!!

Thanks.


From: Peter Langfelder 
Sent: Sunday, March 29, 2020 20:12
To: Sebastien Bihorel 
Cc: r-help@r-project.org 
Subject: Re: [R] POSIX system oddities

The clock changed from standard time (EST) to daylight saving time (EDT) on
2019-03-10, which shaves 1 hour off that day.

Peter
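This explanation can be checked by making the time zone explicit. A minimal sketch: the original call used the session's default time zone, and America/New_York here is an assumption consistent with the EST/EDT labels in the output. UTC has no daylight saving time, and Date objects count calendar days, so neither shows the 23-hour day.

```r
days <- sprintf("03/%s/2019", 9:12)

## Midnight in a zone with daylight saving time (US Eastern)
ny <- as.POSIXct(days, format = "%m/%d/%Y", tz = "America/New_York")
print(diff(as.numeric(ny)))    # 86400 82800 86400: 10 March 2019 lasts 23 hours

## The same clock times in UTC, which has no daylight saving time
utc <- as.POSIXct(days, format = "%m/%d/%Y", tz = "UTC")
print(diff(as.numeric(utc)))   # 86400 86400 86400

## If only calendar days matter, Date objects avoid the issue
print(diff(as.Date(days, format = "%m/%d/%Y")))  # 1-day differences
```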

On Sun, Mar 29, 2020 at 5:03 PM Sebastien Bihorel via R-help
<r-help@r-project.org> wrote:
Hi,

Why are there fewer seconds on 03/10/2019 in the internal POSIX system?
The difference from the previous or the next day seems to be exactly 1 hour.
I could not find anything in the manuals on CRAN.

> dates <- as.POSIXct(sprintf('03/%s/2019',9:12), format = '%m/%d/%Y')
> dates
[1] "2019-03-09 EST" "2019-03-10 EST" "2019-03-11 EDT" "2019-03-12 EDT"
> diff(as.numeric(dates[1:2]))
[1] 86400
> diff(as.numeric(dates[2:3]))
[1] 82800
> diff(as.numeric(dates[3:4]))
[1] 86400





[R] POSIX system oddities

2020-03-29 Thread Sebastien Bihorel via R-help
Hi,

Why are there fewer seconds on 03/10/2019 in the internal POSIX system?
The difference from the previous or the next day seems to be exactly 1 hour.
I could not find anything in the manuals on CRAN.

> dates <- as.POSIXct(sprintf('03/%s/2019',9:12), format = '%m/%d/%Y')
> dates
[1] "2019-03-09 EST" "2019-03-10 EST" "2019-03-11 EDT" "2019-03-12 EDT"
> diff(as.numeric(dates[1:2]))
[1] 86400
> diff(as.numeric(dates[2:3]))
[1] 82800
> diff(as.numeric(dates[3:4]))
[1] 86400





Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Sebastien Bihorel via R-help
Thanks Bill and Jeff

strip.white did not change the outcomes.

However, your inputs led me to compare the raw content of the files (ie,
outside of an IDE), and I found a difference in how the apparent -99 values
were stored. In the big file, some -99 values are stored as floats rather than
integers and thus include a decimal point and trailing zeros.

The creation of the smaller files removed the decimal point and trailing zeros,
which explains why read.table provided the "right" response on these smaller
files.

So, it looks like this is the problem and that some additional post-processing 
may be warranted.

Thanks for the hints.
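This outcome follows from how na.strings works: it is compared with the field text as read, so "-99" does not match a field written as "-99.0". A minimal sketch of one possible post-processing step, assuming -99 is never a legitimate value in any numeric column; the sample data are made up.

```r
## Made-up sample: column B stores -99 as an integer, column C as floats
s <- "A,B,C\n1,-99,-99.0\n2,3,-99.00\n3,4,5.5\n"
d <- read.csv(text = s, na.strings = "-99")
print(d)   # B[1] is NA, but C keeps -99 because "-99.0" != "-99" as text

## Post-process: turn numeric -99 into NA in every numeric column
is_num <- vapply(d, is.numeric, logical(1))
d[is_num] <- lapply(d[is_num], function(v) replace(v, which(v == -99), NA))
print(d)
```

An alternative is to list every textual variant in na.strings (for example "-99.0" and "-99.00"), but that only works when all variants are known in advance, which is not the case for a generic reader.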


From: William Dunlap 
Sent: Thursday, November 14, 2019 11:51
To: Jeff Newmiller 
Cc: Sebastien Bihorel ; 
r-help@r-project.org 
Subject: Re: [R] Can file size affect how na.strings operates in a read.table 
call?

read.table (and friends) also have the strip.white argument:

> s <- "A,B,C\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n"
> read.csv(text=s, header=TRUE, na.strings="-99", strip.white=TRUE)
  A  B  C
1 0  0  0
2 1 NA NA
3 2 NA NA
4 3 NA NA
> read.csv(text=s, header=TRUE, na.strings="-99", strip.white=FALSE)
  A   B   C
1 0   0   0
2 1  NA  NA
3 2 -99  NA
4 3 -99 -99

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Nov 14, 2019 at 8:35 AM Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
Consider the following sample:

#
s <- "A,B,C
0,0,0
1,-99,-99
2,-99 ,-99
3, -99, -99
"

dta_notok <- read.csv( text = s
  , header=TRUE
  , na.strings = c( "-99", "" )
  )

dta_ok <- read.csv( text = s
   , header=TRUE
   , na.strings = c( "-99", " -99"
   , "-99 ", ""
   )
   )

library(data.table)

fdt_ok <- fread( text = s, na.strings=c( "-99", "" ) )
fdta_ok <- as.data.frame( fdt_ok )
#

Leading and trailing spaces cause problems. The data.table::fread function
has a strip.white argument that defaults to TRUE, but the resulting object
is a data.table which has different semantics than a data.frame.

On Thu, 14 Nov 2019, Sebastien Bihorel wrote:

> The data file is a csv file. Some text variables contain spaces.
>
> "Check for extraneous spaces"
> Are there specific locations that would be more critical than others?
>
>
> ____________
> From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>
> Sent: Thursday, November 14, 2019 10:52
> To: Sebastien Bihorel <sebastien.biho...@cognigencorp.com>;
> Sebastien Bihorel via R-help <r-help@r-project.org>; r-help@r-project.org
> Subject: Re: [R] Can file size affect how na.strings operates in a
> read.table call?
> Check for extraneous spaces. You may need more variations of the na.strings.
>
> On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help
> <r-help@r-project.org> wrote:
> >Hi,
> >
> >I have this generic function to read ASCII data files. It is
> >essentially a wrapper around the read.table function. My function is
> >used in a large variety of situations and has no a priori knowledge
> >about the data file it is asked to read. Nothing is known about file
> >size, variable types, variable names, or data table dimensions.
> >
> >One argument of my function is na.strings which is passed down to
> >read.table.
> >
> >Recently, a user tried to read a data file of ~80 MB (~93000 rows by
> >~160 columns) using na.strings = c('-99', '.') with the intention of
> >interpreting '.' and '-99'
> >strings as the internal missing data NA. Dots were converted to NA
> >appropriately. However, not all -99 values in the data were interpreted
> >as NA. In some variables, -99 values were converted to NA, while in others
> >-99 was read as a number. More surprisingly, when the data file was cut into
> >smaller chunks (ie, by dropping either rows or columns) saved in
> >multiple files, the function calls applied on the new data files
> >resulted in the correct conversion of the -99 values into NAs.
> >
> >In all cases, the data frames produced by read.table contained the
> >expected number of records.
> >
> >While, at face value, it appears that file size affects how the
> >na.strings argument operates, I am wondering if there is something else at
> >play here.
> >
> >Unfortunately, I cannot share th

Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Sebastien Bihorel via R-help
The data file is a csv file. Some text variables contain spaces.

"Check for extraneous spaces"
Are there specific locations that would be more critical than others?



From: Jeff Newmiller 
Sent: Thursday, November 14, 2019 10:52
To: Sebastien Bihorel; Sebastien Bihorel via R-help; r-help@r-project.org
Subject: Re: [R] Can file size affect how na.strings operates in a read.table 
call?

Check for extraneous spaces. You may need more variations of the na.strings.
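A minimal sketch of why surrounding spaces matter, using made-up toy data and assuming the stray blanks sit next to a -99 between the commas: when sep is given and strip.white is left at its default FALSE, read.table keeps leading and trailing blanks in unquoted fields, and na.strings is compared with the field exactly. So " -99" is not turned into NA, although it is still converted to a number. Setting strip.white = TRUE trims those blanks before the comparison; spaces inside text values are left alone.

```r
# Toy CSV (made up): row 2 has a blank before -99
txt <- "id,val,label\n1,-99,a b\n2, -99,c d\n3,5,e f"

# Default strip.white = FALSE: " -99" does not match na.strings exactly,
# so it is kept and later converted to the number -99
d1 <- read.table(text = txt, header = TRUE, sep = ",",
                 na.strings = c("-99", "."), stringsAsFactors = FALSE)
d1$val

# strip.white = TRUE trims leading/trailing blanks of unquoted fields before
# the na.strings comparison; the internal space in "a b" is kept
d2 <- read.table(text = txt, header = TRUE, sep = ",",
                 na.strings = c("-99", "."), strip.white = TRUE,
                 stringsAsFactors = FALSE)
d2$val
d2$label
```

Whether this is what happens in the large file cannot be told from the thread; it is one pattern consistent with some -99 values escaping na.strings.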

On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help wrote:
>Hi,
>
>I have this generic function to read ASCII data files. It is
>essentially a wrapper around the read.table function. My function is
>used in a large variety of situations and has no a priori knowledge
>about the data file it is asked to read. Nothing is known about file
>size, variable types, variable names, or data table dimensions.
>
>One argument of my function is na.strings which is passed down to
>read.table.
>
>Recently, a user tried to read a data file of ~80 MB (~93,000 rows by
>~160 columns) using na.strings = c('-99', '.') with the intention of
>interpreting '.' and '-99'
>strings as the internal missing data NA. Dots were converted to NA
>appropriately. However, not all -99 values in the data were interpreted
>as NA. In some variables, -99 were converted to NA, while in others -99
>was read as a number. More surprisingly, when the data file was cut in
>smaller chunks (i.e., by dropping either rows or columns) saved in
>multiple files, the function calls applied on the new data files
>resulted in the correct conversion of the -99 values into NAs.
>
>In all cases, the data frames produced by read.table contained the
>expected number of records.
>
>While, at face value, it appears that file size affects how the
>na.strings argument operates, I am wondering if there is something else at
>play here.
>
>Unfortunately, I cannot share the data file for confidentiality reasons
>but was wondering if you could suggest some checks I could perform to
>get to the bottom of this issue.
>
>Thank you in advance for your help and sorry for the lack of
>reproducible example.
>
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.


[R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Sebastien Bihorel via R-help
Hi,

I have this generic function to read ASCII data files. It is essentially a 
wrapper around the read.table function. My function is used in a large variety 
of situations and has no a priori knowledge about the data file it is asked to 
read. Nothing is known about file size, variable types, variable names, or data 
table dimensions.

One argument of my function is na.strings which is passed down to read.table.

Recently, a user tried to read a data file of ~80 MB (~93,000 rows by ~160 
columns) using na.strings = c('-99', '.') with the intention of interpreting 
'.' and '-99'
strings as the internal missing data NA. Dots were converted to NA 
appropriately. However, not all -99 values in the data were interpreted as NA. 
In some variables, -99 were converted to NA, while in others -99 was read as a 
number. More surprisingly, when the data file was cut in smaller chunks (i.e., by 
dropping either rows or columns) saved in multiple files, the function calls 
applied on the new data files resulted in the correct conversion of the -99 
values into NAs.

In all cases, the data frames produced by read.table contained the expected 
number of records.
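One structural check that goes beyond the row count is base R's count.fields(), which counts the fields on each line using the same sep, quote and comment.char conventions as read.table. The sketch below (odd_lines is a hypothetical helper name, and comma separation is assumed) flags lines whose count differs from the header's. Note that read.table's default quote = "\"'" also treats an apostrophe inside a text value as a quote character, which is one way values can shift between columns.

```r
# Hypothetical helper: positions (among non-blank lines, header = 1) whose
# field count differs from the header's, or is NA because a quoted field
# spans several lines
odd_lines <- function(file, sep = ",", quote = "\"'", comment.char = "#") {
  n <- count.fields(file, sep = sep, quote = quote,
                    comment.char = comment.char)
  which(is.na(n) | n != n[1])
}

# Toy input (made up): line 3 has one separator too many
con <- textConnection(c("id,val,grp", "1,-99,a", "2,-99,b,extra", "3,5,c"))
bad <- odd_lines(con)
close(con)
bad
```

On a real file, odd_lines("file.csv") would be called with the actual path (the name here is a placeholder).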

While, at face value, it appears that file size affects how the na.strings 
argument operates, I am wondering if there is something else at play here.

Unfortunately, I cannot share the data file for confidentiality reasons but was 
wondering if you could suggest some checks I could perform to get to the bottom 
of this issue.
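A possible check, sketched as a hypothetical helper (find_near_na is not an existing function): re-read the file with every column as character and na.strings switched off, then list, per column, the rows whose raw text would match na.strings only once surrounding whitespace is trimmed.

```r
# Hypothetical diagnostic, not an existing function: for each column, list the
# rows whose raw text matches na.strings only after trimming whitespace.
# Arguments in ... (file or text, header, sep, ...) are passed to read.table.
find_near_na <- function(na.strings, ...) {
  # "character" is recycled to all columns, so no type conversion happens;
  # na.strings = character(0) keeps every token, "-99" included, as text
  raw <- read.table(..., colClasses = "character",
                    na.strings = character(0), strip.white = FALSE)
  hits <- lapply(raw, function(col) {
    which(trimws(col) %in% na.strings & !(col %in% na.strings))
  })
  hits[lengths(hits) > 0]   # keep only columns with suspicious fields
}

# Toy input (made up): " -99" in val and " ." in grp are near misses
txt <- "id,val,grp\n1,-99,a\n2, -99,b\n3,5, ."
res <- find_near_na(c("-99", "."), text = txt, header = TRUE, sep = ",")
res
```

On the real data the call would look like find_near_na(c("-99", "."), file = "data.csv", header = TRUE, sep = ","), where "data.csv" is a placeholder.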

Thank you in advance for your help and sorry for the lack of reproducible 
example.



Re: [R] read.table and NaN

2019-10-25 Thread Sebastien Bihorel via R-help
My bad, Bert 

My point is that my function/framework has very minimal expectations about the 
source data (mostly, that it is a rectangular table of data with fields 
separated by some separator) and has no a priori knowledge of what the first, 
second, etc. columns in the data files must contain. So, while it would be 
possible to pass down a class vector as the colClasses argument to read.table, 
that is not necessarily reasonable in the context of the overall framework.

I guess I was surprised that read.table interprets NaN in an input file as the 
internal "Not a Number" value rather than as a string... there is nothing in 
?read.table about that.
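A short illustration, reusing the toy input from the original post: read.table types each column with type.convert, which parses the token NaN (like Inf) as a double, while na.strings only decides which tokens become NA. So na.strings cannot keep NaN as text; at most it could turn it into NA.

```r
# Toy input from the original post
txt <- "A,B\n1,NaN\nNA,2"
tmp <- read.table(text = txt, header = TRUE, sep = ",", na.strings = "")
sapply(tmp, class)
is.nan(tmp$B)

# The automatic typing relies on type.convert, which accepts "NaN" and
# "Inf" as valid doubles
x <- type.convert(c("NaN", "2", "Inf"), as.is = TRUE)
is.nan(x)
```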

Anyways, as I said, I need to think more about this in the context of the 
framework where this function operates...
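One framework-agnostic option, sketched under an assumed policy (keep a column as text whenever it contains the literal token NaN; this is a choice made for illustration, not something read.table provides): read every column as character, which needs no knowledge of the layout, then convert column by column with type.convert. read_keep_nan is a hypothetical name; arguments in ... are forwarded to read.table, so colClasses must not be passed through it.

```r
# Hypothetical wrapper: ... goes to read.table (file or text, header, sep, ...)
read_keep_nan <- function(..., na.strings = "NA") {
  # "character" is recycled to every column, so no column is typed yet
  raw <- read.table(..., na.strings = na.strings, colClasses = "character")
  raw[] <- lapply(raw, function(col) {
    if (any(col == "NaN", na.rm = TRUE)) {
      col   # keep the column as text (only the exact spelling NaN is checked)
    } else {
      # NA entries were already set from na.strings while reading
      type.convert(col, as.is = TRUE, na.strings = character(0))
    }
  })
  raw
}

d <- read_keep_nan(text = "A,B\n1,NaN\nNA,2", header = TRUE, sep = ",",
                   na.strings = "")
sapply(d, class)
```

The trade-off is that a numeric column with a genuine NaN entry comes back as character, which may or may not suit the framework.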

Thanks for the input



From: Bert Gunter 
Sent: Thursday, October 24, 2019 10:39
To: Sebastien Bihorel 
Cc: r-help@r-project.org 
Subject: Re: [R] read.table and NaN

Not so. Read ?read.table carefully. You can use "NA" as a default. Moreover, 
you **specified** that you want NaN read as character, which means that any 
column containing NaN **must** be character. That's part of the specification 
for data frames (each column must be of a single data type). So either change 
your specification or change your data structure.

And, incidentally, my first name is "Bert".

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and 
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, Oct 24, 2019 at 6:43 AM Sebastien Bihorel wrote:
Thanks Gunter

It seems that one has to know the structure of the data and adapt the 
read.table call accordingly. I am working on a framework that is meant to 
process data files with unknown structure, so I have to think a bit more about 
that...

From: Bert Gunter
Sent: Thursday, October 24, 2019 00:08
To: Sebastien Bihorel
Cc: r-help@r-project.org
Subject: Re: [R] read.table and NaN

Like this?

> con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
> tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '',
+   stringsAsFactors = FALSE,
+   colClasses = c("numeric", "character"))
> close.connection(con)
> tmp
   A   B
1  1 NaN
2 NA   2
> class(tmp[,1])
[1] "numeric"
> class(tmp[,2])
[1] "character"
> tmp[,2]
[1] "NaN" "2"


Bert Gunter


On Wed, Oct 23, 2019 at 6:31 PM Sebastien Bihorel via R-help wrote:
Hi,

Is there a way to make read.table consider NaN as a string of characters rather 
than the internal NaN? Changing the na.strings argument does not seem to have 
any effect on how R interprets the NaN string (while it does for the NA 
string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', 
stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])


Re: [R] read.table and NaN

2019-10-24 Thread Sebastien Bihorel via R-help
Thanks Gunter

It seems that one has to know the structure of the data and adapt the 
read.table call accordingly. I am working on a framework that is meant to 
process data files with unknown structure, so I have to think a bit more about 
that...

From: Bert Gunter 
Sent: Thursday, October 24, 2019 00:08
To: Sebastien Bihorel 
Cc: r-help@r-project.org 
Subject: Re: [R] read.table and NaN

Like this?

> con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
> tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '',
+   stringsAsFactors = FALSE,
+   colClasses = c("numeric", "character"))
> close.connection(con)
> tmp
   A   B
1  1 NaN
2 NA   2
> class(tmp[,1])
[1] "numeric"
> class(tmp[,2])
[1] "character"
> tmp[,2]
[1] "NaN" "2"


Bert Gunter


On Wed, Oct 23, 2019 at 6:31 PM Sebastien Bihorel via R-help wrote:
Hi,

Is there a way to make read.table consider NaN as a string of characters rather 
than the internal NaN? Changing the na.strings argument does not seem to have 
any effect on how R interprets the NaN string (while it does for the NA 
string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', 
stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])


[R] read.table and NaN

2019-10-23 Thread Sebastien Bihorel via R-help
Hi,

Is there a way to make read.table consider NaN as a string of characters rather 
than the internal NaN? Changing the na.strings argument does not seem to have 
any effect on how R interprets the NaN string (while it does for the NA 
string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', 
stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])

