Re: [R] Nonlinear logistic regression fitting
Thanks Duncan,

(Sorry for the repeated email.) People working in my field are frequently (and rightly) accused of butchering statistical terminology, so I guess I'm guilty as charged. I will look into the suggested path. One question, though: in your expression of loglik, p is "a + b*x/(c+x)". Correct?

Thanks

From: Duncan Murdoch
Sent: Wednesday, July 29, 2020 16:04
To: Sebastien Bihorel; J C Nash; r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

Just a quick note about jargon: you are using the word "likelihood" in a way that I (and maybe some others) find confusing. (In fact, I think you used it two different ways, but maybe I'm just confused.) I would say that likelihood is the probability of observing the entire data set, considered as a function of the parameters. You appear to be using it (at first) as the probability that a particular observation is equal to 1, and then as the argument to a logit function to give that probability.

What you probably want to do is find the parameters that maximize the likelihood (in my sense). The usual practice is to maximize the log of the likelihood; it tends to be easier to work with. In your notation below, the log likelihood would be

loglik <- sum( resp*log(p) + (1-resp)*log1p(-p) )

When you have a linear logistic regression model, this simplifies a bit, and there are fast algorithms that are usually stable to optimize it. With a nonlinear model, you lose some of that, and I'd suggest directly optimizing it.

Duncan Murdoch

On 29/07/2020 8:56 a.m., Sebastien Bihorel via R-help wrote:
> Thank you, Prof. Nash, for your perspective on the issue.
>
> Here is an example of binary data/response (resp) that were simulated and
> re-estimated assuming a nonlinear effect of the predictor (x) on the
> likelihood of response. For re-estimation, I have used gnlm::bnlr for the
> logistic regression. The accuracy of the parameter estimates is so-so,
> probably due to the low number of data points (8*nx).
> For illustration, I have also included a glm call to an incorrect linear
> model of x.
>
> #install.packages("gnlm")
> #require(gnlm)
> set.seed(12345)
>
> nx <- 10
> x <- c(
>   rep(0, 3*nx),
>   rep(c(10, 30, 100, 500, 1000), each = nx)
> )
> rnd <- runif(length(x))
> a <- log(0.2/(1-0.2))
> b <- log(0.7/(1-0.7)) - a
> c <- 30
> likelihood <- a + b*x/(c+x)
> p <- exp(likelihood) / (1 + exp(likelihood))
> resp <- ifelse(rnd <= p, 1, 0)
>
> df <- data.frame(
>   x = x,
>   resp = resp,
>   nresp = 1 - resp
> )
>
> head(df)
>
> # glm can only assume a linear effect of x, which is the wrong model
> glm_mod <- glm(
>   resp ~ x,
>   data = df,
>   family = 'binomial'
> )
> glm_mod
>
> # Using the gnlm package, estimate a model with just an intercept, and a
> # model with the predictor effect
> int_mod <- gnlm::bnlr(y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a))
> emax_mod <- gnlm::bnlr(y = df[,2:3], link = 'logit',
>                        mu = ~ p_a + p_b*x/(p_c+x), pmu = c(a, b, c))
>
> int_mod
> emax_mod
>
> From: J C Nash
> Sent: Tuesday, July 28, 2020 14:16
> To: Sebastien Bihorel; r-help@r-project.org
> Subject: Re: [R] Nonlinear logistic regression fitting
>
> There is a large literature on nonlinear logistic models and similar
> curves. Some of it is referenced in my 2014 book Nonlinear Parameter
> Optimization Using R Tools, which mentions nlxb(), now part of the
> nlsr package. If useful, I could put the BibTeX refs for that somewhere.
>
> nls() is now getting long in the tooth. It has a lot of flexibility and
> great functionality, but it did very poorly on the Hobbs problem that
> rather forced me to develop the codes that are 3/5ths of optim() and
> also led to nlsr etc. The Hobbs problem dates from 1974 and, with only
> 12 data points, still defeats a majority of nonlinear fit programs.
> nls() poops out because it has no LM stabilization and a rather weak
> forward difference derivative approximation. nlsr tries to generate
> analytic derivatives, which often help when things are very badly scaled.
> Another posting suggests an example problem, i.e., some data and a
> model, though you also need the loss function (e.g., maximum likelihood,
> weights, etc.). Do post some data and functions so we can provide more
> focused advice.
>
> JN
>
> On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote:
>> Hi
>>
>> I need to fit a logistic regression model using a saturable Michaelis-Menten
>> function of my predictor x. The likelihood could be expressed as:
>>
>> L = intercept + emax * x / (EC50+x)
>>
>> Which I guess could be expressed as the following R model
>>
>> ~ emax*x/(ec50+x)
>>
>> As far as I know (please, correct me if I am wrong), fitting such a model is
>> not doable with glm, since the function is not linear.
>>
>> A Stackoverflow post recommends the bnlr function from the gnlm package
>> (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
>> I would be grateful for any opinion on this package or for any alternative
>> recommendation of package/function.
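Duncan's suggestion to "directly optimize" the log-likelihood can be sketched with stats::optim(). This is a minimal illustration, not part of the original exchange; the starting values, the use of plogis(), and the simulated data are my own choices:

```r
# Minimal sketch: directly maximizing the Emax logistic log-likelihood
# with optim(). Starting values c(0, 1, 50) are arbitrary illustrative guesses.
set.seed(12345)
x <- rep(c(0, 10, 30, 100, 500, 1000), each = 10)
a <- log(0.2/0.8); b <- log(0.7/0.3) - a; c0 <- 30
resp <- rbinom(length(x), 1, plogis(a + b*x/(c0 + x)))

negll <- function(par) {            # par = c(a, b, c)
  p <- plogis(par[1] + par[2]*x/(par[3] + x))
  -sum(resp*log(p) + (1 - resp)*log1p(-p))
}
fit <- optim(c(0, 1, 50), negll)    # default Nelder-Mead; robust to rough surfaces
fit$par                             # estimates of a, b, and c
fit$convergence                     # 0 indicates successful convergence
```

In practice one would check fit$convergence and compare the estimates with those returned by gnlm::bnlr on the same data.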
Re: [R] Nonlinear logistic regression fitting
Thank you, Prof. Nash, for your perspective on the issue.

Here is an example of binary data/response (resp) that were simulated and re-estimated assuming a nonlinear effect of the predictor (x) on the likelihood of response. For re-estimation, I have used gnlm::bnlr for the logistic regression. The accuracy of the parameter estimates is so-so, probably due to the low number of data points (8*nx). For illustration, I have also included a glm call to an incorrect linear model of x.

#install.packages("gnlm")
#require(gnlm)
set.seed(12345)

nx <- 10
x <- c(
  rep(0, 3*nx),
  rep(c(10, 30, 100, 500, 1000), each = nx)
)
rnd <- runif(length(x))
a <- log(0.2/(1-0.2))
b <- log(0.7/(1-0.7)) - a
c <- 30
likelihood <- a + b*x/(c+x)
p <- exp(likelihood) / (1 + exp(likelihood))
resp <- ifelse(rnd <= p, 1, 0)

df <- data.frame(
  x = x,
  resp = resp,
  nresp = 1 - resp
)

head(df)

# glm can only assume a linear effect of x, which is the wrong model
glm_mod <- glm(
  resp ~ x,
  data = df,
  family = 'binomial'
)
glm_mod

# Using the gnlm package, estimate a model with just an intercept, and a
# model with the predictor effect
int_mod <- gnlm::bnlr(y = df[,2:3], link = 'logit', mu = ~ p_a, pmu = c(a))
emax_mod <- gnlm::bnlr(y = df[,2:3], link = 'logit',
                       mu = ~ p_a + p_b*x/(p_c+x), pmu = c(a, b, c))

int_mod
emax_mod

From: J C Nash
Sent: Tuesday, July 28, 2020 14:16
To: Sebastien Bihorel; r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

There is a large literature on nonlinear logistic models and similar curves. Some of it is referenced in my 2014 book Nonlinear Parameter Optimization Using R Tools, which mentions nlxb(), now part of the nlsr package. If useful, I could put the BibTeX refs for that somewhere.

nls() is now getting long in the tooth. It has a lot of flexibility and great functionality, but it did very poorly on the Hobbs problem that rather forced me to develop the codes that are 3/5ths of optim() and also led to nlsr etc.
The Hobbs problem dates from 1974 and, with only 12 data points, still defeats a majority of nonlinear fit programs. nls() poops out because it has no LM stabilization and a rather weak forward difference derivative approximation. nlsr tries to generate analytic derivatives, which often help when things are very badly scaled.

Another posting suggests an example problem, i.e., some data and a model, though you also need the loss function (e.g., maximum likelihood, weights, etc.). Do post some data and functions so we can provide more focused advice.

JN

On 2020-07-28 10:13 a.m., Sebastien Bihorel via R-help wrote:
> Hi
>
> I need to fit a logistic regression model using a saturable Michaelis-Menten
> function of my predictor x. The likelihood could be expressed as:
>
> L = intercept + emax * x / (EC50+x)
>
> Which I guess could be expressed as the following R model
>
> ~ emax*x/(ec50+x)
>
> As far as I know (please, correct me if I am wrong), fitting such a model is
> not doable with glm, since the function is not linear.
>
> A Stackoverflow post recommends the bnlr function from the gnlm package
> (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
> I would be grateful for any opinion on this package or for any alternative
> recommendation of package/function.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nonlinear logistic regression fitting
Hi Rui,

Thanks for your input. In my analysis, the MM model is not intended to fit continuous data but must be used within a logistic regression model of binary data. So, while useful in itself, the suggested example does not exactly apply. I appreciate your time.

From: Rui Barradas
Sent: Tuesday, July 28, 2020 12:42
To: Sebastien Bihorel; r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

Hello,

glm might not be the right tool for the MM model, but nls is meant to fit nonlinear models. And, after an online search, there is also package drc, function drm. I will use the data and examples in the links below. (The second proved me right: it uses nls.)

#install.packages("drc")
library(drc)

#--- data
# substrate
S <- c(0, 1, 2, 5, 8, 12, 30, 50)
# reaction rate
v <- c(0, 11.1, 25.4, 44.8, 54.5, 58.2, 72.0, 60.1)
kinData <- data.frame(S, v)

#--- package drc fit
# use the two-parameter MM model (MM.2)
drm_fit <- drm(v ~ S, data = kinData, fct = MM.2())

#--- nls fit
MMcurve <- formula(v ~ Vmax*S/(Km + S))
nls_fit <- nls(MMcurve, kinData, start = list(Vmax = 50, Km = 2))

coef(drm_fit)
coef(nls_fit)

#--- plot
SconcRange <- seq(0, 50, 0.1)
nls_Line <- predict(nls_fit, list(S = SconcRange))
plot(drm_fit, log = '', pch = 17, col = "red", main = "Fitted MM curve")
lines(SconcRange, nls_Line, col = "blue", lty = "dotted")

[1] https://davetang.org/muse/2013/05/17/fitting-a-michaelis-mentens-curve-using/
[2] http://rforbiochemists.blogspot.com/2015/05/plotting-and-fitting-enzymology-data.html

Hope this helps,

Rui Barradas

On 28/07/2020 15:13, Sebastien Bihorel via R-help wrote:
> Hi
>
> I need to fit a logistic regression model using a saturable Michaelis-Menten
> function of my predictor x.
> The likelihood could be expressed as:
>
> L = intercept + emax * x / (EC50+x)
>
> Which I guess could be expressed as the following R model
>
> ~ emax*x/(ec50+x)
>
> As far as I know (please, correct me if I am wrong), fitting such a model is
> not doable with glm, since the function is not linear.
>
> A Stackoverflow post recommends the bnlr function from the gnlm package
> (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)...
> I would be grateful for any opinion on this package or for any alternative
> recommendation of package/function.
Re: [R] Nonlinear logistic regression fitting
I hardly see how your reply addressed my question or any part of it. It looks to me as if it was simply assumed that I did not perform any search before posting.

From: Bert Gunter
Sent: Tuesday, July 28, 2020 11:30
To: Sebastien Bihorel
Cc: r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

You said: "As far as I know (please, correct me if I am wrong), fitting such a model is not doable with glm, since the function is not linear." My reply responded to that. AFAIK, opinions on packages are off topic here. Try stats.stackexchange.com for that.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Jul 28, 2020 at 8:19 AM Sebastien Bihorel <sebastien.biho...@cognigencorp.com> wrote:

Thank you for your subtle input, Bert... as usual! This is literally the search I conducted and spent 2 hours on before posting to R-help. I was asking for expert opinions, not for search engine FAQs! Thanks anyway.

From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Tuesday, July 28, 2020 11:12
To: Sebastien Bihorel <sebastien.biho...@cognigencorp.com>
Cc: r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

Search! ... for "nonlinear logistic regression" at rseek.org.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Jul 28, 2020 at 7:25 AM Sebastien Bihorel via R-help <r-help@r-project.org> wrote:

Hi

I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x.
The likelihood could be expressed as:

L = intercept + emax * x / (EC50+x)

Which I guess could be expressed as the following R model

~ emax*x/(ec50+x)

As far as I know (please, correct me if I am wrong), fitting such a model is not doable with glm, since the function is not linear.

A Stackoverflow post recommends the bnlr function from the gnlm package (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... I would be grateful for any opinion on this package or for any alternative recommendation of package/function.
Re: [R] Nonlinear logistic regression fitting
Thank you for your subtle input, Bert... as usual! This is literally the search I conducted and spent 2 hours on before posting to R-help. I was asking for expert opinions, not for search engine FAQs! Thanks anyway.

From: Bert Gunter
Sent: Tuesday, July 28, 2020 11:12
To: Sebastien Bihorel
Cc: r-help@r-project.org
Subject: Re: [R] Nonlinear logistic regression fitting

Search! ... for "nonlinear logistic regression" at rseek.org.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Jul 28, 2020 at 7:25 AM Sebastien Bihorel via R-help <r-help@r-project.org> wrote:

Hi

I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x. The likelihood could be expressed as:

L = intercept + emax * x / (EC50+x)

Which I guess could be expressed as the following R model

~ emax*x/(ec50+x)

As far as I know (please, correct me if I am wrong), fitting such a model is not doable with glm, since the function is not linear.

A Stackoverflow post recommends the bnlr function from the gnlm package (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... I would be grateful for any opinion on this package or for any alternative recommendation of package/function.
[R] Nonlinear logistic regression fitting
Hi

I need to fit a logistic regression model using a saturable Michaelis-Menten function of my predictor x. The likelihood could be expressed as:

L = intercept + emax * x / (EC50+x)

Which I guess could be expressed as the following R model

~ emax*x/(ec50+x)

As far as I know (please, correct me if I am wrong), fitting such a model is not doable with glm, since the function is not linear.

A Stackoverflow post recommends the bnlr function from the gnlm package (https://stackoverflow.com/questions/45362548/nonlinear-logistic-regression-package-in-r)... I would be grateful for any opinion on this package or for any alternative recommendation of package/function.
Re: [R] Creating file from raw connection
Thanks Duncan

From: Duncan Murdoch
Sent: Friday, May 29, 2020 15:36
To: Sebastien Bihorel; r-help@r-project.org
Subject: Re: [R] Creating file from raw connection

On 29/05/2020 3:00 p.m., Sebastien Bihorel via R-help wrote:
> Hi,
>
> Let's say I can extract the content of an Excel .xlsx file stored in a
> database and store it as raw content in an R object. What would be the proper
> way to "create" a .xlsx file and "transfer" the content of this object into
> it? I took the example of an Excel file, but my question would extend to any
> kind of binary file.
>
> Thank you in advance for your input

It depends on how the .xlsx was put into the database and then extracted into R, but if it's just a copy of a file from disk, writeBin() will write it without changes.

Duncan Murdoch
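Duncan's writeBin() suggestion can be sketched as a round trip; this is my illustration, and the raw vector below is a stand-in for bytes extracted from a database, not a real .xlsx:

```r
# Round-trip sketch: raw bytes held in an R object -> binary file on disk.
# 'raw_content' stands in for the database extract (arbitrary example bytes).
raw_content <- as.raw(c(0x50, 0x4b, 0x03, 0x04))

dest <- tempfile(fileext = ".xlsx")
writeBin(raw_content, dest)   # writes the bytes without any translation

# Verify the file holds exactly the same bytes
identical(readBin(dest, what = "raw", n = file.size(dest)), raw_content)
```

The same pattern applies to any binary format, since writeBin() copies the raw vector byte for byte.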
[R] Creating file from raw content
Hi,

Let's say I can extract the content of an Excel .xlsx file stored in a database and store it as raw content in an R object. What would be the proper way to "create" a .xlsx file and "transfer" the content of this object into it? I took the example of an Excel file, but my question would extend to any kind of binary file.

Thank you in advance for your input

Sebastien
Re: [R] POSIX system oddities
Duh!!! Thanks.

From: Peter Langfelder
Sent: Sunday, March 29, 2020 20:12
To: Sebastien Bihorel
Cc: r-help@r-project.org
Subject: Re: [R] POSIX system oddities

The time has changed from "standard" (EST) to "daylight saving" (EDT), which shaves off 1 hour.

Peter

On Sun, Mar 29, 2020 at 5:03 PM Sebastien Bihorel via R-help <r-help@r-project.org> wrote:

Hi,

Why does 03/10/2019 have fewer seconds in the internal POSIX system? The difference from the previous or the next day seems to be exactly 1 hour. I could not find anything in the manuals on CRAN.

> dates <- as.POSIXct(sprintf('03/%s/2019', 9:12), format = '%m/%d/%Y')
> dates
[1] "2019-03-09 EST" "2019-03-10 EST" "2019-03-11 EDT" "2019-03-12 EDT"
> diff(as.numeric(dates[1:2]))
[1] 86400
> diff(as.numeric(dates[2:3]))
[1] 82800
> diff(as.numeric(dates[3:4]))
[1] 86400
[R] POSIX system oddities
Hi,

Why does 03/10/2019 have fewer seconds in the internal POSIX system? The difference from the previous or the next day seems to be exactly 1 hour. I could not find anything in the manuals on CRAN.

> dates <- as.POSIXct(sprintf('03/%s/2019', 9:12), format = '%m/%d/%Y')
> dates
[1] "2019-03-09 EST" "2019-03-10 EST" "2019-03-11 EDT" "2019-03-12 EDT"
> diff(as.numeric(dates[1:2]))
[1] 86400
> diff(as.numeric(dates[2:3]))
[1] 82800
> diff(as.numeric(dates[3:4]))
[1] 86400
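A follow-up sketch of my own, not from the thread: parsing the same dates in UTC, where no daylight-saving transition occurs, removes the "missing" hour:

```r
# The 82800-second day is the EST -> EDT spring-forward transition.
# In UTC every day has the full 86400 seconds.
dates_utc <- as.POSIXct(sprintf('03/%s/2019', 9:12),
                        format = '%m/%d/%Y', tz = "UTC")
diff(as.numeric(dates_utc))   # 86400 86400 86400
```

Working in tz = "UTC" is a common way to avoid DST surprises when only calendar dates, not local clock times, matter.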
Re: [R] Can file size affect how na.strings operates in a read.table call?
Thanks Bill and Jeff,

strip.white did not change the outcomes. However, your inputs led me to compare the raw content of the files (i.e., outside of an IDE), and I found differences in how the apparent -99 values were stored. In the big file, some -99 values are stored as floats rather than integers and thus include a decimal point and trailing zeros. The creation of the smaller files resulted in the removal of the decimal point and trailing zeros, explaining why read.table provided the "right" response on these smaller files. So, it looks like this is the problem and that some additional post-processing may be warranted. Thanks for the hints.

From: William Dunlap
Sent: Thursday, November 14, 2019 11:51
To: Jeff Newmiller
Cc: Sebastien Bihorel; r-help@r-project.org
Subject: Re: [R] Can file size affect how na.strings operates in a read.table call?

read.table (and friends) also have the strip.white argument:

> s <- "A,B,C\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n"
> read.csv(text=s, header=TRUE, na.strings="-99", strip.white=TRUE)
  A  B  C
1 0  0  0
2 1 NA NA
3 2 NA NA
4 3 NA NA
> read.csv(text=s, header=TRUE, na.strings="-99", strip.white=FALSE)
  A   B   C
1 0   0   0
2 1  NA  NA
3 2 -99  NA
4 3 -99 -99

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Nov 14, 2019 at 8:35 AM Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:

Consider the following sample:

s <- "A,B,C
0,0,0
1,-99,-99
2,-99 ,-99
3, -99, -99
"
dta_notok <- read.csv(text = s, header = TRUE, na.strings = c("-99", ""))
dta_ok <- read.csv(text = s, header = TRUE,
                   na.strings = c("-99", " -99", "-99 ", ""))
library(data.table)
fdt_ok <- fread(text = s, na.strings = c("-99", ""))
fdta_ok <- as.data.frame(fdt_ok)

Leading and trailing spaces cause problems. The data.table::fread function has a strip.white argument that defaults to TRUE, but the resulting object is a data.table, which has different semantics than a data.frame.
On Thu, 14 Nov 2019, Sebastien Bihorel wrote:

> The data file is a csv file. Some text variables contain spaces.
>
> "Check for extraneous spaces"
> Are there specific locations that would be more critical than others?
>
> From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>
> Sent: Thursday, November 14, 2019 10:52
> To: Sebastien Bihorel; Sebastien Bihorel via R-help; r-help@r-project.org
> Subject: Re: [R] Can file size affect how na.strings operates in a
> read.table call?
>
> Check for extraneous spaces. You may need more variations of the na.strings.
>
> On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help wrote:
>> Hi,
>>
>> I have this generic function to read ASCII data files. It is
>> essentially a wrapper around the read.table function. My function is
>> used in a large variety of situations and has no a priori knowledge
>> about the data file it is asked to read. Nothing is known about file
>> size, variable types, variable names, or data table dimensions.
>>
>> One argument of my function is na.strings, which is passed down to
>> read.table.
>>
>> Recently, a user tried to read a data file of ~80 MB (~93000 rows by
>> ~160 columns) using na.strings = c('-99', '.') with the intention of
>> interpreting '.' and '-99' strings as the internal missing value NA.
>> Dots were converted to NA appropriately. However, not all -99 values
>> in the data were interpreted as NA. In some variables, -99 was
>> converted to NA, while in others -99 was read as a number. More
>> surprisingly, when the data file was cut into smaller chunks (i.e., by
>> dropping either rows or columns) saved in multiple files, the function
>> calls applied to the new data files resulted in the correct conversion
>> of the -99 values into NAs.
>>
>> In all cases, the data frames produced by read.table contained the
>> expected number of records.
>>
>> While, on face value, it appears that file size affects how the
>> na.strings argument operates, I am wondering if there is something
>> else at play here.
>>
>> Unfortunately, I cannot share the data file for confidentiality
>> reasons but was wondering if you could suggest some checks I could
>> perform to get to the bottom of this issue.
Re: [R] Can file size affect how na.strings operates in a read.table call?
The data file is a csv file. Some text variables contain spaces.

"Check for extraneous spaces"
Are there specific locations that would be more critical than others?

From: Jeff Newmiller
Sent: Thursday, November 14, 2019 10:52
To: Sebastien Bihorel; Sebastien Bihorel via R-help; r-help@r-project.org
Subject: Re: [R] Can file size affect how na.strings operates in a read.table call?

Check for extraneous spaces. You may need more variations of the na.strings.

On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help wrote:
> Hi,
>
> I have this generic function to read ASCII data files. It is
> essentially a wrapper around the read.table function. My function is
> used in a large variety of situations and has no a priori knowledge
> about the data file it is asked to read. Nothing is known about file
> size, variable types, variable names, or data table dimensions.
>
> One argument of my function is na.strings, which is passed down to
> read.table.
>
> Recently, a user tried to read a data file of ~80 MB (~93000 rows by
> ~160 columns) using na.strings = c('-99', '.') with the intention of
> interpreting '.' and '-99' strings as the internal missing value NA.
> Dots were converted to NA appropriately. However, not all -99 values
> in the data were interpreted as NA. In some variables, -99 was
> converted to NA, while in others -99 was read as a number. More
> surprisingly, when the data file was cut into smaller chunks (i.e., by
> dropping either rows or columns) saved in multiple files, the function
> calls applied to the new data files resulted in the correct conversion
> of the -99 values into NAs.
>
> In all cases, the data frames produced by read.table contained the
> expected number of records.
>
> While, on face value, it appears that file size affects how the
> na.strings argument operates, I am wondering if there is something
> else at play here.
> Unfortunately, I cannot share the data file for confidentiality reasons, but I was wondering if you could suggest some checks I could perform to get to the bottom of this issue.
>
> Thank you in advance for your help, and sorry for the lack of a reproducible example.

--
Sent from my phone. Please excuse my brevity.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
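Jeff's extraneous-space advice can be illustrated with a small sketch. The two-column data below is hypothetical, not the poster's file; the point is that a field written as " -99" with a leading space is not matched by na.strings = "-99", yet still parses cleanly as the number -99, which reproduces the reported symptom. Either strip.white = TRUE or adding the spaced variant to na.strings resolves it.

```r
# Hypothetical two-column file: row 1 carries a leading space before -99
# in the second field, as might survive in a hand-edited CSV.
txt <- "x,y\n1, -99\n-99,2"

con <- textConnection(txt)
d1 <- read.table(con, header = TRUE, sep = ",", na.strings = "-99")
close(con)
# " -99" is not identical to "-99", so it escapes na.strings matching,
# but numeric conversion tolerates the blank, so it is read as -99
d1$x   # 1, then NA (the exact string "-99" was matched)
d1$y   # -99, then 2 (the spaced "-99" was NOT matched)

con <- textConnection(txt)
d2 <- read.table(con, header = TRUE, sep = ",", na.strings = "-99",
                 strip.white = TRUE)
close(con)
# with leading/trailing blanks stripped before matching, " -99" becomes NA
d2$y   # NA, then 2
```

An equivalent fix, per Jeff's "more variations" suggestion, is na.strings = c("-99", " -99", ".").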
[R] Can file size affect how na.strings operates in a read.table call?
Hi,

I have this generic function to read ASCII data files. It is essentially a wrapper around the read.table function. My function is used in a large variety of situations and has no a priori knowledge about the data file it is asked to read. Nothing is known about file size, variable types, variable names, or data table dimensions.

One argument of my function is na.strings, which is passed down to read.table.

Recently, a user tried to read a data file of ~80 MB (~93000 rows by ~160 columns) using na.strings = c('-99', '.') with the intention of interpreting the '.' and '-99' strings as the internal missing value NA. Dots were converted to NA appropriately. However, not all -99 values in the data were interpreted as NA: in some variables, -99 was converted to NA, while in others -99 was read as a number. More surprisingly, when the data file was cut into smaller chunks (i.e., by dropping either rows or columns) saved in multiple files, the function calls applied to the new data files resulted in the correct conversion of the -99 values into NA.

In all cases, the data frames produced by read.table contained the expected number of records.

While, on face value, it appears that file size affects how the na.strings argument operates, I am wondering if there is something else at play here.

Unfortunately, I cannot share the data file for confidentiality reasons, but I was wondering if you could suggest some checks I could perform to get to the bottom of this issue.

Thank you in advance for your help, and sorry for the lack of a reproducible example.
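Since the file cannot be shared, one check the poster could run locally is to scan the raw text for lines that mention -99 but contain no field exactly equal to "-99"; those are precisely the fields that would escape na.strings matching. The lines below are stand-ins for readLines("data.csv"), and the file name is hypothetical:

```r
# Stand-ins for raw <- readLines("data.csv")
raw <- c("1, -99,3",   # leading space before -99
         "4,-99 ,6",   # trailing space after -99
         "7,-99,9")    # clean -99, matched by na.strings = "-99"

fields <- strsplit(raw, ",", fixed = TRUE)
has_exact <- vapply(fields, function(f) any(f == "-99"), logical(1))

# lines mentioning -99 whose fields never equal the exact string "-99"
# would escape na.strings = "-99" yet still parse as the number -99
suspects <- raw[grepl("-99", raw, fixed = TRUE) & !has_exact]
suspects   # the first two lines
```

This would also explain why cutting the file into chunks "fixed" the problem: the affected fields may simply not be present in a given chunk, rather than file size changing read.table's behavior.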
Re: [R] read.table and NaN
My bad, Bert.

My point is that my function/framework has very minimal expectations about the source data (mostly, that it is a rectangular table of data separated by some separator) and does not have any a priori knowledge about what the first, second, etc. columns in the data files must contain. So while it would be possible to accept some class vector that would be passed down as the colClasses argument to read.table, it is not necessarily reasonable in the context of the overall framework.

I guess I was surprised that read.table interprets NaN in an input file as the internal "Not a Number" rather than as a string; there is nothing in ?read.table about that. Anyway, as I said, I need to think more about this in the context of the framework where this function operates. Thanks for the input.

From: Bert Gunter
Sent: Thursday, October 24, 2019 10:39
To: Sebastien Bihorel
Cc: r-help@r-project.org
Subject: Re: [R] read.table and NaN

Not so. Read ?read.table carefully. You can use "NA" as a default. Moreover, you **specified** that you want NaN read as character, which means that any column containing NaN **must** be character. That's part of the specification for data frames (all columns must be one data type). So either change your specification or change your data structure. And, incidentally, my first name is "Bert".

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, Oct 24, 2019 at 6:43 AM Sebastien Bihorel <sebastien.biho...@cognigencorp.com> wrote:

Thanks Gunter

It seems that one has to know the structure of the data and adapt the read.table call accordingly. I am working on a framework that is meant to process data files with unknown structure, so I have to think a bit more about that...
From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Thursday, October 24, 2019 00:08
To: Sebastien Bihorel <sebastien.biho...@cognigencorp.com>
Cc: r-help@r-project.org
Subject: Re: [R] read.table and NaN

Like this?

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '',
                  stringsAsFactors = FALSE,
                  colClasses = c("numeric", "character"))
close.connection(con)
tmp
##    A   B
## 1  1 NaN
## 2 NA   2
class(tmp[,1])
## [1] "numeric"
class(tmp[,2])
## [1] "character"
tmp[,2]
## [1] "NaN" "2"

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Oct 23, 2019 at 6:31 PM Sebastien Bihorel via R-help <r-help@r-project.org> wrote:

Hi,

Is there a way to make read.table consider NaN as a string of characters rather than the internal NaN? Changing the na.strings argument does not seem to have any effect on how R interprets the NaN string (while it does for the NA string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])
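For a framework that truly knows nothing about column types, one alternative sketch (not proposed in the thread, and offered here only as an assumption about what might fit) is to read every column in as character and convert selectively afterwards with type.convert(). It reuses the thread's own two-column example:

```r
# Same data as in the thread's example
con <- textConnection("A,B\n1,NaN\nNA,2")
# colClasses = "character" is recycled to all columns: no numeric
# conversion happens, so the literal string "NaN" survives in column B
tmp <- read.table(con, header = TRUE, sep = ",", colClasses = "character")
close(con)

# convert only the columns you decide should be typed; the "NA" entry
# is treated as missing (read.table's default na.strings, and
# type.convert's, both cover the string "NA")
tmp$A <- type.convert(tmp$A, as.is = TRUE)
tmp$A   # 1 and NA, now a numeric/integer vector
tmp$B   # "NaN" and "2", still literal character strings
```

The design trade-off is that type detection moves out of read.table and into code the framework controls, so per-column policies (e.g., "never auto-convert NaN-bearing columns") become possible without knowing the structure up front.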
Re: [R] read.table and NaN
Thanks Gunter

It seems that one has to know the structure of the data and adapt the read.table call accordingly. I am working on a framework that is meant to process data files with unknown structure, so I have to think a bit more about that...

From: Bert Gunter
Sent: Thursday, October 24, 2019 00:08
To: Sebastien Bihorel
Cc: r-help@r-project.org
Subject: Re: [R] read.table and NaN

Like this?

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '',
                  stringsAsFactors = FALSE,
                  colClasses = c("numeric", "character"))
close.connection(con)
tmp
##    A   B
## 1  1 NaN
## 2 NA   2
class(tmp[,1])
## [1] "numeric"
class(tmp[,2])
## [1] "character"
tmp[,2]
## [1] "NaN" "2"

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Oct 23, 2019 at 6:31 PM Sebastien Bihorel via R-help <r-help@r-project.org> wrote:

Hi,

Is there a way to make read.table consider NaN as a string of characters rather than the internal NaN? Changing the na.strings argument does not seem to have any effect on how R interprets the NaN string (while it does for the NA string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])
[R] read.table and NaN
Hi,

Is there a way to make read.table consider NaN as a string of characters rather than the internal NaN? Changing the na.strings argument does not seem to have any effect on how R interprets the NaN string (while it does for the NA string).

con <- textConnection(object = 'A,B\n1,NaN\nNA,2')
tmp <- read.table(con, header = TRUE, sep = ',', na.strings = '', stringsAsFactors = FALSE)
close.connection(con)
tmp
class(tmp[,1])
class(tmp[,2])