Re: [R] Best way to merge 300+ .5MB dataframes?
On Aug 11, 2014, at 8:01 PM, John McKown wrote:

On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams tea...@gmail.com wrote:

Grant,

Assuming all your filenames are something like file1.txt, file2.txt, file3.txt... and using the Mac OS X Terminal app (after you cd to the directory where your files are located), this will strip off the 1st lines, that is, your header lines:

    for file in *.txt; do sed -i '1d' ${file}; done

Then, do this:

    cat *.txt > newfilename.txt

Doing both should only take a few seconds, depending on your file sizes.

Cheers!
Tom

Using sed hadn't occurred to me. I guess I'm just awk-ward <grin/>. A slightly different way would be:

    for file in *.txt; do
      sed '1d' ${file}
    done > newfilename.txt

That way the original files are not modified. But it strips out the header on the 1st file as well. Not a big deal, but the read.table will need to be changed to accommodate that. Also, it creates an otherwise unnecessary intermediate file newfilename.txt. To get the 1st file's header, the script could:

    head -1 file1.txt > newfilename.txt
    for file in *.txt; do
      sed '1d' ${file}
    done >> newfilename.txt

I really like having multiple answers to a given problem. Especially since I have a poorly implemented version of awk on one of my systems. It is the vendor's awk and conforms exactly to the POSIX definition with no additions. So I don't have the FNR built-in variable. Your implementation would work well on that system. Well, if there were a version of R for it. It is a branded UNIX system which was designed to be totally __and only__ POSIX compliant, with few (maybe no) extensions at all. IOW, it stinks. No, it can't be replaced. It is the z/OS system from IBM, which is EBCDIC based and runs on the big iron mainframe, System z.

--

On the Mac the awk equivalent is gawk. Within R you would use `system()`, possibly using paste0() to construct a string to send.

--
David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
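For readers who would rather stay inside R, the same idea (keep the header of the first file only, drop it from every later file) can be sketched with base R alone. The helper name `concat_keep_first_header` and the toy file names are made up for illustration, and the sketch assumes every input file has exactly one header line; it is not a tested replacement for the shell commands in the thread.

```r
# Hypothetical helper: concatenate text files, keeping only the first file's header.
concat_keep_first_header <- function(files, out) {
  stopifnot(length(files) > 0)
  con <- file(out, open = "w")
  on.exit(close(con))
  for (k in seq_along(files)) {
    lines <- readLines(files[k])
    if (k > 1) lines <- lines[-1]   # drop the header line of every file after the first
    writeLines(lines, con)
  }
  invisible(out)
}

# Toy demonstration in a temporary directory (file names are illustrative only).
dir <- tempfile("merge_demo_")
dir.create(dir)
writeLines(c("x y", "1 2", "3 4"), file.path(dir, "file1.txt"))
writeLines(c("x y", "5 6"), file.path(dir, "file2.txt"))
files <- sort(list.files(dir, pattern = "\\.txt$", full.names = TRUE))
merged_path <- concat_keep_first_header(files, tempfile(fileext = ".txt"))
merged <- read.table(merged_path, header = TRUE)
```

Writing the merged file outside the input directory, as done here, keeps the output from being matched by the same `*.txt` pattern that selects the inputs.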
Re: [R] Best way to merge 300+ .5MB dataframes?
On 12/08/2014 07:07, David Winsemius wrote:

On the Mac the awk equivalent is gawk. Within R you would use `system()` possibly using paste0() to construct a string to send.

For historical reasons this is actually part of R's configuration: see the AWK entry in R_HOME/etc/Makeconf. (There is an SED entry too: not all sed's in current OSes are POSIX-compliant.)

Using system2() rather than system() is recommended for new code.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
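As a rough illustration of the system2() advice, the awk approach from earlier in the thread might be wrapped as below. This is only a sketch under assumptions: a Unix-alike system, an `awk` on the PATH that supports `FNR` (which, as noted in the thread, not every awk does), and a made-up wrapper name `merge_with_awk`.

```r
# Hypothetical wrapper: let awk skip the header line of every file except the first.
merge_with_awk <- function(files, out, awk = "awk") {
  # FNR is the line number within the current file, NR the overall line number,
  # so "FNR == 1 && NR != 1" matches the first line of the second and later files.
  prog <- "FNR == 1 && NR != 1 { next } { print }"
  # shQuote() protects the program text and the file names from the shell;
  # stdout = out redirects awk's output into the file named by 'out'.
  system2(awk, args = c(shQuote(prog), shQuote(files)), stdout = out)
}

# Usage (paths are illustrative):
# merge_with_awk(list.files("data", pattern = "\\.txt$", full.names = TRUE),
#                "merged/all.txt")
```

Passing the program and file names as separate `args` elements avoids building one long command string by hand with paste0(), which is the main practical difference from the system() route.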
[R] Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping
Hi,

I am trying to find a way to estimate prediction intervals (PI) for a monotonic loess curve using bootstrapping.

At the moment my approach is to use the boot function from the boot package to bootstrap my loess model, which consists of loess + monoproc from the monoproc package (to force the fit to be monotonic, which gives me much improved results with my particular data). The output from the monoproc package is simply the fitted y values at each x-value. I then use boot.ci (again from the boot package) to get confidence intervals.

The problem is that this gives me confidence intervals (CI) for the fit (is there a proper way to specify this?) and not a prediction interval. The interval is thus far too narrow to tell me how uncertain a single predicted value is.

For linear models predict.lm can give a PI instead of a CI by setting interval = "prediction". Further discussion of that here:
http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping

However I don't see a way to do that for boot.ci. Is there a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it, but I thought the question was general enough to be understandable without it.

Any hints are highly appreciated.

--
Jan Stanstrup
Postdoc
Metabolomics
Food Quality and Nutrition
Fondazione Edmund Mach
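To make the CI-versus-PI distinction in the question concrete, here is a small sketch for the linear-model case only, using simulated data (the data, seed and object names are invented for illustration). It shows the two `interval` settings of predict() for an lm fit and nothing about loess or monoproc.

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50)            # simulated data, for illustration only
fit <- lm(y ~ x)
new <- data.frame(x = c(2, 5, 8))

# Confidence interval: uncertainty about the fitted mean at each new x.
cint <- predict(fit, new, interval = "confidence")
# Prediction interval: uncertainty about a single future observation at each new x.
pint <- predict(fit, new, interval = "prediction")

ci_width <- cint[, "upr"] - cint[, "lwr"]
pi_width <- pint[, "upr"] - pint[, "lwr"]
```

The prediction interval is wider at every point because it adds the residual variance of one new observation to the uncertainty of the fitted mean.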
[R] script to data clear
Hello List,

I wrote this script to clear data after import (I don't know whether it is OK). After it runs, the levels and label values are lost. Could someone explain how to reassign the levels again in the script (new depurate value)?

Best regards,
Maicel Monzon MD, PhD
Center of Cybernetic Apply to Medicine

    # data cleaning script
    library(stringr)
    for (i in 1:length(data)) {
      if (is.factor(data[[i]]) == T) {
        for (j in 1:sum(str_detect(data[, i], "  "))) {
          data[[i]] <- str_replace_all(data[[i]], "  ", " ")
        }
      }
      data[[i]] <- str_trim(data[[i]], side = "both")
      data[[i]] <- tolower(data[[i]])
    }

Note: "  " is 2 blank spaces and " " only one.
[R] Multivariate tobit regression
Dear R-users,

I would like to run a multivariate tobit model in R. Is there any package available to perform this task?

Best regards,
Printil
Re: [R] Superimposing graphs
Dear Richard and Duncan,

Your suggestions do exactly what I need. But I would like the x-axis to run up to 30 instead of 20. Do you have any suggestion on that?

Many thanks for your kind help.

Regards,
Jamil.

On 12 August 2014 01:22, Duncan Mackay dulca...@bigpond.com wrote:

Hi

If you want a 1 package and 1 function approach try this

    library(lattice)
    xyplot(conc ~ time | factor(subject, levels = c(2,1,3)), data = data.d,
           par.settings = list(strip.background = list(col = "transparent")),
           layout = c(3,1), aspect = 1, type = c("b","g"),
           scales = list(alternating = FALSE),
           panel = function(x,y,...){
             panel.xyplot(x,y,...)
             # f1 <- function(x,v,cl,t) (x/v)*exp(-(cl/v)*t)
             # f1(0.5,0.5,0.06,t),
             panel.curve((0.5/0.5)*exp(-(0.06/0.5)*x),0,30)
           }
    )
    # par.settings ... if you are publishing show text better
    # with factor if you want 1:3 omit the levels
    # has advantage of doing more things than in groupedData as Doug Bates has said

Regards

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mac...@northnet.com.au

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Naser Jamil
Sent: Monday, 11 August 2014 19:06
To: R help
Subject: [R] Superimposing graphs

Dear R-user,

May I seek your help to sort out a little problem. I have the following code to draw two graphs. I want to superimpose the second one on each of the first ones.

    library(nlme)
    subject <- c(1,1,1,2,2,2,3,3,3)
    time <- c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
    con.cohort <- c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
                    0.10593292,1.20808375,0.47638394,0.02808967)
    data.d = data.frame(subject=subject,time=time,conc=con.cohort)
    grouped.data <- groupedData(formula=conc~time | subject, data=data.d)
    plot(grouped.data)
    ##
    f1 <- function(x,v,cl,t) { (x/v)*exp(-(cl/v)*t) }
    t <- seq(0,30, .01)
    plot(t,f1(0.5,0.5,0.06,t),type="l",pch=18, ylim=c(), xlab="time", ylab="conc")
    ###

Any suggestion will really be helpful.

Regards,
Jamil.
Re: [R] bugs and misfeatures in polr(MASS).... fixed!
The official maintainers were dismissive when I suggested there were some problems I could fix with the then implementation of polr. I haven't looked at it since, sorry.

On Tue, Aug 12, 2014 at 7:44 PM, Guido Biele [via R] wrote:

I modified (where necessary) the file polr.R of the current MASS package (7.3-33) following the fixes in fixed-polr.R* and it is still working. The original polr.R file had implemented some of Tim's suggestions, but not the new method to generate starting values for the optimization. Does anybody know why polr was only partially fixed?

Regards - Guido

* http://r.789695.n4.nabble.com/attachment/4647403/0/fixed-polr.R

- Tim J. Benham
Re: [R] Superimposing graphs
Yes, use xlim=c(0, 30) in your definition of P1

On Tue, Aug 12, 2014 at 7:26 AM, Naser Jamil jamilnase...@gmail.com wrote:

Dear Richard and Duncan, your suggestions are absolutely serving what I need. But I would like to see x-axis to be up to 30 instead of 20. Do you have any suggestion on that? Many thanks for your kind help. Regards, Jamil.
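The object P1 mentioned in the reply belongs to an earlier message that is not reproduced here, so the following sketch applies the same `xlim` idea to the lattice call quoted in this thread instead. It is an illustration under that assumption, with the data taken from the original question and the object name `p` chosen arbitrarily.

```r
library(lattice)

# Data from the original question.
data.d <- data.frame(
  subject = rep(1:3, each = 3),
  time    = rep(c(0.0, 5.4, 21.0), times = 3),
  conc    = c(1.10971703, 0.54535512, 0.07176724,
              0.75912539, 0.47825282, 0.10593292,
              1.20808375, 0.47638394, 0.02808967)
)

p <- xyplot(conc ~ time | factor(subject), data = data.d,
            layout = c(3, 1), type = c("b", "g"),
            xlim = c(0, 30),                       # extend the x-axis to 30
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)              # observed points and lines
              # theoretical curve f1(0.5, 0.5, 0.06, t) from the original post
              panel.curve((0.5 / 0.5) * exp(-(0.06 / 0.5) * x), from = 0, to = 30)
            })
print(p)
```

Setting `xlim` in the top-level call changes the axis range of every panel at once, while `from` and `to` in panel.curve() only control where the curve itself is drawn.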
Re: [R] Best way to merge 300+ .5MB dataframes?
Thank you all kindly.

Grant Rettke | ACM, AMA, COG, IEEE
gret...@acm.org | http://www.wisdomandwonder.com/
“Wisdom begins in wonder.” --Socrates
((λ (x) (x x)) (λ (x) (x x)))
“Life has become immeasurably better since I have been forced to stop taking it seriously.” --Thompson
Re: [R] Superimposing graphs
That's perfect! Many thanks.

On 12 August 2014 14:32, Richard M. Heiberger r...@temple.edu wrote:

Yes, use xlim=c(0, 30) in your definition of P1
Re: [R] script to data clear
Without a representative sample of data, it is very hard to understand your question or to be specific about suggestions. See [1] for some ideas about how to communicate questions online.

Note that clearing data would usually mean deleting it, as in rm(data). From context I assume you mean cleaning, where invalid characters need to be removed. Also assuming that you have a data frame with some columns that are categorical data:

1) If the values are contaminated or incomplete (don't have rows representing every possible category) then it is almost always better to delay converting to factor until after the data are cleaned. The read.table family of functions includes a stringsAsFactors=FALSE option that will prevent automatic conversion of columns with unknown types into factors. This is also useful for contaminated numeric columns. Only after the vector of character data is clean and as complete as it can be should you convert to factor.

Note that most data sets have a variety of column types, and even after resolving the issues discussed here your function is not necessarily going to work with every input data file that you encounter. Specifically, not every column of data should be converted to factor. With this in mind, it can be helpful to look for ways to confirm that the data you are processing is what you expect it to be. Often this is implemented by confirming that specific columns have specific kinds of data in them. That is, using a loop may be TOO flexible... apply this cleaning loop cautiously.

2) Most functions in R can process whole vectors of data at once, so your inner loop should not be necessary. Specifically, the line

    data[[i]] <- gsub(" +", " ", data[[i]])

would replace all sequences of one or more spaces in every element of the vector with a single space. (Your j loop also runs too many times... str_replace_all(data[[i]], "  ", " ") affects the whole column, but you repeat it unnecessarily.)

3) I don't know what a depurate value is.

4) You should be able to convert your cleaned character column to factor with the factor function... like

    data[[i]] <- factor(data[[i]])

Note that if you know certain levels should be possible but not all of them are actually present (e.g. Small, Medium, and Large but no data with Small are present) then you will need to specify the levels as a parameter to the factor function. See the help file ?factor.

5) You have several lines of code at the end that appear to execute regardless of whether the column is a factor or not. They should be within the braces of the if statement.

6) Please read the Posting Guide mentioned at the end of this and every post on this list, specifically regarding posting in plain text. Your code was partially damaged by the HTML email format.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On August 12, 2014 5:42:13 AM PDT, Maicel Monzón Pérez mai...@infomed.sld.cu wrote:

Hello List, I did this script to clear data after import (I don't know is ok). After its execution levels and label values got lost. Could some explain me to reassign levels again in the script (new depurate value)?
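The numbered points above can be put together roughly as follows. This is a sketch, not the poster's script: the helper name `clean_to_factor`, the toy data frame and the level names are invented for illustration, and applying one common set of levels to every text column is a simplification that real data would rarely allow.

```r
# Hypothetical helper: clean one character/factor vector and return a factor.
clean_to_factor <- function(x, levels = NULL) {
  x <- as.character(x)            # work on the text, not on factor codes
  x <- gsub(" +", " ", x)         # collapse runs of spaces to a single space
  x <- trimws(x)                  # strip leading and trailing whitespace
  x <- tolower(x)
  # Convert to factor only after cleaning; pass levels when some may be absent.
  if (is.null(levels)) factor(x) else factor(x, levels = levels)
}

# Toy data: read as character (stringsAsFactors = FALSE), then cleaned.
df <- data.frame(size = c(" Large", "medium  ", "LARGE", "Medium"),
                 n = 1:4, stringsAsFactors = FALSE)

# Only touch columns that hold text; numeric columns are left alone.
is_text <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
df[is_text] <- lapply(df[is_text], clean_to_factor,
                      levels = c("small", "medium", "large"))
```

Because the levels are supplied explicitly, "small" remains a valid level even though no row contains it, which is the situation described in point 4.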
Re: [R] Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping
On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:

Does there exist a way to get PIs after bootstrapping? If some sample code is required I am more than happy to supply it but I thought the question was general enough to be understandable without it.

Why not use the quantreg package to estimate the quantiles of interest to you? That way you would not be depending on Normal theory assumptions which you apparently don't trust. I've used it with the `cobs` function from the package of the same name to implement the monotonic constraint. I think there is a worked example in the quantreg package, but since I bought Koenker's book, I may be remembering from there.

--
David Winsemius
Alameda, CA, USA
Re: [R] Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping
PI's of what? -- future individual values or mean values?

I assume quantreg provides quantiles for the latter, not the former. (See ?predict.lm for a terse explanation of the difference.) Both are obtainable from bootstrapping but the details depend on what you are prepared to assume. Consult references or your local statistician for help if needed.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius dwinsem...@comcast.net wrote:

Why not use the quantreg package to estimate the quantiles of interest to you?
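One way the "future individual values" case could be approached with a bootstrap is sketched below. It is only an illustration under stated assumptions: plain stats::loess on simulated data (the monotone step with monoproc is left out because its interface is not shown in the thread), case resampling, and residuals treated as exchangeable with constant variance. All object names are invented.

```r
set.seed(42)
n <- 200
x <- sort(runif(n, 0, 10))
y <- sqrt(x) + rnorm(n, sd = 0.3)        # simulated data, for illustration only
dat <- data.frame(x = x, y = y)
x_new <- data.frame(x = c(2, 5, 8))

B <- 500
mu   <- matrix(NA_real_, nrow = B, ncol = nrow(x_new))  # bootstrap fitted means
sims <- matrix(NA_real_, nrow = B, ncol = nrow(x_new))  # simulated new observations
for (b in seq_len(B)) {
  idx   <- sample.int(n, replace = TRUE)                # resample cases
  fit_b <- loess(y ~ x, data = dat[idx, ], span = 0.75)
  mu[b, ] <- predict(fit_b, newdata = x_new)            # fit evaluated at the new x
  # Add a resampled residual to mimic the noise of one future observation.
  eps_b <- sample(residuals(fit_b), nrow(x_new), replace = TRUE)
  sims[b, ] <- mu[b, ] + eps_b
}

# Percentile intervals: for the mean (CI of the fit) and for a new value (PI).
ci_boot <- apply(mu,   2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
pi_boot <- apply(sims, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
```

The only difference between the two intervals is the added residual draw, which is why the answer depends on what one is prepared to assume about the errors.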
[R] pass vector binding to DBI parameter (rsqlite)
Hi,

Is there a way to bind vectors to DBI query parameters? The following tells me that vectors are sent as separate values:

    library(RSQLite)
    c <- dbConnect(SQLite())
    dbGetQuery(c, "create table tst (x int, y int)")
    dbGetQuery(c, "insert into tst values (?, ?)",
               data.frame(x=c(1,2,1,2), y=c(3, 4, 5, 6)))
    dbReadTable(c, "tst")
      x y
    1 1 3
    2 2 4
    3 1 5
    4 2 6
    dbGetQuery(c, "select * from tst where y not in (?)", c(7,6))
      x y
    1 1 3
    2 2 4
    3 1 5
    4 2 6
    5 1 3
    6 2 4
    7 1 5

This looks like 2 result sets (4 + 3 entries), not one. Is there a way to send multiple values to a '?' binding? Is this at all possible using the R DBI interface (not necessarily with RSQLite)?
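A common workaround, sketched here rather than offered as the DBI answer, is to generate one `?` per value and bind the values as a list. The helper name `in_placeholders` is made up, and the database part assumes a reasonably recent DBI and RSQLite in which dbGetQuery() accepts a `params` argument; it is skipped when RSQLite is not installed.

```r
# Hypothetical helper: one "?" per value, e.g. 3 values -> "(?, ?, ?)".
in_placeholders <- function(n) {
  stopifnot(n >= 1)
  paste0("(", paste(rep("?", n), collapse = ", "), ")")
}

vals <- c(7, 6)
sql <- paste("select * from tst where y not in", in_placeholders(length(vals)))

if (requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbExecute(con, "create table tst (x int, y int)")
  DBI::dbWriteTable(con, "tst",
                    data.frame(x = c(1, 2, 1, 2), y = c(3, 4, 5, 6)),
                    append = TRUE)
  # Each list element is a single value, bound to the placeholder in the same position.
  res <- DBI::dbGetQuery(con, sql, params = as.list(vals))
  DBI::dbDisconnect(con)
}
```

Only the number of placeholders is pasted into the SQL text; the values themselves still travel as bound parameters, so the usual protection against malformed or hostile input is kept.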
Re: [R] how to process multiple data files using R loop
Thank you very much for all replies:) Here is my working code:

    for(i in ls(pattern="P_")){print(head(get(i),2))}

On Monday, August 11, 2014 11:04 AM, Greg Snow 538...@gmail.com wrote:
> In addition to the solution and comments that you have already received, here are a couple of additional comments:
>
> This is a variant on FAQ 7.21; if you had found that FAQ then it would have told you about the get function.
>
> The most important part of the answer in FAQ 7.21 is the last part, where it says that it is better to use a list. If all the objects of interest are related and you want to do the same or similar things to each one, then having them all stored in a single list can simplify things for the future. You can collect all the objects into a single list using the mget command, e.g.:
>
>     P_objects <- mget( ls(pattern='P_'))
>
> Now that they are in a list you can do the equivalent of your loop, but simpler, with the lapply function, e.g.:
>
>     lapply( P_objects, head, 2 )
>
> And if you want to do other things with all these objects, such as save them, plot them, do a regression analysis on them, delete them, etc., then you can do that using lapply/sapply as well, in a simpler way than looping.
> On Fri, Aug 8, 2014 at 12:25 PM, Fix Ace ace...@rocketmail.com wrote:
> > I have 16 files and would like to check the information of their first two lines. What I did:
> >
> >     > ls(pattern="P_")
> >      [1] "P_3_utr_source_data"              "P_5_utr_source_data"
> >      [3] "P_exon_per_gene_cds_source_data"  "P_exon_per_gene_source_data"
> >      [5] "P_exon_source_data"               "P_first_exon_oncds_source_data"
> >      [7] "P_first_intron_oncds_source_data" "P_first_intron_ongene_source_data"
> >      [9] "P_firt_exon_ongene_source_data"   "P_gene_cds_source_data"
> >     [11] "P_gene_source_data"               "P_intron_source_data"
> >     [13] "P_last_exon_oncds_source_data"    "P_last_exon_ongene_source_data"
> >     [15] "P_last_intron_oncds_source_data"  "P_last_intron_ongene_source_data"
> >     > for(i in ls(pattern="P_")){head(i, 2)}
> >
> > It obviously does not work, since nothing came out. What I would like to see for the output is:
> >
> >     > head(P_3_utr_source_data,2)
> >       V1
> >     1  1
> >     2  1
> >     > head(P_5_utr_source_data,2)
> >       V1
> >     1  1
> >     2  1
> >     . . .
> >
> > Could anybody help me with this? Thank you very much for your time:)
>
> -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com
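The list-based approach suggested in the reply can be sketched as follows. The two small data frames are stand-ins for the poster's sixteen `P_*` objects, and a separate environment `env` is used here (an assumption for the sake of a self-contained example) so that the sketch does not depend on what is in the workspace.

```r
# Hypothetical stand-ins for the P_* objects, kept in their own environment.
env <- new.env()
assign("P_a_source_data", data.frame(V1 = c(1, 1, 2)), envir = env)
assign("P_b_source_data", data.frame(V1 = c(3, 4, 5)), envir = env)
assign("unrelated", 42, envir = env)

# Collect every object whose name starts with "P_" into one named list.
P_objects <- mget(ls(envir = env, pattern = "^P_"), envir = env)

# Equivalent of the for/get/print loop: first two rows of each object.
first_rows <- lapply(P_objects, head, 2)
print(first_rows)
```

Once the objects live in a list, further per-object steps (summaries, plots, saving) are further `lapply` or `sapply` calls over `P_objects` rather than new loops over names.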
Re: [R] pass vector binding to DBI parameter (rsqlite)
On Tue, Aug 12, 2014 at 10:55 AM, Dan Muresan danm...@gmail.com wrote:
> Hi, is there a way to bind vectors to DBI query parameters?
> [...]
> Is there a way to send multiple values to a '?' binding? Is this at all possible using the R DBI interface (not necessarily with rsqlite)?

I don't really _know_ much, but what I would try would be something like:

    dbGetQuery(c, "select * from tst where y not in (?)", paste(c(7,6), collapse=','));

The paste(c(7,6), collapse=',') results in the string "7,6".

You could always subject yourself to a SQL injection attack by doing:

    dbGetQuery(c, paste("select * from tst where y not in (", paste(c(7,6), collapse=','), ")"));

If you do this and use a variable instead of the c(7,6), make sure you cleanse the contents of the variable, such as making sure that there is no bare semicolon in it. And other things that don't come to mind offhand.

Hum, perhaps better:

    values <- c(7,6);
    dbGetQuery(c, paste("select * from tst where y not in (", paste(rep('?', length(values)), collapse=','), ")"), values);

As you can see, this dynamically adjusts the number of ? marks in the SELECT statement, based on the number of elements in the values variable.

-- John McKown
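The last suggestion, one '?' per value, can be sketched as below. Note that a single bound parameter always stands for a single value, so binding the string "7,6" to one '?' compares y against that one string rather than against a list. The `params =` argument and the `DBI::` calls assume a reasonably recent DBI/RSQLite, which differs from the 2014 interface used in the thread; the database part only runs if both packages are installed.

```r
values <- c(7, 6)

# One placeholder per element, so the statement adapts to length(values).
placeholders <- paste(rep("?", length(values)), collapse = ", ")
sql <- paste0("SELECT * FROM tst WHERE y NOT IN (", placeholders, ")")
print(sql)

# Assumption: recent DBI and RSQLite are available; skipped otherwise.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "tst",
                    data.frame(x = c(1, 2, 1, 2), y = c(3, 4, 5, 6)))
  # Each list element is bound to the '?' in the same position.
  res <- DBI::dbGetQuery(con, sql, params = as.list(values))
  print(res)
  DBI::dbDisconnect(con)
}
```

Only the placeholder string is built by pasting; the values themselves never become part of the SQL text, which is what keeps this form safe from injection.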
[R] generating a sequence of seconds
Hello!

If I would like to generate a sequence of seconds for a date, I would do the following:

    x <- seq(from=as.POSIXct("2014-08-12 00:00:00"),to=as.POSIXct("2014-08-12 23:59:59"),by="secs")

What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?

thanks,
Erin

-- Erin Hodgess Associate Professor Department of Mathematical and Statistics University of Houston - Downtown mailto: erinm.hodg...@gmail.com
[R] Validation of the Markov chain assumption
Hi,

I'm modelling the occurrence of daily rainfall with a first-order Markov chain. I would like to know if there is a statistical test implemented in R that would allow me to assess whether the observed rainfall time series satisfies the Markov assumption.

Thanks

P.S. My apologies for cross-posting; I sent this question by mistake to an inappropriate R mailing list.
[R] t.test of matching columns from two datasets using plyr
Hi,

I have two datasets, df1 and df2, with 3 matching columns. I need to do a t.test of sp1, sp2 and sp3 and var1, var2 and var3 where the year, month and location match. I can do it with sapply or mapply, but I want the end result to be a data.frame. I prefer to do it with plyr or dplyr as I have been using these packages throughout this project. My final dataframe should have the t.test statistic and the p.value.

Sample datasets

first dataframe

    df1 <- structure(list(Year = c(1995L, 1995L, 1995L, 1995L, 1995L, 1995L,
    1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L,
    1995L, 1995L, 1995L, 1995L, 1995L), month = c("Feb", "Mar", "Mar", "Mar",
    "Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar",
    "Mar", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr"), location = structure(c(5L,
    5L, 5L, 5L, 5L, 5L, 2L, 4L, 4L, 1L, 4L, 4L, 3L, 4L, 5L, 5L, 5L, 5L, 5L,
    5L, 2L), .Label = c("Far West", "North", "Other", "South", "West"),
    class = "factor"), var1 = c(111.6, 0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 31.4,
    245.9, 46.3, 59.8, 206.1, 200.3, 88, 73.4, 33.9, 7.1), var2 = c(0, 4.7,
    4.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 159.8, 0, 0, 142.2, 94.3, 0, 0, 0, 0),
    var3 = c(180.2, 14.1, 123.7, 17.4, 5.5, 12.9, 39.3, 21, 66.6, 12.2, 13.6,
    15.7, 36.9, 0, 143.5, 35.5, 235.6, 51.3, 230.6, 81.3, 190.9)),
    .Names = c("Year", "month", "location", "var1", "var2", "var3"),
    row.names = 17093:17113, class = "data.frame")

second dataframe

    df2 <- structure(list(Year = c(1995L, 1995L, 1995L, 1995L, 1995L, 1995L,
    1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L,
    1995L, 1995L, 1995L, 1995L, 1995L), month = c("Apr", "Apr", "Apr", "Apr",
    "Apr", "Apr", "Apr", "Apr", "May", "May", "May", "May", "May", "May",
    "May", "May", "May", "May", "May", "May", "May"), location = structure(c(3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L), .Label = c("Far West", "North", "South", "West"),
    class = "factor"), sp1 = c(853.0055629, 147.7158909, 160.1536518,
    65.01652491, 2332.609706, 701.4706852, 11.36420842, 0, 2645.671425,
    2769.409257, 523.4284249, 135.1274855, 72.22498557, 35.07497333,
    572.087043, 150.4768424, 111.5881472, 61.21848041, 392.0651906, 0,
    771.0337355), sp2 = c(10.27717546, 0, 0, 0, 0, 10.16624181, 0, 0, 0,
    307.7121397, 52.34284249, 19.30392649, 24.07499519, 0, 35.75544018,
    42.99338354, 0, 40.81232027, 0, 90.9210806, 622.7580172),
    sp3 = c(92.49457911, 128.0204387, 203.8319205, 175.5446173, 120.6522262,
    71.1636927, 107.95998, 57.14456898, 43.37166271, 153.8560698, 104.685685,
    77.21570598, 96.29998075, 187.0665244, 0, 0, 111.5881472, 163.2492811,
    26.13767938, 45.4605403, 207.5860057)), .Names = c("Year", "month",
    "location", "sp1", "sp2", "sp3"), row.names = 30:50, class = "data.frame")

Thank you much.
Re: [R] generating a sequence of seconds
On Aug 12, 2014, at 1:51 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?

Erin,

Do you want just the numeric vector of seconds, with the first value being 0, incrementing by 1 to the final value?

    > x <- seq(from = as.POSIXct("2014-08-12 00:00:00"), to = as.POSIXct("2014-08-12 23:59:59"), by = "secs")
    > head(x)
    [1] "2014-08-12 00:00:00 CDT" "2014-08-12 00:00:01 CDT"
    [3] "2014-08-12 00:00:02 CDT" "2014-08-12 00:00:03 CDT"
    [5] "2014-08-12 00:00:04 CDT" "2014-08-12 00:00:05 CDT"
    > tail(x)
    [1] "2014-08-12 23:59:54 CDT" "2014-08-12 23:59:55 CDT"
    [3] "2014-08-12 23:59:56 CDT" "2014-08-12 23:59:57 CDT"
    [5] "2014-08-12 23:59:58 CDT" "2014-08-12 23:59:59 CDT"
    > head(as.numeric(x - x[1]))
    [1] 0 1 2 3 4 5
    > tail(as.numeric(x - x[1]))
    [1] 86394 86395 86396 86397 86398 86399

Regards,
Marc Schwartz
Re: [R] generating a sequence of seconds
> What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?

Why do you want such a thing? E.g., do you want it to print the time of day without the date? Or are you trying to avoid numeric problems when you do regressions with the seconds-since-1970 numbers around 1414918800? Or is there another problem you want solved?

Note that the number of seconds in a day depends on the day and the time zone. In US/Pacific time I get:

    > length(seq(from=as.POSIXct("2014-08-12 00:00:00"),to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
    [1] 86400
    > length(seq(from=as.POSIXct("2014-03-09 00:00:00"),to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
    [1] 82800
    > length(seq(from=as.POSIXct("2014-11-02 00:00:00"),to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
    [1] 90000

Bill Dunlap
TIBCO Software
wdunlap tibco.com
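The day lengths above depend on the session time zone. A sketch that pins the zone explicitly, so that it gives the same answer on any machine, is shown here; the helper name `day_length` and the choice of "America/Los_Angeles" are assumptions made for illustration, and the system needs a time zone database that knows that zone.

```r
# Number of one-second steps between local midnight and 23:59:59 on a date.
day_length <- function(date, tz = "America/Los_Angeles") {
  from <- as.POSIXct(paste(date, "00:00:00"), tz = tz)
  to   <- as.POSIXct(paste(date, "23:59:59"), tz = tz)
  length(seq(from, to, by = "secs"))
}

dates <- c("2014-08-12",  # ordinary day
           "2014-03-09",  # clocks go forward: one hour short
           "2014-11-02")  # clocks go back: one hour extra
lengths_by_day <- vapply(dates, day_length, integer(1))
print(lengths_by_day)
```

With `tz = "UTC"` all three dates would have the same length, since UTC has no daylight saving transitions.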
Re: [R] generating a sequence of seconds
What I would like to do is to look at several days and determine activities that happened at times on those days. I don't really care which days, I just care about what time.

Thank you!

On Tue, Aug 12, 2014 at 3:14 PM, William Dunlap wdun...@tibco.com wrote:
> Why do you want such a thing? E.g., do you want it to print the time of day without the date? [...] Or is there another problem you want solved?

-- Erin Hodgess
Re: [R] generating a sequence of seconds
If your activities of interest are mainly during the workday then seconds-since-3am might give good results, avoiding most daylight saving time issues. If they are more biologically oriented then something like seconds before or after sunrise or sunset might be better. Both can be expressed as differences between POSIXct times.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 12, 2014 at 12:26 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> What I would like to do is to look at several days and determine activities that happened at times on those days. I don't really care which days, I just care about what time.
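The seconds-since-3am idea can be written as a difference between two POSIXct values. The timestamps below are invented, and the zone is fixed to "America/Los_Angeles" only so the sketch is reproducible; the same 09:30 wall-clock time is taken on an ordinary day and on the two 2014 US transition days.

```r
tz <- "America/Los_Angeles"
when <- as.POSIXct(c("2014-03-09 09:30:00",   # spring transition day
                     "2014-08-12 09:30:00",   # ordinary day
                     "2014-11-02 09:30:00"),  # autumn transition day
                   tz = tz)

# Reference instants on the same calendar day as each timestamp.
day       <- format(when, "%Y-%m-%d")
midnight  <- as.POSIXct(paste(day, "00:00:00"), tz = tz)
three_am  <- as.POSIXct(paste(day, "03:00:00"), tz = tz)

secs_since_midnight <- as.numeric(difftime(when, midnight, units = "secs"))
secs_since_3am      <- as.numeric(difftime(when, three_am, units = "secs"))

print(data.frame(when, secs_since_midnight, secs_since_3am))
```

Because the US clock changes happen before 3am, the 3am-based value is the same on all three days, while the midnight-based value shifts by an hour on the transition days.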
Re: [R] generating a sequence of seconds
On Tue, Aug 12, 2014 at 1:51 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?

I'm a bit confused by this request. The definition of a POSIXct is:

    Class "POSIXct" represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector.

So I don't really know what you mean by the "seconds" portion. There are 24*60*60 or 86,400 seconds in a day. Those seconds are from +0 at 00:00:00 to +86399 for 23:59:59. Is this what you were asking?

    seconds_vector <- 0:86399; # is the simple way to get the above.

By the definition given above, there is no such thing as a POSIXct value without a date portion. Any number value will convert to a date+time. Like a timestamp variable in SQL vs. a time variable.

If you want to display the seconds_vector as HH:MM:SS for some reason, the simple way is:

    character_time <- sprintf("%02d:%02d:%02d",      # C-style formatting string
        seconds_vector %/% 3600,                     # hour value
        (seconds_vector %% 3600) %/% 60,             # minute value
        seconds_vector %% 60);                       # second value

(Integer division, %/%, is needed here: "%02d" refuses non-integer values such as 1/3600.)

You can simply make that a function:

    getTimePortion <- function(POSIXct_value) {
        value_in_seconds = as.integer(POSIXct_value);
        sprintf("%02d:%02d:%02d",                    # C-style formatting string
            seconds_vector %/% 3600,                 # hour value
            (seconds_vector %% 3600) %/% 60,         # minute value
            seconds_vector %% 60);                   # second value
    };

-- John McKown
Re: [R] generating a sequence of seconds
Erin,

Is a sequential resolution of seconds required, as per your original post? If so, then using my approach and specifying the start and end dates and times will work, with the coercion of the resultant vector to numeric as I included. The method I used (subtracting the first value) will also give you the starting second as 0, or you can alter the math to adjust the origin of the vector as you desire.

As Bill notes, there will be some days where the number of seconds in the day will be something other than 86,400. In Bill's example, it is due to his choosing the start and end dates of daylight saving time in a relevant time zone. Thus, his second date is short an hour, while the third has an extra hour.

Regards,
Marc

On Aug 12, 2014, at 2:26 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> What I would like to do is to look at several days and determine activities that happened at times on those days. I don't really care which days, I just care about what time.
Re: [R] generating a sequence of seconds
On Tue, Aug 12, 2014 at 2:40 PM, John McKown john.archie.mck...@gmail.com wrote:
> <snip>
> You can simply make that a function:
>
>     getTimePortion <- function(POSIXct_value) {
>         value_in_seconds = as.integer(POSIXct_value);
>         sprintf("%02d:%02d:%02d",
>             seconds_vector %/% 3600,
>             (seconds_vector %% 3600) %/% 60,
>             seconds_vector %% 60);
>     };

Sorry, cut'n'pasted that incorrectly. The function should use its own argument, and drop the whole days before splitting into hours, minutes and seconds:

    getTimePortion <- function(POSIXct_value) {
        value_in_seconds = as.integer(POSIXct_value) %% 86400;  # seconds into the (UTC) day
        sprintf("%02d:%02d:%02d",                    # C-style formatting string
            value_in_seconds %/% 3600,               # hour value
            (value_in_seconds %% 3600) %/% 60,       # minute value
            value_in_seconds %% 60);                 # second value
    };

And the above is vectorized and will work if the argument has multiple values in it.

-- John McKown
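A quick check of the arithmetic approach against base R's own formatting is sketched below. The function name `get_time_portion` and the sample timestamps are made up for the example; the timestamps are created in UTC, because the modulo arithmetic works on seconds since 1970 in UTC and so returns the UTC time of day.

```r
# Time of day (UTC) as "HH:MM:SS" from seconds since the epoch.
get_time_portion <- function(x) {
  secs <- as.integer(x) %% 86400L            # drop whole days
  sprintf("%02d:%02d:%02d",
          secs %/% 3600L,                    # hours
          (secs %% 3600L) %/% 60L,           # minutes
          secs %% 60L)                       # seconds
}

x <- as.POSIXct(c("2014-08-12 00:00:00",
                  "2014-08-12 13:23:00",
                  "2014-08-13 23:59:59"), tz = "UTC")

by_arithmetic <- get_time_portion(x)
by_format     <- format(x, "%H:%M:%S")       # uses the object's time zone
print(cbind(by_arithmetic, by_format))
```

For values stored with a non-UTC time zone the two columns would differ by the UTC offset; `format(x, "%H:%M:%S")` is then the simpler way to get local wall-clock time.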
Re: [R] pass vector binding to DBI parameter (rsqlite)
Yes, of course, that's an obvious work-around, thanks. Another one is to use temporary tables. But I'd like to know if binding a vector to an SQL parameter is possible in rsqlite (or even in the DBI API or with other drivers -- it seems to me it isn't).

This seems like a nasty shortcoming (especially in light of SQL injection, but there are other considerations).

On 8/12/14, John McKown john.archie.mck...@gmail.com wrote:
> Hum, perhaps better:
>
>     values <- c(7,6);
>     dbGetQuery(c, paste("select * from tst where y not in (", paste(rep('?', length(values)), collapse=','), ")"), values);
>
> As you can see, this dynamically adjusts the number of ? marks in the SELECT statement, based on the number of elements in the values variable.
Re: [R] generating a sequence of seconds
And some people wonder why I absolutely abhor daylight saving time. I'm not really fond of leap years and leap seconds either. Somebody needs to fix the Earth's rotation and orbit!

On Tue, Aug 12, 2014 at 2:14 PM, William Dunlap wdun...@tibco.com wrote:
> Note that the number of seconds in a day depends on the day and the time zone.

-- John McKown
Re: [R] generating a sequence of seconds
On Aug 12, 2014, at 2:49 PM, John McKown john.archie.mck...@gmail.com wrote:
> And some people wonder why I absolutely abhor daylight saving time. I'm not really fond of leap years and leap seconds either. Somebody needs to fix the Earth's rotation and orbit!

I have been a longtime proponent of slowing the rotation of the Earth on its axis, so that we could have longer days to be more productive. Unfortunately, so far, my wish has gone unfulfilled...at least as it is relevant within human lifetimes. ;-)

Regards,
Marc
Re: [R] generating a sequence of seconds
On Tue, Aug 12, 2014 at 2:26 PM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> What I would like to do is to look at several days and determine activities that happened at times on those days. I don't really care which days, I just care about what time. Thank you!

Ah! A light dawns. You want to subset your data based on some part of the time, such as between 13:23:00 and 15:10:01 of each day in the sample. I am ignoring the DST issue, which I shouldn't; it is left as an exercise for the reader. But usually 13:23 is 13*3600+23*60 = 48180 seconds after midnight, and 15:10:01 is 15*3600+10*60+1 = 54601 seconds after midnight.

Suppose you have a data.frame() in a variable called myData. Further suppose that the POSIXct variable in this data.frame is called when. You want to subset this into another data.frame() and call it subsetMyData.

    subsetMyData <- myData[as.integer(myData$when) %% 86400 >= 48180 &
                           as.integer(myData$when) %% 86400 <= 54601, ];

Yes, this is ugly. You might make it look nicer, and be easier to understand, by:

    startTime <- as.integer(as.difftime("13:23:00", units="secs")); # start on or after 1:23 p.m.
    endTime <- as.integer(as.difftime("15:10:01", units="secs"));   # end on or before 3:10:01 p.m.
    testTime <- as.integer(myData$when) %% 86400; # convert to seconds and eliminate date portion.
    subsetMyData <- myData[testTime >= startTime & testTime <= endTime, ];

This will work best if myData$when is in GMT instead of local time. Why? No DST worries. Again, in my opinion, all time data should be recorded in GMT. Only convert to local time when displaying the data to an ignorant user who can't handle GMT. Personally, I love to tell people something like: "it is 13:59:30 zulu". In my time zone, today, that is 08:59:30 a.m.

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha!
John McKown
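The time-of-day subsetting described above can be sketched end to end as follows. This is only an illustration: the names myData, when and subsetMyData come from the message, while the sample timestamps, the activity column and the helper secondsOfDay() are made up here. The timestamps are created in UTC, so the modulo arithmetic yields the UTC time of day.

```r
# Hypothetical sample data: 'myData' and 'when' follow the names used in the
# message above; the timestamps and 'activity' values are invented.
myData <- data.frame(
  when = as.POSIXct(c("2014-08-11 13:22:59", "2014-08-11 13:23:00",
                      "2014-08-12 14:00:00", "2014-08-12 15:10:01",
                      "2014-08-13 15:10:02"), tz = "UTC"),
  activity = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Seconds after midnight for a clock time given as hours, minutes, seconds.
# (Hypothetical helper, used instead of as.difftime() to avoid any
# dependence on the locale's time format.)
secondsOfDay <- function(h, m, s = 0) h * 3600 + m * 60 + s

startTime <- secondsOfDay(13, 23, 0)   # 48180
endTime   <- secondsOfDay(15, 10, 1)   # 54601

# POSIXct stores seconds since 1970-01-01 00:00:00 UTC, so the remainder
# modulo 86400 is the UTC time of day in seconds.
testTime <- as.integer(myData$when) %% 86400

# Keep rows whose time of day falls inside the window, on any date.
subsetMyData <- myData[testTime >= startTime & testTime <= endTime, ]
print(subsetMyData)
```

Both boundaries are inclusive, as in the original expression, so rows stamped exactly 13:23:00 or 15:10:01 are kept.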
Re: [R] generating a sequence of seconds
Marc:

You just need to be more patient -- this is already happening:

http://en.wikipedia.org/wiki/Tidal_acceleration

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Tue, Aug 12, 2014 at 1:10 PM, Marc Schwartz marc_schwa...@me.com wrote:
> On Aug 12, 2014, at 2:49 PM, John McKown john.archie.mck...@gmail.com wrote:
>> And some people wonder why I absolutely abhor daylight saving time. I'm not really fond of leap years and leap seconds either. Somebody needs to fix the Earth's rotation and orbit!
>
> I have been a longtime proponent of slowing the rotation of the Earth on its axis, so that we could have longer days to be more productive. Unfortunately, so far, my wish has gone unfulfilled...at least as it is relevant within human lifetimes. ;-)
>
> Regards,
> Marc
>
>> On Tue, Aug 12, 2014 at 2:14 PM, William Dunlap wdun...@tibco.com wrote:
>>>> What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?
>>>
>>> Why do you want such a thing? E.g., do you want it to print the time of day without the date? Or are you trying to avoid numeric problems when you do regressions with the seconds-since-1970 numbers around 1414918800? Or is there another problem you want solved?
>>>
>>> Note that the number of seconds in a day depends on the day and the time zone. In US/Pacific time I get:
>>>
>>>     length(seq(from=as.POSIXct("2014-08-12 00:00:00"), to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
>>>     [1] 86400
>>>     length(seq(from=as.POSIXct("2014-03-09 00:00:00"), to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
>>>     [1] 82800
>>>     length(seq(from=as.POSIXct("2014-11-02 00:00:00"), to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
>>>     [1] 90000
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Tue, Aug 12, 2014 at 11:51 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:
>>>> Hello! If I would like to generate a sequence of seconds for a date, I would do the following:
>>>>
>>>>     x <- seq(from=as.POSIXct("2014-08-12 00:00:00"), to=as.POSIXct("2014-08-12 23:59:59"), by="secs")
>>>>
>>>> What if I just want the seconds vector without the date, please? Is there a convenient way to create such a vector, please?
>>>>
>>>> thanks,
>>>> Erin
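One possible reading of the question quoted above is a plain vector of seconds after midnight with no date attached. The sketch below builds that vector, adds optional HH:MM:SS labels, and then repeats the day-length check from the thread with an explicit time zone. The helper secondsInDay() is hypothetical, and the zone name "America/Los_Angeles" is an assumption standing in for US/Pacific; the counts rely on the system's time zone database being available.

```r
# Seconds after midnight for a nominal 24-hour day, with no date attached.
secs <- 0:86399

# Optional clock-style labels (HH:MM:SS) for the same vector.
labels <- sprintf("%02d:%02d:%02d",
                  secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)

# Number of one-second steps in a local calendar day (hypothetical helper).
# The default zone name is an assumption standing in for US/Pacific.
secondsInDay <- function(day, tz = "America/Los_Angeles") {
  from <- as.POSIXct(paste(day, "00:00:00"), tz = tz)
  to   <- as.POSIXct(paste(day, "23:59:59"), tz = tz)
  length(seq(from = from, to = to, by = "secs"))
}

# An ordinary day, the spring-forward day, and the fall-back day of 2014.
dayLengths <- sapply(c("2014-08-12", "2014-03-09", "2014-11-02"), secondsInDay)
print(dayLengths)
```

The nominal vector 0:86399 is only correct for days without a clock change, which is the caveat raised in the reply.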
Re: [R] generating a sequence of seconds
> Again, in my opinion, all time data should be recorded in GMT.

It depends on context. If you are studying traffic flow or electricity usage, then you want local time with all its warts (perhaps stated as time since 3am so any daylight savings time problems are confined to a small portion of the data), perhaps along with time since sunrise and time since sunset. If you are studying astronomy, then UTC is appropriate.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Aug 12, 2014 at 1:16 PM, John McKown john.archie.mck...@gmail.com wrote:
> <SNIP>
Re: [R] generating a sequence of seconds
On Tue, Aug 12, 2014 at 3:23 PM, William Dunlap wdun...@tibco.com wrote:
>> Again, in my opinion, all time data should be recorded in GMT.
>
> It depends on context. If you are studying traffic flow or electricity usage, then you want local time with all its warts (perhaps stated as time since 3am so any daylight savings time problems are confined to a small portion of the data), perhaps along with time since sunrise and time since sunset.

I see your point. But if my data is in GMT, that is a unique timestamp value. And, given that, along with location information, I should then be able to generate a local time for human activity, e.g. when do people go to lunch? Another plus of this is that there is no confusion during "fall back" whether this is the 1st or 2nd instance of something like 02:27:00. Long ago, I worked for a city government. They recorded everything on the machine in local time, including police log entries. It always made me wonder why some lawyer didn't have a nice window of confusion if something allegedly happened on time change day and was logged as 02:30:00.

> If you are studying astronomy, then UTC is appropriate.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha!
John McKown
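The point about deriving local time from GMT plus a location can be sketched as below. The two timestamps and the zone name "America/Los_Angeles" are assumptions chosen for illustration; in that zone the repeated hour on 2014-11-02 runs from 01:00 to 01:59, so the example uses 01:30 rather than the 02:27 mentioned in the message.

```r
# Two distinct instants, one hour apart, stored in UTC (made-up values).
when <- as.POSIXct(c("2014-11-02 08:30:00", "2014-11-02 09:30:00"), tz = "UTC")

# Hypothetical location, expressed as an Olson zone name (an assumption).
localZone <- "America/Los_Angeles"

# Wall-clock rendering of each instant in the local zone.
localClock <- format(when, format = "%H:%M:%S", tz = localZone)

# Local seconds after midnight, taken from the broken-down local time.
lt <- as.POSIXlt(when, tz = localZone)
localSeconds <- lt$hour * 3600 + lt$min * 60 + lt$sec

# The UTC column stays unique even though the local clock reading repeats.
print(data.frame(utc = format(when, tz = "UTC"), localClock, localSeconds))
```

Stored this way, the UTC value identifies the instant unambiguously, and the local clock reading is derived only when a question about human activity calls for it.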
Re: [R] pass vector binding to DBI parameter (rsqlite)
On Tue, Aug 12, 2014 at 2:46 PM, Dan Muresan danm...@gmail.com wrote:
> Yes, of course, that's an obvious work-around, thanks. Another one is to use temporary tables. But I'd like to know if binding a vector to an SQL parameter is possible in rsqlite (or even in the DBI API or with other drivers -- it seems to me it isn't). This seems like a nasty shortcoming (especially in light of SQL injection, but there are other considerations).

That type of binding seems to be something that was overlooked when the API was being designed. Or, as some vendor might say: "We considered that, but decided to reject it due to the difficulty of implementation and lack of need in the vast majority of cases."

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha!
John McKown
[R] A basic statistics question
Hi,

I would need to get a clarification on a quite fundamental statistics property; I hope expeRts here would not mind if I post that here. I learnt that the variance-covariance matrix of the standardized data is equal to the correlation matrix for the unstandardized data. So I used the following data.

    Data <- structure(c(
      7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L, 7L, 6L,
      7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L, 0L,
      10L, 10L, 10L, 7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L, 0L,
      13L, 13L, 10L, 7L, 7L, 7L, 10L, 7L, 5L, 8L, 7L, 10L, 10L, 10L, 6L,
      7L, 6L, 6L, 8L, 8L, 7L, 7L, 7L, 7L, 8L, 7L, 8L, 6L, 6L, 8L,
      7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L, 12L, 8L, 5L, 5L, 5L,
      5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L, 7L,
      5L, 7L, 5L, 7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L,
      6L, 4L, 4L, 6L, 6L, 7L, 8L, 7L, 11L, 10L, 8L, 7L, 6L, 6L, 11L,
      5L, 4L, 6L, 6L, 6L, 7L, 8L, 7L, 12L, 4L, 4L, 2L, 5L, 6L, 7L,
      6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L, 6L, 6L, 5L, 5L, 6L,
      6L, 4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L, 4L, 6L,
      6L, 6L, 8L, 8L, 8L, 7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L,
      3L, 3L, 4L, 5L, 5L, 6L, 5L, 5L, 6L, 4L, 8L, 7L, 3L, 5L, 6L,
      9L, 8L, 9L, 10L, 8L, 9L, 8L, 9L, 8L, 8L, 9L, 11L, 10L, 9L, 9L,
      13L, 13L, 10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L,
      3L, 4L, 3L, 7L, 6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L,
      9L, 8L, 9L, 10L, 8L, 8L, 9L, 9L, 11L, 11L, 11L, 10L, 9L, 9L, 11L,
      2L, 3L, 2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L, 1L, 1L, 3L, 3L,
      4L, 6L, 4L, 5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L,
      7L, 3L, 3L, 10L, 13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L,
      3L, 13L, 3L, 3L, 0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L,
      6L, 5L, 5L, 5L, 2L, 2L, 5L, 5L, 5L, 4L, 3L, 4L, 4L, 3L, 4L,
      10L, 10L, 9L, 8L, 4L, 4L, 8L, 7L, 10L, 3L, 1L, 9L, 5L, 11L, 9L),
      .Dim = c(45L, 8L),
      .Dimnames = list(NULL, c("V1", "V7", "V13", "V19", "V25", "V31", "V37", "V43")))

    Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
    (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]

The point is that I am not getting the exact CORR matrix. Can somebody point out what I am missing here?

Thanks for your pointer.
Re: [R] generating a sequence of seconds
Great! Thank you! I think the function with the C-like function should do the trick.

On Tue, Aug 12, 2014 at 4:31 PM, John McKown john.archie.mck...@gmail.com wrote:
> <SNIP>

--
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
Re: [R] A basic statistics question
On 12-Aug-2014 19:57:29 Ron Michael wrote:
> Hi, I would need to get a clarification on a quite fundamental statistics property, hope expeRts here would not mind if I post that here. I learnt that variance-covariance matrix of the standardized data is equal to the correlation matrix for the unstandardized data. So I used following data.
>
> <SNIP>
>
>     Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
>     (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
> Point is that I am not getting exact CORR matrix. Can somebody point me what I am missing here? Thanks for your pointer.

Try:

    Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
    (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

and compare the result with

    cor(Data)

And why? Look at

    ?sd

and note that:

    Details:
    Like 'var' this uses denominator n - 1.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Aug-2014  Time: 22:32:26
This message was sent by XFMail
Re: [R] A basic statistics question
On 13/08/14 07:57, Ron Michael wrote:
> Hi, I would need to get a clarification on a quite fundamental statistics property, hope expeRts here would not mind if I post that here. I learnt that variance-covariance matrix of the standardized data is equal to the correlation matrix for the unstandardized data. So I used following data.

<SNIP>

>     (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
> Point is that I am not getting exact CORR matrix. Can somebody point me what I am missing here?

You are using a denominator of n in calculating your covariance matrix for your normalized data. But these data were normalized using the sd() function, which (correctly) uses a denominator of n-1 so as to obtain an unbiased estimator of the population variance.

If you calculated

    (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

then you would get the same result as you get from cor(Data) (to within about 1e-15).

cheers,
Rolf Turner

--
Rolf Turner
Technical Editor ANZJS
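The denominator effect described above can be checked on any numeric matrix. The sketch below uses made-up random data rather than the data from the thread (the names X, Z, covN and covN1 are illustrative only) and compares both denominators against cor().

```r
set.seed(1)                                   # made-up data for illustration
X <- matrix(rnorm(45 * 3), nrow = 45,
            dimnames = list(NULL, c("a", "b", "c")))
n <- nrow(X)

# Standardize each column with sd(), which divides by n - 1.
Z <- apply(X, 2, function(x) (x - mean(x)) / sd(x))

covN  <- crossprod(Z) / n         # denominator n: too small by (n - 1)/n
covN1 <- crossprod(Z) / (n - 1)   # denominator n - 1: matches cor(X)

print(all.equal(covN1, cor(X)))
print(all.equal(covN, cor(X) * (n - 1) / n))
```

With the n denominator every entry is the corresponding correlation scaled by (n - 1)/n, so the diagonal comes out as 44/45 instead of 1 when n = 45, as in the thread's data.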
Re: [R] A basic statistics question
On 12-Aug-2014 21:41:52 Rolf Turner wrote:
> On 13/08/14 07:57, Ron Michael wrote:
>> <SNIP>
>
> You are using a denominator of n in calculating your covariance matrix for your normalized data. But these data were normalized using the sd() function, which (correctly) uses a denominator of n-1 so as to obtain an unbiased estimator of the population variance.
>
> If you calculated
>
>     (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
>
> then you would get the same result as you get from cor(Data) (to within about 1e-15).
>
> cheers,
> Rolf Turner

One could argue about "(correctly)"!

From the descriptive statistics point of view, if one is given a single number x, then this dataset has no variation, so one could say that sd(x) = 0. And this is what one would get with a denominator of n.

But if the single value x is viewed as sampled from a distribution (with positive dispersion), then the value of x gives no information about the SD of the distribution. If you use denominator (n-1) then sd(x) = NA, i.e. is indeterminate (as it should be in this application).

The important thing when using pre-programmed functions is to know which is being used. R uses (n-1), and this can be found from looking at

    ?sd

or (with more detail) at

    ?cor

Ron had assumed that the denominator was n, apparently not being aware that R uses (n-1).

Just a few thoughts ...
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Aug-2014  Time: 23:22:09
This message was sent by XFMail
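The two conventions contrasted above can be put side by side in a few lines. In this sketch popSd() is a hypothetical helper that uses the n denominator (it is not a base R function), and the sample values are made up.

```r
# Descriptive ("population") standard deviation with denominator n.
# 'popSd' is a hypothetical helper, not part of base R.
popSd <- function(x) sqrt(mean((x - mean(x))^2))

single <- 5                          # a single observation
x <- c(2, 4, 4, 4, 5, 5, 7, 9)       # made-up sample
n <- length(x)

print(sd(single))      # NA: n - 1 = 0, so the estimate is indeterminate
print(popSd(single))   # 0: no variation within the data set itself

# For n > 1 the two differ only by the factor sqrt((n - 1) / n).
print(c(sd = sd(x), popSd = popSd(x), rescaled = sd(x) * sqrt((n - 1) / n)))
```

Which of the two is appropriate depends on whether the values are treated as the whole data set or as a sample, which is the distinction drawn in the message.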
Re: [R] pass vector binding to DBI parameter (rsqlite)
I am not quite sure what you are complaining about. The ODBC interface definition is not vectorized, and that has nothing to do with R... that applies across all platforms I have seen. The DBI API is consistent with that. There are some proprietary APIs that implement bulk data transfers, but then you are stuck with that API.

It might be appropriate to discuss this on R-sig-db if you have better information than I do.

---
Jeff Newmiller
DCN:jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

On August 12, 2014 12:46:30 PM PDT, Dan Muresan danm...@gmail.com wrote:
> Yes, of course, that's an obvious work-around, thanks. Another one is to use temporary tables. But I'd like to know if binding a vector to an SQL parameter is possible in rsqlite (or even in the DBI API or with other drivers -- it seems to me it isn't). This seems like a nasty shortcoming (especially in light of SQL injection, but there are other considerations).
>
> On 8/12/14, John McKown john.archie.mck...@gmail.com wrote:
>> On Tue, Aug 12, 2014 at 10:55 AM, Dan Muresan danm...@gmail.com wrote:
>>> Hi, is there a way to bind vectors to DBI query parameters? The following tells me that vectors are sent as separate values:
>>>
>>>     library(RSQLite)
>>>     c <- dbConnect(SQLite())
>>>     dbGetQuery(c, "create table tst (x int, y int)")
>>>     dbGetQuery(c, "insert into tst values (?, ?)", data.frame(x=c(1,2,1,2), y=c(3, 4, 5, 6)))
>>>     dbReadTable(c, "tst")
>>>       x y
>>>     1 1 3
>>>     2 2 4
>>>     3 1 5
>>>     4 2 6
>>>     dbGetQuery(c, "select * from tst where y not in (?)", c(7,6))
>>>       x y
>>>     1 1 3
>>>     2 2 4
>>>     3 1 5
>>>     4 2 6
>>>     5 1 3
>>>     6 2 4
>>>     7 1 5
>>>
>>> This looks like 2 result sets (4 + 3 entries), not one. Is there a way to send multiple values to a '?' binding? Is this at all possible using the R DBI interface (not necessarily with rsqlite)?
>>
>> I don't really _know_ much, but what I would try would be something like:
>>
>>     dbGetQuery(c, "select * from tst where y not in (?)", paste(c(7,6), collapse=','));
>>
>> The paste(c(7,6),collapse=',') results in the string "7,6". You could always subject yourself to a SQL injection attack by doing:
>>
>>     dbGetQuery(c, paste0("select * from tst where y not in (", paste(c(7,6), collapse=','), ")"));
>>
>> If you do this and use a variable instead of the c(7,6), make sure you cleanse the contents of the variable, such as making sure that there is no bare semi-colon in it. And other things that don't come to mind off hand. Hum, perhaps better:
>>
>>     values <- c(7,6);
>>     dbGetQuery(c, paste("select * from tst where y not in (",
>>                         paste(rep('?', length(values)), collapse=','),
>>                         ")"), values);
>>
>> As you can see, this dynamically adjusts the number of ? marks in the SELECT statement, based on the number of elements in the values variable.
>>
>> --
>> There is nothing more pleasant than traveling and meeting new people!
>> Genghis Khan
>>
>> Maranatha!
>> John McKown
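The last suggestion quoted above, one placeholder per value, can be sketched with the current DBI interface as follows. The helpers makePlaceholders() and buildNotInQuery() are hypothetical names introduced here; the database part assumes reasonably recent DBI and RSQLite packages, where dbGetQuery() accepts a params list, and is skipped when they are not installed.

```r
# Build "?, ?, ..." with one positional placeholder per value.
# 'makePlaceholders' and 'buildNotInQuery' are hypothetical helpers.
makePlaceholders <- function(values) {
  stopifnot(length(values) > 0)
  paste(rep("?", length(values)), collapse = ", ")
}

buildNotInQuery <- function(values) {
  paste0("select * from tst where y not in (", makePlaceholders(values), ")")
}

values <- c(7, 6)
sql <- buildNotInQuery(values)
print(sql)

# Run against an in-memory database only if DBI and RSQLite are installed.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "tst",
                    data.frame(x = c(1, 2, 1, 2), y = c(3, 4, 5, 6)))
  # One list element per placeholder, so each value is bound separately
  # and the query is executed once.
  res <- DBI::dbGetQuery(con, sql, params = as.list(values))
  print(res)
  DBI::dbDisconnect(con)
}
```

Only the placeholder string is assembled by pasting; the values themselves still travel as bound parameters, which avoids the injection concern raised in the thread.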
[R] populating matrix with binary variable after matching data from data frame
Hi: sorry I have a basic question. I have a data frame with two columns:

    x1
          V1       V2
    1   AKT3    TCL1A
    2  AKTIP    VPS41
    3  AKTIP    PDPK1
    4  AKTIP   GTF3C1
    5  AKTIP    HOOK2
    6  AKTIP    POLA2
    7  AKTIP KIAA1377
    8  AKTIP FAM160A2
    9  AKTIP    VPS16
    10 AKTIP    VPS18

I have a matrix 1211x1211 (using some elements in x1$V1 and some from x1$V2). I want to populate it for every match (for example, AKT3 - TCL1A gets 1, whereas AKT3 - VPS41 gets 0). How can I map these binary relations in x?

    x
           TCLA1 VPS41 ABCA13 ABCA4
    AKT3       0     0      0     0
    AKTIP      0     0      0     0
    ABCA13     0     0      0     0
    ABCA4      0     0      0     0

dput:

    x = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
        .Dim = c(4L, 4L),
        .Dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                         c("TCLA1", "VPS41", "ABCA13", "ABCA4")))

    x1 = structure(list(
        V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"),
        V2 = c("TCL1A", "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2", "VPS16", "VPS18")),
        .Names = c("V1", "V2"), row.names = c(NA, 10L), class = "data.frame")

Thanks
Adrian
Re: [R] populating matrix with binary variable after matching data from data frame
You could try:

    x1$V2[1] <- "TCLA1"
    x[outer(rownames(x), colnames(x), FUN=paste) %in% as.character(interaction(x1, sep=" "))] <- 1
    x
           TCLA1 VPS41 ABCA13 ABCA4
    AKT3       1     0      0     0
    AKTIP      0     1      0     0
    ABCA13     0     0      0     0
    ABCA4      0     0      0     0

A.K.

On Tuesday, August 12, 2014 8:16 PM, Adrian Johnson oriolebaltim...@gmail.com wrote:
> <SNIP>
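An alternative to the outer()/interaction() approach above is to look up row and column positions with match() and assign through a two-column index matrix. This is only a sketch of a different technique, not the method given in the reply. The data are retyped from the thread, with the first V2 entry spelled "TCLA1" so that it matches the matrix column name, as in the reply; idx is an illustrative name.

```r
# Pair list and target matrix as in the thread; the first V2 entry is
# spelled "TCLA1" here so that it matches the matrix column name.
x1 <- data.frame(
  V1 = c("AKT3", rep("AKTIP", 9)),
  V2 = c("TCLA1", "VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2",
         "KIAA1377", "FAM160A2", "VPS16", "VPS18"),
  stringsAsFactors = FALSE
)
x <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                            c("TCLA1", "VPS41", "ABCA13", "ABCA4")))

# Row and column positions of each pair; NA where a name is absent from x.
idx <- cbind(match(x1$V1, rownames(x)), match(x1$V2, colnames(x)))

# Drop pairs that have no cell in x.
idx <- idx[stats::complete.cases(idx), , drop = FALSE]

# A two-column matrix index addresses one (row, column) cell per row.
x[idx] <- 1
print(x)
```

Because the work is proportional to the number of pairs rather than to the number of cells, this form may be more comfortable for a 1211x1211 matrix than pasting every row and column name combination.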
Re: [R] Cox regression model for matched data with replacement
I am curious about this problem as well. How do you go about creating the weights for each pair, and are you suggesting that we can just incorporate a weight statement in the model as opposed to the strata statement? And Dr. Therneau, let's say I have 140 cases matched with replacement to 2 controls. Is my id variable the number of cases?

Thanks,
John