Re: [R] mixtools? Fitting two normal distributions to data where one of the two normal distributions (the one corresponding to lower values of x) is a left-truncated normal distribution.
Hi John, I don't know how well it will handle your left-truncated distribution, but I use the function Mclust from package mclust to fit mixtures of normal distributions, and it works very well. Denis

On 2015-06-30 at 22:22, John Sorkin jsor...@grecc.umaryland.edu wrote: I am trying to model a mixture of two normal distributions, where x values range from zero to some positive value. I know about mixtools and would use it, except that the y values from the normal distribution corresponding to the lower values of x (i.e. from zero to x/n) appear to come from a left-truncated normal distribution (i.e. the y values all come from the upper half of a normal distribution). The y values for higher values of x (i.e. from x/n to x) all appear to come from a normal distribution. Can someone suggest how to fit two normal distributions where one of them is left-truncated? Can this be done using mixtools? Thank you, John John David Sorkin M.D., Ph.D. Professor of Medicine Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
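Neither message shows how to handle the truncated component directly. A minimal sketch, not taken from the thread and using neither mixtools nor mclust: write the mixture likelihood by hand, with component 1 a normal left-truncated at a known point `lower` (an assumption; if the truncation point is the unknown mean itself, as "upper half" suggests, the sketch would need modifying), and maximize it with base R's optim(). All names and the simulated data are hypothetical. Like any normal-mixture likelihood, it can degenerate if a variance collapses, so sensible starting values matter.

```r
# Hypothetical sketch: ML fit of a two-component mixture in which component 1
# is a normal left-truncated at a known point `lower` and component 2 is an
# ordinary normal. Base R only (stats::optim, Nelder-Mead).
negll_trunc_mix <- function(par, y, lower) {
  p  <- plogis(par[1])                 # weight of the truncated component
  m1 <- par[2]; s1 <- exp(par[3])      # parent normal of component 1
  m2 <- par[4]; s2 <- exp(par[5])      # component 2
  # log density of the left-truncated normal: f(y) / P(Y > lower); zero below lower
  log_d1 <- dnorm(y, m1, s1, log = TRUE) -
    pnorm(lower, m1, s1, lower.tail = FALSE, log.p = TRUE)
  d1 <- ifelse(y >= lower, exp(log_d1), 0)
  dens <- p * d1 + (1 - p) * dnorm(y, m2, s2)
  # Keep optim away from numerically invalid regions
  if (any(!is.finite(dens)) || any(dens <= 0)) return(Inf)
  -sum(log(dens))
}

fit_trunc_mix <- function(y, lower, start) {
  opt <- optim(start, negll_trunc_mix, y = y, lower = lower,
               control = list(maxit = 5000))
  est <- opt$par
  c(p = plogis(est[1]), m1 = est[2], s1 = exp(est[3]),
    m2 = est[4], s2 = exp(est[5]))
}

# Simulated example: 600 draws from the upper half of N(1, 1)
# (inverse-CDF method) plus 400 draws from N(6, 1)
set.seed(1)
lower <- 1
y1 <- qnorm(runif(600, pnorm(lower, 1, 1), 1), 1, 1)
y2 <- rnorm(400, 6, 1)
y  <- c(y1, y2)
est <- fit_trunc_mix(y, lower, start = c(0, 1, 0, 5, 0))
round(est, 2)
```

The parameters are optimized on unconstrained scales (logit for the weight, log for the standard deviations), so optim() never proposes an invalid weight or a negative standard deviation.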
Re: [R] repeated measures: multiple comparisons with pairwise.t.test and multcomp disagree
Thank you, Thierry. And yes, Bert, it turns out that it is more of a statistical question after all, but again, since my question used specific R functions, R experts are well placed to help me. As pairwise.t.test was recommended in a few tutorials about repeated-measures ANOVAs, I assumed it took into account the fact that the measures were repeated, so thank you for pointing out that it does not. But my reason for not accepting the result of multcomp went further than this. Before deciding to test 4 different durations, I had tested only two of them, corresponding to sets 1 and 2 of my example. I used a paired t test (i.e. a t test for paired samples) and got a very significant effect, i.e. the mean of the differences calculated for each subject was significantly different from zero. After adding two other durations and switching from my paired t test to a repeated-measures design, these same 2 sets are no longer different. I think the explanation is a lack of homogeneity of variances. I thought a log transformation of the raw data had been sufficient to fix this, and a Levene test on the variances of the 4 sets found no problem in this regard. But maybe what matters is the variance of all the possible differences (set 1 vs 2, etc., for a total of 6 differences calculated for each subject). I just calculated these, and they range from 1.788502e-05 to 1.462171e-03. A Levene test on these 6 groups showed that their variances were heterogeneous. I think I'll stay away from the repeated-measures analysis followed by multiple comparisons and just report my 6 paired t tests, correcting the significance level for the number of comparisons with, say, the Sidak method (the per-comparison threshold is then 0.0085). Thanks for your help. Denis

On 2015-06-23 at 08:15, Thierry Onkelinx thierry.onkel...@inbo.be wrote: Dear Denis, It's not multcomp which is too conservative, it is the pairwise t-test which is too liberal.
The pairwise t-test doesn't take the random effect of Case into account. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
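The plan described above (6 paired t tests among 4 durations, with a Sidak correction) can be sketched as follows. The wide data frame `wide` (one row per animal, columns Set1 to Set4) and its values are hypothetical and simulated here. Base R's p.adjust() has no Sidak option, so the adjustment is written out by hand, using log1p()/expm1() so that very small p-values are not rounded to zero.

```r
# Per-comparison significance level under the Sidak correction: 1 - (1 - alpha)^(1/m)
sidak_alpha <- function(alpha, m) -expm1(log1p(-alpha) / m)

# Sidak-adjusted p-values: 1 - (1 - p)^m, compared to the family-wise alpha
sidak_adjust <- function(p, m = length(p)) -expm1(m * log1p(-p))

# All pairwise paired t tests between the columns of a wide data frame
paired_sidak <- function(wide) {
  pairs <- combn(names(wide), 2)                  # 2 x m matrix of column pairs
  p_raw <- apply(pairs, 2, function(pr)
    t.test(wide[[pr[2]]], wide[[pr[1]]], paired = TRUE)$p.value)
  data.frame(comparison = paste(pairs[2, ], "-", pairs[1, ]),
             p.raw = p_raw,
             p.sidak = sidak_adjust(p_raw))
}

sidak_alpha(0.05, choose(4, 2))   # about 0.0085 for 6 comparisons

# Hypothetical data: 67 animals, log10-scale values for 4 durations
set.seed(42)
animal <- rnorm(67, mean = 2, sd = 0.2)
wide <- data.frame(Set1 = animal + rnorm(67, 0, 0.01),
                   Set2 = animal + rnorm(67, -0.006, 0.01),
                   Set3 = animal + rnorm(67, -0.003, 0.01),
                   Set4 = animal + rnorm(67, 0.015, 0.01))
res <- paired_sidak(wide)
res
```

Comparing `p.sidak` to 0.05 is equivalent to comparing `p.raw` to the per-comparison threshold returned by sidak_alpha().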
[R] repeated measures: multiple comparisons with pairwise.t.test and multcomp disagree
Hi, I am working on a problem which I think can be handled as a repeated-measures analysis, and I have read many tutorials about how to do this with R. This part goes well, but I get stuck with the multiple comparisons I'd like to run afterward. I tried two methods that I have seen in my readings, but their results are quite different and I don't know which one to trust. The two approaches are pairwise.t.test() and multcomp, although the latter cannot be used after a repeated-measures aov model, only after an lme model. I have a physiological variable measured frequently on each of 67 animals. These measurements are then summarized with a quantile for each animal. To check the effect of experiment duration, I recalculated the quantile for each animal 4 times, each time using a different subset of the data (the shortest subset is included in all the others, the second shortest in the remaining 2, etc.). I handle this as 4 repeated (non-independent) measurements for each animal, and want to see if the average value (for 67 animals) differs among the 4 durations. Because animals with high values for this physiological trait have larger differences between the 4 durations than animals with low values, the observations were log-transformed. I attach the small data set (Rda format), but it can also be obtained here if the attachment gets stripped: https://dl.dropboxusercontent.com/u/612902/RepMeasData.Rda The data.frame is simply called Data.
My code is:

library(reshape2)  # melt()
library(nlme)      # lme()
library(multcomp)  # glht(), mcp()
load("RepMeasData.Rda")
Data_Long = melt(Data, id = "Case")
names(Data_Long) = c("Case", "Duration", "SMR")
Data_Long$SMR = log10(Data_Long$SMR)
# I only show essential code to reproduce my opposing results
mixmod = lme(SMR ~ Duration, data = Data_Long, random = ~ 1 | Case)
anova(mixmod)
posthoc <- glht(mixmod, linfct = mcp(Duration = "Tukey"))
summary(posthoc)

  Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lme.formula(fixed = SMR ~ Duration, data = Data_Long, random = ~1 | Case)

Linear Hypotheses:
                  Estimate Std. Error z value Pr(>|z|)
Set2 - Set1 == 0 -0.006135   0.003375  -1.818    0.265
Set3 - Set1 == 0 -0.002871   0.003375  -0.851    0.830
Set4 - Set1 == 0  0.015395   0.003375   4.561   <1e-04 ***
Set3 - Set2 == 0  0.003264   0.003375   0.967    0.768
Set4 - Set2 == 0  0.021530   0.003375   6.379   <1e-04 ***
Set4 - Set3 == 0  0.018266   0.003375   5.412   <1e-04 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

with(Data_Long, pairwise.t.test(SMR, Duration, p.adjust.method = "holm", paired = TRUE))

  Pairwise comparisons using paired t tests

data:  SMR and Duration

     Set1    Set2    Set3
Set2 <2e-16  -       -
Set3 0.8     0.10648 -
Set4 0.00475 7.9e-05 0.00034

P value adjustment method: holm

So the difference between sets 1 and 2 goes from non-significant to very significant, depending on the method. I have other examples with essentially the same type of data, and sometimes the two approaches disagree in the opposite direction: in the example shown here multcomp was more conservative, but in others it yielded a larger number of significant differences. I admit I have not mastered all the intricacies of multcomp, but I have used multcomp and other methods of doing multiple comparisons many times before (though never with a repeated-measures design) and always found the results very similar. When there were small differences, I trusted multcomp. This time I get rather large differences, and I am worried that I am doing something wrong.
Thanks in advance, Denis Chabot, Fisheries and Oceans Canada

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] multcomp_1.4-0 TH.data_1.0-6 survival_2.38-1 mvtnorm_1.0-2 nlme_3.1-120 car_2.0-25 reshape2_1.4.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.5 magrittr_1.5 splines_3.2.0 MASS_7.3-40 lattice_0.20-31 minqa_1.2.4 stringr_1.0.0
 [8] plyr_1.8.2 tools_3.2.0 nnet_7.3-9 pbkrtest_0.4-2 parallel_3.2.0 grid_3.2.0 mgcv_1.8-6
[15] quantreg_5.11 lme4_1.1-7 Matrix_1.2-0 nloptr_1.0.4 codetools_0.2-11 sandwich_2.3-3 stringi_0.4-1
[22] SparseM_1.6 zoo_1.7-12
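One way to see why the two approaches can disagree: with one random intercept per animal and balanced data, the lme model assumes the same variance for every pairwise difference, which is why all six Tukey contrasts above share the standard error 0.003375, whereas each paired t test uses the observed variance of its own differences. A minimal sketch of that check (the same one Denis describes in his follow-up), with a hypothetical wide data frame `wide` simulated so that one duration is noisier than the others:

```r
# Variance of the within-animal difference for every pair of columns
diff_variances <- function(wide) {
  pairs <- combn(names(wide), 2)
  v <- apply(pairs, 2, function(pr) var(wide[[pr[2]]] - wide[[pr[1]]]))
  names(v) <- paste(pairs[2, ], "-", pairs[1, ])
  v
}

set.seed(7)
animal <- rnorm(67, mean = 2, sd = 0.2)   # hypothetical per-animal level
wide <- data.frame(Set1 = animal + rnorm(67, 0, 0.002),
                   Set2 = animal + rnorm(67, 0, 0.002),
                   Set3 = animal + rnorm(67, 0, 0.002),
                   Set4 = animal + rnorm(67, 0, 0.03))
v <- diff_variances(wide)
round(v, 6)
max(v) / min(v)   # a large ratio casts doubt on the equal-variance assumption
```

When these variances differ by orders of magnitude, a single pooled standard error is too large for the low-variance pairs (making them look non-significant) and too small for the high-variance ones.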
[R] puzzled by time zone quirk
Hi, I have to deal with time-stamped data coming from outside my own time zone, so the problem is likely my poor knowledge of European time zones. But I am puzzled just the same. I thought that setting a time zone of Europe/Copenhagen would be the same as CET in winter and CEST in summer. This test in winter works as expected:

a = as.POSIXct("2013-02-25 01:00:00", tz = "Europe/Copenhagen"); a
[1] "2013-02-25 01:00:00 CET"
b = as.POSIXct("2013-02-25 01:00:00", tz = "CET"); b
[1] "2013-02-25 01:00:00 CET"
a - b
Time difference of 0 secs

But this one in summer does not work as I expected:

c = as.POSIXct("2013-07-25 01:00:00", tz = "Europe/Copenhagen"); c
[1] "2013-07-25 01:00:00 CEST"
d = as.POSIXct("2013-07-25 01:00:00", tz = "CEST"); d
[1] "2013-07-25 01:00:00 UTC"
e = as.POSIXct("2013-07-25 01:00:00", tz = "CET"); e
[1] "2013-07-25 01:00:00 CEST"
c - d
Time difference of -2 hours
c - e
Time difference of 0 secs

Setting tz to Europe/Copenhagen in summer (in c) at first appears to be the same as setting it to CEST, because the output shows CEST. But then d should be the same as c, and it is not. What is happening? Thanks in advance, Denis Chabot
Re: [R] puzzled by time zone quirk
Sorry, I had not posted in a long time, and I remembered this as I pushed the send button. And I am not surprised that I thought wrong! I'll start with the missing information:

sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.0.2

Then I'll admit that some of the very useful details you provided had escaped me, but in my defense, I took to heart one element found in ?Sys.timezone: "It is not in general possible to retrieve the system's own name(s) for the current timezone, but Sys.timezone will retrieve the name it uses for the current time (and the name may differ depending on whether daylight saving time is in effect)." When I tell my computer that I am in Europe, I get

Sys.time()
[1] "2014-09-21 16:38:45 CEST"

As the output of my c also displayed CEST, I assumed this was the preferred way to refer to that time zone, so I expected c and d to be the same. The output of c is deceptive. But at least I now know not to use CEST. Denis

On 2014-09-21 at 10:00, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 21/09/2014 14:11, Denis Chabot wrote: Hi, I have to deal with time-stamped data coming from outside my own time zone, so the problem is likely poor knowledge of European time zones on my part. But I am puzzled just the same. I thought that setting a time zone of Europe/Copenhagen would be the same as CET in winter and CEST in summer. You thought wrong: CEST is not a valid timezone on most (maybe all) R platforms. You failed to tell us the 'at a minimum' information required by the posting guide. ?Sys.timezone says OlsonNames() tells you the timezone names supported on your unstated platform, and ?as.POSIXct says tz: A time zone specification to be used for the conversion, _if one is required_.
System-specific (see time zones), but ‘""’ is the current time zone, and ‘GMT’ is UTC (Universal Time, Coordinated). Invalid values are most commonly treated as UTC, on some platforms with a warning. As the posting guide asks, please do your own homework. -- Brian D. Ripley, rip...@stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford 1 South Parks Road, Oxford OX1 3TG, UK
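Following the pointer to OlsonNames(), here is a minimal sketch that checks a zone name before using it, so that an invalid name such as "CEST" fails loudly instead of silently falling back to UTC/GMT. `as_posixct_checked` is a hypothetical helper, not part of base R, and the results assume a standard tz database on the system. It also shows that CEST is only the abbreviation printed for Europe/Copenhagen in summer, not a zone name.

```r
# Hypothetical helper: refuse time zone names missing from the tz database
as_posixct_checked <- function(x, tz) {
  if (!tz %in% OlsonNames())
    stop("unknown time zone '", tz, "'; see OlsonNames()")
  as.POSIXct(x, tz = tz)
}

"CEST" %in% OlsonNames()                 # FALSE: an abbreviation, not a zone name
"Europe/Copenhagen" %in% OlsonNames()    # TRUE on typical installations

c1 <- as_posixct_checked("2013-07-25 01:00:00", "Europe/Copenhagen")
format(c1, "%Z")        # "CEST": the abbreviation in use on that date
format(c1, tz = "UTC")  # "2013-07-24 23:00:00": the same instant in UTC

# An error instead of a silent fallback to UTC/GMT
try(as_posixct_checked("2013-07-25 01:00:00", "CEST"))
```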
Re: [R] puzzled by time zone quirk
Hi again, With the new installation:

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.1.1

I do get a warning that CEST is not a valid time zone, but c is still displayed with CEST as its time zone, which remains confusing.

c = as.POSIXct("2013-07-25 01:00:00", tz = "Europe/Copenhagen"); c
[1] "2013-07-25 01:00:00 CEST"
d = as.POSIXct("2013-07-25 01:00:00", tz = "CEST"); d
Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) : unknown timezone 'CEST'
2: In as.POSIXct.POSIXlt(x) : unknown timezone 'CEST'
3: In strptime(x, f, tz = tz) : unknown timezone 'CEST'
4: In as.POSIXct.POSIXlt(as.POSIXlt(x, tz, ...), tz, ...) : unknown timezone 'CEST'
[1] "2013-07-25 01:00:00 GMT"
Warning message:
In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'CEST'

It is fine now that I am warned, but I wish CEST did not appear at all. Denis

On 2014-09-21 at 10:44, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: You neglected to update before posting as required by the posting guide. R 3.0.2 is far from current, and on OS X the timezone internals were replaced in R 3.1.x (the previous version did not handle 64-bit time_t correctly, even though that is what OS X uses). And the documentation is different. ... -- Brian D. Ripley, rip...@stats.ox.ac.uk Emeritus Professor of Applied Statistics, University of Oxford 1 South Parks Road, Oxford OX1 3TG, UK
[R] subsetting and Dates
Hi, I am trying to understand why creating Date variables does not work if I subset to avoid NAs. I had problems creating these Date variables in my code and I thought that the presence of NAs was the cause, so I used a condition to avoid NAs. It turns out that NAs are not a problem and I do not need to subset, but I'd like to understand why subsetting causes the problem. The strange numbers I start with are what I get when I read an Excel sheet with the function read.xls() from package gdata.

dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 41397)
dat2 = dat1
dat2[c(5, 9)] = NA
Data = data.frame(dat1, dat2)
keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)
Data$Dat1a = as.Date(Data[, "dat1"], origin = "1899-12-30")
Data$Dat1b[keep1] = as.Date(Data[keep1, "dat1"], origin = "1899-12-30")
Data$Dat2a = as.Date(Data[, "dat2"], origin = "1899-12-30")
Data$Dat2b[keep2] = as.Date(Data[keep2, "dat2"], origin = "1899-12-30")
Data
    dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355    NA 2013-03-22 15786       <NA>    NA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383    NA 2013-04-19 15814       <NA>    NA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] gdata_2.12.0

loaded via a namespace (and not attached):
[1] gtools_2.7.0

Thanks in advance, Denis
Re: [R] subsetting and Dates
Thank you for the 2 methods to make the columns class Date, but I would really like to know why these variables were not in Date class with my code. Do you know? Denis

On 2013-05-23 at 21:44, arun smartpink...@yahoo.com wrote: You could convert those columns to Date class by:

Data[, c(4, 6)] <- lapply(Data[, c(4, 6)], as.Date, origin = "1970-01-01")
#or
Data[, c(4, 6)] <- lapply(Data[, c(4, 6)], function(x) structure(x, class = "Date"))
#    dat1  dat2      Dat1a      Dat1b      Dat2a      Dat2b
#1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
#2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
#3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
#4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
#5  41355    NA 2013-03-22 2013-03-22       <NA>       <NA>
#6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
#7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
#8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
#9  41383    NA 2013-04-19 2013-04-19       <NA>       <NA>
#10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
#11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
A.K.

----- Original Message -----
From: Denis Chabot chabot.de...@gmail.com
To: R-help@r-project.org
Cc:
Sent: Thursday, May 23, 2013 5:35 PM
Subject: [R] subsetting and Dates
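One explanation, offered as a sketch rather than the thread's answer: Dat1b and Dat2b do not exist before the indexed assignment, so `Data$Dat1b[keep1] = ...` subassigns into NULL. The `[<-` operator keeps the attributes of its target (NULL has none), not those of the value, so the Date class is dropped and only the day counts remain. Creating the column as a Date first avoids this. The objects below (`serial`, `df1`, `df2`) are hypothetical.

```r
serial <- c(41327, 41334, NA, 41348)          # hypothetical Excel serial day numbers
keep   <- !is.na(serial)
dates  <- as.Date(serial[keep], origin = "1899-12-30")

# 1. Subassigning into a column that does not exist yet: the target is NULL,
#    `[<-` keeps the target's (absent) attributes, so the Date class is lost.
df1 <- data.frame(serial)
df1$d[keep] <- dates
class(df1$d)                                  # "numeric"

# 2. Create the column with the right class first, then fill the subset;
#    the `[<-` method for Date objects keeps the class.
df2 <- data.frame(serial)
df2$d <- as.Date(NA)
df2$d[keep] <- dates
class(df2$d)                                  # "Date"
df2
```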
[R] puzzling Date math result
Hi, I cannot easily make a reproducible example for my problem, so I'll describe it as best I can. I merged 2 data frames but was surprised when one line of the x data frame did not get a match in the y data frame, because I knew such a match existed. There was only one by variable in the merge, in Date format:

in x: $ période: Date, format: "2009-06-09" "2009-07-09" ...
in y: $ date   : Date, format: "2009-05-12" "2009-06-09" ...

I extracted the date that did not match into variables a (from x) and b (from y):

a = test1$période[21]
b = test2$date[22]
a
[1] "2011-04-06"
b
[1] "2011-04-06"

and then this very puzzling situation:

a == b
[1] FALSE
as.integer(a)
[1] 15070
as.integer(b)
[1] 15070
as.integer(a) == as.integer(b)
[1] TRUE

Thanks in advance for an explanation or a suggestion to further study this puzzle, Denis

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/en_US.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base

other attached packages:
 [1] doBy_4.5.2 MASS_7.3-17 snow_0.3-8 lme4_0.999375-42
 [5] Matrix_1.0-6 lattice_0.20-6 multcomp_1.2-12 mvtnorm_0.9-9992
 [9] R2HTML_2.2 survival_2.36-12 gdata_2.8.2

loaded via a namespace (and not attached):
[1] grid_2.15.0 gtools_2.6.2 nlme_3.1-103 stats4_2.15.0 tools_2.15.0
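One possible cause, offered as a guess since the original data are not available: a Date is stored as a possibly fractional number of days since 1970-01-01. print() and format() show only the day and as.integer() truncates, but `==` (and therefore merge()) compares the full values. The values below are hypothetical and reproduce the symptom; `whole_days` is a hypothetical helper that drops the fractional part.

```r
a <- as.Date("2011-04-06")                 # stored as exactly 15070 days
b <- structure(15070.5, class = "Date")    # hypothetical: half a day later, same calendar day

print(c(a, b))                 # both display as "2011-04-06"
a == b                         # FALSE: 15070 versus 15070.5
as.integer(a) == as.integer(b) # TRUE: as.integer() truncates the fraction
unclass(b) - unclass(a)        # 0.5, the hidden difference

# Drop the fractional part so that equality and merge() behave as expected
whole_days <- function(d) structure(floor(unclass(d)), class = "Date")
whole_days(b) == a             # TRUE
```

Checking `unclass()` on the unmatched key from each data frame is a quick way to confirm or rule out this explanation.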
[R] strange convention for time zone names
Hi, My time zone in Montreal is "Standard time zone: UTC/GMT -5 hours" (see http://www.timeanddate.com/worldclock/city.html?n=165). Yet in R (POSIXct objects) I must specify the opposite, i.e. UTC+5:

dateMontreal = as.POSIXct("2011-01-15 05:00:00", tz = "EST")
dateMontreal2 = as.POSIXct("2011-01-15 05:00:00", tz = "UTC+5")
wrongdateMontreal = as.POSIXct("2011-01-15 05:00:00", tz = "UTC-5")
dateLondon = as.POSIXct("2011-01-15 10:00:00", tz = "UTC0")
difftime(dateMontreal, dateLondon)
Time difference of 0 secs
difftime(dateMontreal2, dateLondon)
Time difference of 0 secs
difftime(wrongdateMontreal, dateLondon)
Time difference of -10 hours

Is there a reason for this counter-intuitive convention? Denis

R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base
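The sign follows the POSIX TZ convention rather than a choice specific to R: in a string such as "UTC+5", the offset is the amount added to local time to obtain UTC, so zones west of Greenwich get a positive offset. The fixed-offset Olson names Etc/GMT+5 and Etc/GMT-5 keep the same inverted sign for compatibility. A minimal check, assuming the system tz database includes the Etc/ zones and America/Toronto (used here because it shares Montreal's rules):

```r
montreal_std <- as.POSIXct("2011-01-15 05:00:00", tz = "Etc/GMT+5")  # UTC-5, west of Greenwich
london       <- as.POSIXct("2011-01-15 10:00:00", tz = "UTC")

difftime(montreal_std, london)       # Time difference of 0 secs: the same instant
format(montreal_std, tz = "UTC")     # "2011-01-15 10:00:00"

# A region name avoids the sign question and handles daylight saving time too:
# 05:00 EDT (UTC-4) in July is 09:00 UTC
summer_diff <- difftime(as.POSIXct("2011-07-15 05:00:00", tz = "America/Toronto"),
                        as.POSIXct("2011-07-15 09:00:00", tz = "UTC"),
                        units = "secs")
summer_diff                          # Time difference of 0 secs
```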
[R] ifelse strips POSIXct class from object
Hi, I was losing my dates in a script, and upon inspection found that my recent switch from separate if and else to ifelse was the cause. But why?

my.date = as.POSIXct("2011-06-04 08:00:00")
default.date = seq(as.POSIXct("2011-01-01 08:00:00"), as.POSIXct("2011-09-01 08:00:00"), length = 15)
x = 4 * 60 * 60
(my.date + x)
(min(default.date) + x)
(new.date = ifelse(!is.na(my.date), my.date + x, min(default.date) + x))
(if (!is.na(my.date)) new.date2 = my.date + x else new.date2 = min(default.date) + x)

On my machine, new.date is numeric whereas new.date2 is POSIXct and POSIXt, as desired.

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Thanks in advance, Denis
Re: [R] ifelse strips POSIXct class from object
Thanks Duncan, I'll go back to if and else! Denis

On 2011-06-05 at 08:39, Duncan Murdoch wrote: On 11-06-05 8:23 AM, Denis Chabot wrote: Hi, I was losing my dates in a script and upon inspection, found that my recent switch from separate if and else to ifelse was the cause. But why? See ?ifelse. The class of the result is the same as the class of the test, not the classes of the alternatives. You need to manually attach the class again, or use a different construction. Duncan Murdoch
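Duncan's point can be seen directly: ifelse() builds its result from `test` (here a plain logical), so a value taken from `yes` or `no` keeps its number of seconds since 1970 but loses the POSIXct class. A minimal sketch; `tz = "UTC"` is added here only so the output does not depend on the local time zone.

```r
my.date <- as.POSIXct("2011-06-04 08:00:00", tz = "UTC")
x <- 4 * 60 * 60                                   # 4 hours in seconds

res <- ifelse(!is.na(my.date), my.date + x, my.date)
class(res)                                         # "numeric"

# The number is still the right instant; re-attaching the class recovers it
restored <- as.POSIXct(res, origin = "1970-01-01", tz = "UTC")
format(restored)                                   # "2011-06-04 12:00:00"
```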
Re: [R] ifelse strips POSIXct class from object
I did not know this function, thanks a lot Gabor. Denis

On 2011-06-05 at 08:48, Gabor Grothendieck wrote: On Sun, Jun 5, 2011 at 8:23 AM, Denis Chabot chabot.de...@gmail.com wrote: Hi, I was losing my dates in a script and upon inspection, found that my recent switch from separate if and else to ifelse was the cause. But why? Try replace:

new.date <- replace(my.date, is.na(my.date), min(default.date)) + x

-- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] ifelse strips POSIXct class from object
Hi Duncan, In this case they all had length 1, but I'll be careful on other occasions. Denis

On 2011-06-05 at 09:26, Duncan Murdoch wrote: On 11-06-05 8:49 AM, Denis Chabot wrote: Thanks Duncan, I'll go back to if and else! Be careful, it might not give you the same answer. I'd use this variation on the advice from ?ifelse:

new.date <- my.date + x
new.date[is.na(my.date)] <- min(default.date) + x

The thing to watch out for in this construction is that the lengths of the vectors come out right. I'm assuming that my.date + x is the same length as is.na(my.date), and that min(default.date) + x is length 1, but I haven't tried your code to check. Duncan Murdoch
sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base Thanks in advance, Denis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
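Both behaviours Duncan describes can be checked directly. The sketch below makes the same illustrative assumptions as before: UTC, and an NA in my.date. It first shows that ifelse() returns a plain number. It then shows that his index-assignment variant keeps the POSIXct class.

```r
# Illustrative assumptions: tz = "UTC" and an NA in my.date.
my.date <- as.POSIXct(c("2011-06-04 08:00:00", NA), tz = "UTC")
default.date <- seq(as.POSIXct("2011-01-01 08:00:00", tz = "UTC"),
                    as.POSIXct("2011-09-01 08:00:00", tz = "UTC"),
                    length.out = 15)
x <- 4 * 60 * 60

# ifelse() builds its result from the logical test, so the POSIXct class of
# the yes/no values is dropped and only the underlying seconds remain
via.ifelse <- ifelse(!is.na(my.date), my.date + x, min(default.date) + x)
print(class(via.ifelse))

# Duncan's variant: start from a POSIXct vector and overwrite the NA slots
new.date <- my.date + x
new.date[is.na(my.date)] <- min(default.date) + x
print(new.date)
```

The second form never leaves the POSIXct class, so there is nothing to re-attach afterwards.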
[R] unwanted switch to DST with POSIXct objects
Hi,
For a project I try to keep everything in normal (standard) time, not daylight saving time, to prevent problems when instruments collect data during the night when we go from DST back to normal time. But sometimes R tricks me and I do not know how to prevent it. This is one example:

lights_on = as.POSIXct(c("2011-05-06 04:09:26", "2011-05-07 04:07:53", "2011-05-08 04:06:21", "2011-05-09 04:04:51", "2011-05-10 04:03:22", "2011-05-11 04:01:55", "2011-05-12 04:00:30", "2011-05-13 03:59:06", "2011-05-14 03:57:45", "2011-05-15 03:56:25", "2011-05-16 03:55:07"), tz="EST") # not DST
lights_off = as.POSIXct(c("2011-05-05 18:56:54", "2011-05-06 18:58:19", "2011-05-07 18:59:44", "2011-05-08 19:01:08", "2011-05-09 19:02:32", "2011-05-10 19:03:55", "2011-05-11 19:05:18", "2011-05-12 19:06:40", "2011-05-13 19:08:01", "2011-05-14 19:09:22", "2011-05-15 19:10:42"), tz="EST") # not DST
(a = lights_on[c(1,5)]) # not DST
[1] "2011-05-06 04:09:26 EST" "2011-05-10 04:03:22 EST"
(b = lights_off[c(2,6)]) # not DST
[1] "2011-05-06 18:58:19 EST" "2011-05-10 19:03:55 EST"
(x = c(lights_off[2], lights_on[2])) # suddenly DST
[1] "2011-05-06 19:58:19 EDT" "2011-05-07 05:07:53 EDT"

Why did x end up in DST? How could I prevent it?

Thanks in advance,
Denis
Re: [R] unwanted switch to DST with POSIXct objects
Thanks Jeff and Spencer,
I will probably set the time zone for my session, but I had forgotten the possibility of setting the time zone attribute of a POSIXct object, which would also have solved my problem.
Denis

On 2011-06-05 at 11:14, Spencer Graves wrote:
On 6/5/2011 9:30 AM, Jeff Newmiller wrote:
Sys.setenv(TZ="Etc/GMT+5")

Or:

x <- as.POSIXct(as.Date('2011-01-15'))
attr(x, 'tzone') <- "Etc/GMT+5"
x

This version works without Sys.setenv, which may not work on some platforms. Unfortunately, I believe there are some copy operations that lose attributes like tzone, so you need to check.

For some of the most advanced and complicated time series problems, you might consider what's available from the Rmetrics project, e.g., at https://www.rmetrics.org/ebooks. They are designed to deal with coordinating trading data from financial markets all over the world, each of which affects all the others but has different trading hours.

Hope this helps.
Spencer

Make the timezone you prefer the default for that R session. FWIW: EST may or may not exist as a valid timezone on your system, but it is an ambiguous notation anyway.
---------------------------------------------------------------------------
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

Denis Chabot <chabot.de...@gmail.com> wrote:
[...]

--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
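The sketch below applies Spencer's attribute-based fix to the lights data. It swaps the ambiguous EST for the fixed-offset zone Etc/GMT+5 that Jeff suggests. Note the POSIX sign convention: Etc/GMT+5 means five hours behind UTC, with no DST at any time of year. It also sets tzone again after c(), which in Denis's session returned a result printed in the local (DST) zone. Only two values of each vector are kept for brevity.

```r
# Keep everything in standard time by using a fixed-offset zone.
# "Etc/GMT+5" is UTC-5 all year (POSIX zone names invert the sign).
lights_on  <- as.POSIXct(c("2011-05-06 04:09:26", "2011-05-07 04:07:53"),
                         tz = "Etc/GMT+5")
lights_off <- as.POSIXct(c("2011-05-05 18:56:54", "2011-05-06 18:58:19"),
                         tz = "Etc/GMT+5")

# c() may return its result without the tzone attribute, in which case it
# is displayed in the session's local zone; setting the attribute again
# makes the display zone explicit (the instants themselves do not change)
x <- c(lights_off[2], lights_on[2])
attr(x, "tzone") <- "Etc/GMT+5"
print(format(x, "%Y-%m-%d %H:%M:%S %Z"))
```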
[R] reading fixed width format data with 2 types of lines
Hi,
I know how to read fixed-width format data with read.fwf, but I now need to read in a large number of old fwf files with two types of lines. Lines that begin with 3 in the first column carry one set of variables, and lines that begin with 4 carry another set, like this:

…
3A00206546L07004901609004599 1015002 001001008010004002004007003 001
3A00206546L07004900609003099 1029001002001001006014002
3A00206546L07004900229000499 1015001001
3A00206546L070049001692559049033 1015 018036024
3A00206546L07004900229000499 1001 002
4A00176546L06804709001011100060651640015001001501063 065914
4A00176546L068047090010111000407616 1092 095614
4A00196546L098000100010111001706214450151062 065914
4A00176546L068047090010111000505913 1062 065914
4A00196546L09800010001011100260472140002001000201042 046114
4A00196546L0980001000101110025042214501200051042 046114
4A00196546L09800010001011100290372140005001220501032 036214
…

I have searched for tricks to do this, but I must not have used the right keywords: I found nothing. I suppose I could read the entire file as a single character variable per line, subset the lines that begin with 3, save them in an ASCII file, reopen that file with a read.fwf call, and then do the same with the lines that begin with 4. But this does not strike me as either elegant or efficient. Is there a better method?

Thanks in advance,
Denis Chabot
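One way to avoid writing intermediate files yourself is to split the lines in memory. Each group of lines is then handed to read.fwf() through a textConnection(). This is only a sketch under stated assumptions:

- The widths (a 1-character record type followed by a 9-character identifier) are hypothetical placeholders, because the real field layouts are not given in the post.
- read_group() is a hypothetical helper name.
- Characters beyond the listed widths are simply not read.

```r
# A few lines copied from the post; with a real file this would be
# txt <- readLines("oldfile.txt")   (file name hypothetical)
txt <- c("3A00206546L07004901609004599 1015002 001001008010004002004007003 001",
         "3A00206546L07004900609003099 1029001002001001006014002",
         "4A00176546L06804709001011100060651640015001001501063 065914",
         "4A00176546L068047090010111000407616 1092 095614",
         "4A00196546L098000100010111001706214450151062 065914")

record.type <- substr(txt, 1, 1)  # "3" or "4" in the first column

# Hypothetical helper: read one group of lines with its own layout
read_group <- function(lines, widths) {
  con <- textConnection(lines)
  on.exit(close(con))
  read.fwf(con, widths = widths, colClasses = "character")
}

# Placeholder layouts; replace with the real widths for each record type
widths3 <- c(1, 9)
widths4 <- c(1, 9)

rec3 <- read_group(txt[record.type == "3"], widths3)
rec4 <- read_group(txt[record.type == "4"], widths4)
print(rec3)
print(rec4)
```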
[R] superfluous distribution found with mclust
Dear R users,
I use mclust to fit a mixture of normal distributions to many datasets. Usually the Mclust function finds one or two normal distributions, rarely three. But I hit a strange case today.

my.data <- c(57.96920, 51.79415, 51.20538, 55.53637, 51.64291, 56.61476, 51.28855, 55.56169, 51.85113, 54.03330, 51.37370, 49.48561, 52.41580, 53.51176, 60.49293, 55.77012, 51.59270, 56.29660, 55.90048, 53.05432, 50.87498, 58.47613, 54.60827, 54.16143, 52.94914, 58.89408, 51.17116, 54.16909, 51.94852, 53.29897, 57.21962, 66.94420, 56.65536, 53.38147, 52.79163, 52.55879, 55.54395, 54.33984, 51.79235, 52.93464, 50.03343, 59.04797, 51.85276, 53.16419, 53.27404, 60.08775, 52.96493, 54.15129, 58.53050, 51.74431, 50.67817, 51.22570, 57.60541, 51.32998, 56.73625, 55.99371, 50.41035, 52.79797, 59.75973, 52.03613, 56.59133, 51.66319, 51.06316, 55.57699, 50.12779, 56.04503, 55.75857, 57.55347, 51.48167, 52.22395, 54.96204, 59.58895, 55.49020, 50.50893, 49.97572, 53.26222, 57.10047, 51.25523, 52.38768, 56.42965, 51.83258, 55.40537, 51.60564, 54.68883, 53.48098, 58.47231, 70.15088, 51.68805, 52.82636, 52.97804, 51.90228, 53.49184, 52.24366, 52.36895, 53.26520, 52.27327, 50.85403)
cl <- mclustBIC(my.data)
myModel <- summary(cl, my.data)
Warning message:
In map(out$z) : no assignment to 1

I do not know why this happens, but the following confirms that a first distribution was found but no data were assigned to it:

myModel$classification
 [1] 3 2 2 3 2 3 2 3 2 2 2 2 2 2 3 3 2 3 3 2 2 3 2 2 2 3 2 2 2 2 3 4 3 2 2 2 3 2 2 2
[41] 2 3 2 2 2 3 2 2 3 2 2 2 3 2 3 3 2 2 3 2 3 2 2 3 2 3 3 3 2 2 3 3 3 2 2 2 3 2 2 3
[81] 2 3 2 2 2 3 4 2 2 2 2 2 2 2 2 2 2

Furthermore, the first and second distributions have almost the same mean:

myModel$parameters$mean
       1        2        3        4
52.33903 52.33948 57.14263 68.54754

Graphically, I don't see a reason for the distribution with mean = 52.33903 to be there:

hist(my.data, breaks=99, freq=F, main="", border=grey(0.5))
rug(my.data, ticksize = 0.01, quiet = TRUE)
newx <- seq(from = min(my.data), to = max(my.data), length = 500)
Dens <- dens(modelName = myModel$modelName, data = newx, parameters = myModel$parameters)
lines(newx, Dens, col="blue")

Do you know why I get this first distribution with no members?

Thanks in advance,
Denis Chabot
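The empty component can be confirmed in code rather than by reading the classification vector by eye. The base-R sketch below counts the members of each component with tabulate(). empty_components() is a hypothetical helper, and the example uses only the first 40 classifications printed above. It does not explain why mclust produced the empty component; it only detects it.

```r
# Hypothetical helper: which of the G components received no observations?
empty_components <- function(classification, G) {
  counts <- tabulate(classification, nbins = G)  # members per component 1..G
  which(counts == 0)
}

# First 40 values of myModel$classification from the post, with G = 4
cl40 <- c(3, 2, 2, 3, 2, 3, 2, 3, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 3, 2,
          2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 3, 4, 3, 2, 2, 2, 3, 2, 2, 2)
print(empty_components(cl40, G = 4))
```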
Re: [R] alternative to rbind within a loop
Hi Greg,
Thanks, very encouraging: with my example, this is 10x more efficient than my loop:

   user  system elapsed
 13.819   5.510  20.204

   user  system elapsed
156.206  44.859 202.150

In real life, I did some work on each file before doing rbind. I'll see if this work can be put in a custom-built function that would go into the lapply call you suggested.
Denis

On 09-07-23 at 17:27, Greg Snow wrote:
Try something like (untested):

mylist <- lapply(all.files, function(i) read.csv(i))
mydf <- do.call('rbind', mylist)

If all the csv files are conformable so that rbind works on them (if the loop method works, then that should be the case), then this will read in each file, store the data frames as a list, then rbind them all together. It seems that this should be faster than the loop, but testing will be needed to be sure.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Denis Chabot
Sent: Thursday, July 23, 2009 1:54 PM
To: list R
Subject: [R] alternative to rbind within a loop

[...]
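The sketch below shows Greg's suggestion extended with a per-file function, as Denis describes. So that it runs on its own, it writes a few small demo files to a temporary folder. The folder, the number of files and the helper name read_one() are illustrative assumptions. Growing a data frame with rbind() inside a loop builds a new, ever larger table on every iteration, which is consistent with the progressive slowdown Denis saw. Collecting the pieces in a list and binding once avoids that.

```r
# Hypothetical demo folder under the session's temporary directory
demo.dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo.dir, showWarnings = FALSE)
the.data <- data.frame(a = 1:350, b = rnorm(350, 100, 10))
for (i in 1:5) {
  write.csv(the.data, file.path(demo.dir, paste0("file_", i, ".csv")),
            row.names = FALSE)
}

# Per-file work goes in one function; read_one() is a hypothetical name
read_one <- function(f) {
  d <- read.csv(f)
  # e.g. convert date-time columns or drop unneeded variables here
  d
}

all.files <- list.files(path = demo.dir, full.names = TRUE, pattern = "\\.csv$")
mylist <- lapply(all.files, read_one)  # one data frame per file
new.data <- do.call(rbind, mylist)     # a single rbind at the end
print(dim(new.data))
```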
[R] alternative to rbind within a loop
Hi,
I often have to do this: select a folder (directory) containing a few hundred data files in csv format (up to 1000 files, in fact); open each file and transform some character variables into date-time format; make it into a dataframe (this involves getting rid of a few variables I don't need); and concatenate it to the master dataframe that will eventually contain the data from all the files in the folder.

I use a loop going from 1 to the number of files. I have added a command to print an incrementing number to the R console each time the loop completes one iteration, to judge the speed of the process. At the beginning, 3-4 files are processed each second. After a few hundred iterations it slows down to about 1 file per second. Before I reach the last file (898 in the case at hand), it has become much slower, about 1 file every 2-3 seconds.

This progressive slowing down suggests the problem is linked to the size of the growing master dataframe that rbind combines with each new file. In fact, the small script below confirms this, as nothing at all happens within the loop but rbind. You can cut the size of this example so as not to waste too much of your time:

# create a dummy data.frame and copy it into a large number of csv files
test <- file.path("test")
dir.create(test, showWarnings = FALSE) # make sure the folder exists
a <- 1:350
b <- rnorm(350,100,10)
c <- runif(350, 0, 100)
d <- month.name[runif(350,1,12)]
the.data <- data.frame(a,b,c,d)
for(i in 1:850){
  write.csv(the.data, file=paste(test, "/file_", i, ".csv", sep=""))
}
# now let's make a single dataframe from all these csv files
all.files <- list.files(path=test, full.names=T, pattern=".csv")
new.data <- NULL
system.time({
  for(i in all.files){
    in.data <- read.csv(i)
    if (is.null(new.data)) {new.data = in.data} else {new.data = rbind(new.data, in.data)}
    cat(paste(i, ", ", sep=""))
  } # end for
}) # end system.time

   user  system elapsed
156.206  44.859 202.150

This is with
sessionInfo()
R version 2.9.1 Patched (2009-07-16 r48939)
x86_64-apple-darwin9.7.0
locale: fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] doBy_3.7 chron_2.3-30 timeDate_290.84
loaded via a namespace (and not attached): [1] cluster_1.12.0 grid_2.9.1 Hmisc_3.5-2 lattice_0.17-25 tools_2.9.1

Would it be better to somehow save all 850 files in one dataframe each, and then rbind them all in a single operation? Can I combine all my files without using a loop? I've never quite mastered the apply family of functions, and I have not seen examples of using them to read files.

Thanks in advance,
Denis Chabot
Re: [R] Problem with as.POSIXct on dates object
Hi,
If you look at my example in the recent thread "Re: [R] problem with as.POSIXct and daylight savings time", it appears that the tz argument is used by as.POSIXct.
Denis Chabot

On 09-07-20 at 00:00, Remko Duursma wrote:
as.POSIXct.dates does not make use of tz:

OK, but it is supposed to, right? Or maybe the documentation could be updated, because ?as.POSIXct does seem to imply that the timezone is used (as it is for other methods of as.POSIXct).

thanks,
Remko

--
Remko Duursma
Post-Doctoral Fellow
Centre for Plants and the Environment
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753
Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia
Mobile: +61 (0)422 096908

On Mon, Jul 20, 2009 at 1:41 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
as.POSIXct.dates does not make use of tz:

as.POSIXct.dates
function (x, ...)
{
    if (inherits(x, "dates")) {
        z <- attr(x, "origin")
        x <- as.numeric(x) * 86400
        if (length(z) == 3L && is.numeric(z))
            x <- x + as.numeric(ISOdate(z[3L], z[1L], z[2L], 0))
        return(structure(x, class = c("POSIXt", "POSIXct")))
    }
    else stop(gettextf("'%s' is not a \"dates\" object", deparse(substitute(x))))
}
<environment: namespace:base>

On Sun, Jul 19, 2009 at 11:30 PM, Remko Duursma <remkoduur...@gmail.com> wrote:
Dear R-helpers,
I have a problem converting an object made with the 'chron' function to a POSIXct object:

# Make date based on DOY
dat <- chron(dates=232, origin.=c(month=1, day=1, year=2008))
dat
#[1] 08/20/08
# Converting to POSIXct uses current timezone (Sydney):
as.POSIXct(dat)
#[1] "2008-08-20 10:00:00 EST"
# Setting GMT timezone has no effect?
as.POSIXct(dat, tz="GMT")
#[1] "2008-08-20 10:00:00 EST"
# But to POSIXlt works fine:
as.POSIXlt(dat, tz="GMT")
#[1] "2008-08-20 GMT"

Is this behavior expected? If so, can you explain why?

thanks for your help,
Remko
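Sometimes the goal is just a day-of-year date expressed in GMT. For that case, one base-R route avoids chron altogether by going through the Date class, whose as.POSIXct() conversion counts from midnight UTC. This is a swapped-in technique, not what Gabor or Remko used. It is shown here as a minimal sketch for Remko's example: day 232 counted from 1 January 2008, the same origin as the chron call.

```r
# Day 232 after 2008-01-01 (same origin as the chron() call in the post)
d <- as.Date(232, origin = "2008-01-01")
p <- as.POSIXct(d)          # midnight UTC of that day
attr(p, "tzone") <- "GMT"   # display in GMT rather than the local zone
print(d)
print(p)
```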
[R] problem with as.POSIXct and daylight savings time
[was "[R] end of daylight saving time"]
Hi,
I got no reply with the previous subject line, probably a bad choice of subject on my part, so here it is again.

I read from the help on DateTimeClasses and various posts on this list that, quite logically, one needs to specify whether DST is active or not when the time is between 1 and 2 AM on the first Sunday in November (for North America in recent years). This I can do for one date at a time:

a <- as.POSIXct("2008-11-02 01:30:00", tz="EST5EDT") # to get automatic use of DST
b <- as.POSIXct("2008-11-02 01:30:00", tz="EST") # to tell R this is the second occurrence of 1:30 that day, in ST
difftime(b,a)
Time difference of 1 hours

But why can't I do the following, which appears to be a typical R way of doing things, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")
as.POSIXct(c, tz=tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  invalid 'tz' value

???

Thanks,
Denis Chabot

sessionInfo()
R version 2.9.1 Patched (2009-07-09 r48929)
x86_64-apple-darwin9.7.0
locale: fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached): [1] tools_2.9.1
Re: [R] problem with as.POSIXct and daylight savings time
Thank you very much, Duncan. I'll follow your suggestion.

Why do I want to do what the designer did not think anyone would want to do? I have data acquisition equipment taking measurements every 15 min or so for days at a time, and I need to compile all such experiments in a master data set. The data acquisition equipment automatically switches to DST in spring and back to ST in autumn, which I did not disable because it is easier to work with while we are running the experiments. I could use chron to ignore time zones and daylight saving time, but this would not be of much help: whether I use as.POSIXct or chron, there is one day of the year that has 25 h, and I need to deal with that 25th hour or I'll lose one hour of data!
Denis

On 09-07-19 at 11:45, Duncan Murdoch wrote:
On 19/07/2009 11:23 AM, Denis Chabot wrote:
[was "[R] end of daylight saving time"]
Hi, I got no reply with the previous subject line, probably a bad choice of subject on my part, so here it is again.

I read from the help on DateTimeClasses and various posts on this list that, quite logically, one needs to specify whether DST is active or not when the time is between 1 and 2 AM on the first Sunday in November (for North America in recent years). This I can do for one date at a time:

a <- as.POSIXct("2008-11-02 01:30:00", tz="EST5EDT") # to get automatic use of DST
b <- as.POSIXct("2008-11-02 01:30:00", tz="EST") # to tell R this is the second occurrence of 1:30 that day, in ST
difftime(b,a)
Time difference of 1 hours

But why can't I do the following, which appears to be a typical R way of doing things, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")
as.POSIXct(c, tz=tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  invalid 'tz' value

???

Objects of the POSIXlt and POSIXct classes don't support multiple time zones, so if you specified several time zones on input, how would the conversion functions decide which one to use for output? You'll need to write your own wrapper function to make this decision, and do the conversions separately for each input timezone.

Why don't those classes support a separate time zone for each entry? Presumably because their designer never thought anyone would want to do that.

Duncan Murdoch
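Here is a minimal sketch of the wrapper Duncan suggests: convert each element with its own zone, then store all the results in one common zone. The function name as_posixct_multi_tz() and its out.tz argument are hypothetical. The example uses the fixed-offset zones Etc/GMT+4 and Etc/GMT+5 (UTC-4 and UTC-5, i.e. EDT and EST) instead of EST5EDT. That way the ambiguous 01:30 is resolved explicitly rather than by the platform.

```r
# Hypothetical wrapper: one time zone per element, one zone for the result
as_posixct_multi_tz <- function(x, tz, out.tz = "UTC") {
  stopifnot(length(x) == length(tz))
  # convert each string with its own zone, keeping seconds since 1970 (UTC)
  secs <- vapply(seq_along(x),
                 function(i) as.numeric(as.POSIXct(x[i], tz = tz[i])),
                 numeric(1))
  structure(secs, class = c("POSIXct", "POSIXt"), tzone = out.tz)
}

# The 01:30 that occurs twice on 2008-11-02: first in EDT, then in EST
times <- rep("2008-11-02 01:30:00", 2)
zones <- c("Etc/GMT+4", "Etc/GMT+5")
r <- as_posixct_multi_tz(times, zones)
print(r)
print(difftime(r[2], r[1], units = "hours"))
```

Storing everything in a single zone such as UTC also sidesteps the 25-hour day: consecutive readings stay one measurement interval apart, whatever the logger's clock displayed.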
Re: [R] problem with as.POSIXct and daylight savings time
Thanks for the suggestion, Spencer. I will take a look and will report to the list if I find this a better solution for my situation. It might take a couple of days, though.
Denis

On 09-07-19 at 12:42, spencerg wrote:
Have you considered the timeDate package?
Spencer

Denis Chabot wrote:
[...]
[R] end of daylight saving time
Hi,
I read from the help on DateTimeClasses and various posts on this list that, quite logically, one needs to specify whether DST is in effect for the first hour after the change from DST to ST in autumn. Hence, in my time zone and on Mac OS X, I can do this:

a <- as.POSIXct("2008-11-02 01:30:00", tz="EST5EDT") # to get automatic use of DST
b <- as.POSIXct("2008-11-02 01:30:00", tz="EST") # to tell R this is the second occurrence of 1:30 that day, in ST
difftime(b,a)

But why can't I do this, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")
as.POSIXct(c, tz=tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  invalid 'tz' value

???

Thanks,
Denis Chabot

sessionInfo()
R version 2.9.1 Patched (2009-07-09 r48929)
x86_64-apple-darwin9.7.0
locale: fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached): [1] tools_2.9.1
Re: [R] puzzled by math on date-time objects
Hi Phil,
Well, thank you very much for this detailed explanation. It will help me when summarizing information over periods of time using either summarize (Hmisc) or summaryBy (doBy). Until now, doing so resulted in the mean time for each group being returned as a number of seconds, as you explain below, and neither of these functions converts it back into a POSIX date-time object. I tried to do so with as.POSIXct(), but this failed because I did not provide an origin. From now on I'll use the structure command you used below.
Denis

On 09-03-10 at 19:04, Phil Spector wrote:
Denis -
If you look inside of summary.POSIXct, you'll see the following:

x <- summary.default(unclass(object), digits = digits, ...)[1:6]

In other words, summary accepts the POSIX object, unclasses it (resulting in a numeric value representing the number of seconds since January 1, 1970), performs the operation, and then reassigns the class. You can do this basic trick yourself. Suppose we have a vector of dates and want the median:

dates = as.POSIXct(c('2009-3-15','2009-2-19','2009-3-20','2009-2-18'))
median(dates)
Error in Summary.POSIXct(c(1235030400, 1237100400), na.rm = FALSE) :
  'sum' not defined for "POSIXt" objects
res = median(as.numeric(dates))
structure(res,class='POSIXct')
[1] "2009-03-02 23:30:00 PST"

I think it's clear that you can do any arithmetic operation on dates this way, even if it doesn't make sense:

sum(dates)
Error in Summary.POSIXct(c(1237100400, 1235030400, 1237532400, 1234944000 :
  'sum' not defined for "POSIXt" objects
res = sum(as.numeric(dates))
structure(res,class='POSIXct')
[1] "2126-09-08 23:00:00 PDT"

I'm quite certain that median.POSIXct will be fixed pretty quickly, but you can always unclass and reclass to do what you need.

- Phil

On Tue, 10 Mar 2009, Denis Chabot wrote:
Thanks Phil, but how does summary() find the median of the same type of object? I would have thought that the algorithm used when the vector length is even would also require the SUM of the POSIX vector.
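Phil's unclass-and-reclass trick can be wrapped in a small helper. The name median_posixct() is hypothetical. Later versions of R may already compute median() on POSIXct directly, but this sketch does not rely on that. UTC is assumed in the example so the result does not depend on the local zone.

```r
# Hypothetical helper: median of a POSIXct vector via its numeric seconds
median_posixct <- function(x, na.rm = FALSE) {
  res <- median(as.numeric(x), na.rm = na.rm)   # seconds since 1970-01-01 UTC
  out <- structure(res, class = c("POSIXct", "POSIXt"))
  attr(out, "tzone") <- attr(x, "tzone")        # keep the input's zone, if any
  out
}

dates <- as.POSIXct(c("2009-03-15", "2009-02-19", "2009-03-20", "2009-02-18"),
                    tz = "UTC")
m <- median_posixct(dates)
print(m)  # halfway between 2009-02-19 and 2009-03-15
```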
I am glad of the solution you propose, but still puzzled a bit! Denis Le 09-03-10 à 12:39, Phil Spector a écrit : Denis - There is no median method for POSIX objects, although there is a summary object. Thus, when you pass a POSIX object to median, it uses median.default, which contains the following code: if (n%%2L == 1L) sort(x, partial = half)[half] else sum(sort(x, partial = half + 0L:1L)[half + 0L:1L])/2 So when the length of your POSIX vector is odd, it works, but if it's even, it would need to take the sum of a POSIX object. Of course, there is no sum method for POSIX objects, since it doesn't make sense. Right now, it looks like your best bet for a summary of POSIX objects is summary(a)['Median'] - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Tue, 10 Mar 2009, Denis Chabot wrote: Hi, I don't understand the following. When I create a small artificial set of date information in class POSIXct, I can calculate the mean and the median: a = as.POSIXct(Sys.time()) a = a + 60*0:10; a [1] 2009-03-10 11:30:16 EDT 2009-03-10 11:31:16 EDT 2009-03-10 11:32:16 EDT [4] 2009-03-10 11:33:16 EDT 2009-03-10 11:34:16 EDT 2009-03-10 11:35:16 EDT [7] 2009-03-10 11:36:16 EDT 2009-03-10 11:37:16 EDT 2009-03-10 11:38:16 EDT [10] 2009-03-10 11:39:16 EDT 2009-03-10 11:40:16 EDT median(a) [1] 2009-03-10 11:35:16 EDT mean(a) [1] 2009-03-10 11:35:16 EDT But for real data (for this post, a short subset is in object c) that I have converted into a POSIXct object, I cannot calculate the median with median(), though I do get it with summary(): c [1] 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:19 EST [4] 2009-02-24 14:51:20 EST 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST [7] 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST 2009-02-24 14:51:22 EST [10] 2009-02-24 14:51:22 EST class(c) [1] POSIXt POSIXct median(c) Erreur dans Summary.POSIXct(c(1235505080.6, 1235505081.1), na.rm = FALSE) : 'sum' not defined 
for POSIXt objects One difference is that in my own date-time series, some events are repeated (the original data contained fractions of seconds). But then, why can I get a median through summary()? summary(c) Min. 1st Qu. Median 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:20 EST Mean 3rd Qu. Max. 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST Thanks in advance, Denis Chabot sessionInfo() R version 2.8.1 Patched (2009-01-19 r47650) i386-apple-darwin9.6.0 locale
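Phil's unclass/reclass trick can be wrapped in a small helper. A minimal sketch, assuming the hypothetical name `median_posixct` (it is not part of base R): it takes the median of the underlying seconds since 1970-01-01 UTC and re-wraps the result with base R's `.POSIXct()`, keeping the input's time zone. The dates are pinned to UTC here so the result does not shift with daylight saving time, as Phil's PST output did.

```r
# Hypothetical helper (not in base R): median of a POSIXct vector,
# computed on the underlying numeric seconds and converted back.
median_posixct <- function(x, na.rm = FALSE) {
  secs <- median(as.numeric(x), na.rm = na.rm)  # plain numeric median
  .POSIXct(secs, tz = attr(x, "tzone"))         # re-attach class and time zone
}

# Phil's four dates (an even count, the case that failed), in UTC
dates <- as.POSIXct(c("2009-03-15", "2009-02-19", "2009-03-20", "2009-02-18"),
                    tz = "UTC")
m <- median_posixct(dates)
print(m)   # halfway between 2009-02-19 and 2009-03-15
```

The same pattern (numeric operation, then `.POSIXct()`) also covers per-group means returned as plain seconds by functions such as summaryBy.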
[R] puzzled by math on date-time objects
Hi, I don't understand the following. When I create a small artificial set of date information in class POSIXct, I can calculate the mean and the median:

a = as.POSIXct(Sys.time())
a = a + 60*0:10; a
 [1] 2009-03-10 11:30:16 EDT 2009-03-10 11:31:16 EDT 2009-03-10 11:32:16 EDT
 [4] 2009-03-10 11:33:16 EDT 2009-03-10 11:34:16 EDT 2009-03-10 11:35:16 EDT
 [7] 2009-03-10 11:36:16 EDT 2009-03-10 11:37:16 EDT 2009-03-10 11:38:16 EDT
[10] 2009-03-10 11:39:16 EDT 2009-03-10 11:40:16 EDT
median(a)
[1] 2009-03-10 11:35:16 EDT
mean(a)
[1] 2009-03-10 11:35:16 EDT

But for real data (for this post, a short subset is in object c) that I have converted into a POSIXct object, I cannot calculate the median with median(), though I do get it with summary():

c
 [1] 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:19 EST
 [4] 2009-02-24 14:51:20 EST 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST
 [7] 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST 2009-02-24 14:51:22 EST
[10] 2009-02-24 14:51:22 EST
class(c)
[1] "POSIXt"  "POSIXct"
median(c)
Error in Summary.POSIXct(c(1235505080.6, 1235505081.1), na.rm = FALSE) :
  'sum' not defined for "POSIXt" objects

One difference is that in my own date-time series, some events are repeated (the original data contained fractions of seconds). But then, why can I get a median through summary()?

summary(c)
                   Min.                 1st Qu.                  Median
2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:20 EST
                   Mean                 3rd Qu.                    Max.
2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST

Thanks in advance, Denis Chabot

sessionInfo()
R version 2.8.1 Patched (2009-01-19 r47650)
i386-apple-darwin9.6.0
locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doBy_3.7 chron_2.3-30
loaded via a namespace (and not attached):
[1] Hmisc_3.5-2 cluster_1.11.12 grid_2.8.1 lattice_0.17-20 tools_2.8.1
[R] handling the output of strsplit
Hi, Simple question, but I could not figure out how to find the answer on my own (wrong choice of keywords on my part). I have a character variable for time of day with entries like 6h30, 7h40, 12h25, 23h, etc. For the sake of this message, say

h = c("3h30", "6h30", "9h40", "11h25", "14h00", "15h55", "23h")

I could not figure out how to use chron to import this into times, so I tried to extract the hours and minutes on my own. I used strsplit and got a list:

h2 = strsplit(h, "h")
h2
[[1]]
[1] "3"  "30"
[[2]]
[1] "6"  "30"
[[3]]
[1] "9"  "40"
[[4]]
[1] "11" "25"
[[5]]
[1] "14" "00"
[[6]]
[1] "15" "55"
[[7]]
[1] "23"

This is where I am stuck. I would have liked to extract a vector of hours from this list, and a vector of minutes, to reconstruct a time of day. But the only command I know, unlist, makes one long vector of h, min, h, min, h, min. For this case in particular, but for lists in general, how can one extract the first item of each element in the list, then the second item of each element, etc.? Thanks in advance, Denis
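One way to answer the general question (the first item of every list element, then the second) is to apply the extraction function `[` to each element with `vapply()` or `sapply()`. A minimal sketch on Denis's example vector; treating a missing minutes part (as in "23h") as 0 is an assumption, not something stated in the thread.

```r
h <- c("3h30", "6h30", "9h40", "11h25", "14h00", "15h55", "23h")
h2 <- strsplit(h, "h", fixed = TRUE)   # list of character vectors

# `[` used as a function: item i of each element (NA when it is missing)
first  <- vapply(h2, `[`, character(1), 1)
second <- vapply(h2, `[`, character(1), 2)

hours   <- as.numeric(first)
minutes <- as.numeric(second)
minutes[is.na(minutes)] <- 0           # "23h" has no minutes part

cbind(hours, minutes)
```

`sapply(h2, "[", 1)` gives the same result; `vapply()` additionally checks that every call returns exactly one character string.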
Re: [R] handling the output of strsplit
Most helpful, Gabor. Many thanks, Denis

On 08-06-20 at 18:58, Gabor Grothendieck wrote:

We construct a times object by replacing the letter h with a : and then pasting a :00 on the end. Then replace any occurrence of :: with :00: . It's now in the format that times recognizes, so we can just convert that to times and apply hours() and minutes() to get the components:

library(chron)
h2 <- times(sub("::", ":00:", paste(sub("h", ":", h), "00", sep = ":")))
hours(h2)
[1]  3  6  9 11 14 15 23
minutes(h2)
[1] 30 30 40 25  0 55  0

Another possibility is to use gsubfn in package gsubfn. It matches the string such that it captures the hour and minutes in the two backreferences and then pastes them together with a :00 at the end. It then replaces :: with :00: and converts that to times. hours() and minutes() could be used, as before, to get the components.

library(gsubfn)
times(gsubfn("([^h]+)h(.*)", ~ sub("::", ":00:", paste(..., "00", sep = ":")), h, backref = -2))
[1] 03:30:00 06:30:00 09:40:00 11:25:00 14:00:00 15:55:00 23:00:00

Here is another approach using strapply in the gsubfn package. We use the same pattern but this time convert each component to numeric:

times(strapply(h, "([^h]+)h(.*)", ~ as.numeric(x) / 24 + sum(as.numeric(y), na.rm = TRUE) / (24*60), backref = -2, simplify = c))
[1] 03:30:00 06:30:00 09:40:00 11:25:00 14:00:00 15:55:00 23:00:00
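If chron or gsubfn are not at hand, the same pattern idea works in base R with `sub()` backreferences. A minimal sketch under the same assumption Gabor makes (a missing minutes part means :00): it builds "HH:MM:SS" strings, the format Gabor feeds to `times()`, plus the fraction-of-day values his strapply version computes.

```r
h <- c("3h30", "6h30", "9h40", "11h25", "14h00", "15h55", "23h")

# Group 1 = hours, group 2 = minutes (possibly empty, as in "23h")
pattern <- "^([0-9]+)h([0-9]*)$"
hrs  <- as.integer(sub(pattern, "\\1", h))
mins <- sub(pattern, "\\2", h)
mins[mins == ""] <- "0"                     # "23h" -> 0 minutes
mins <- as.integer(mins)

hms <- sprintf("%02d:%02d:00", hrs, mins)   # e.g. "03:30:00"
day_fraction <- (hrs + mins / 60) / 24      # same scale as the strapply example
hms
```

With chron loaded, `times(hms)` or `times(day_fraction)` should then give the same times object as Gabor's versions.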
Re: [R] Reducing the size of pdf graphics files produced with R
support a range of compression options. I use cups-pdf and reduced an R output file from 3.6 MB to 0.9 MB. Much better if you want to include it in a LaTeX article.

An alternative on Windows and Linux is GSView and Ghostscript: http://pages.cs.wisc.edu/~ghost/gsview/ Using the convert option (File menu), one can use the pdfwrite driver and set (under properties) CompressPages to TRUE. You can tweak a lot of the PDF/Distiller preferences here as well.

cheers,
Paul

Chabot Denis wrote:

Hi, Without trying to print 100 points (see http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42105.html), I often print maps for which I do not want to lose too much coastline detail, and/or plots with 1000-5000 points (yes, some are on top of each other, but with transparency, i.e. rgb colors with alpha information, this actually comes through as useful information). But the files are large (not as large as in the thread above, of course: 800 KB to about 2 MB), especially when included by the dozen in a LaTeX document. Acrobat (not the Reader, the full program) has a "reduce file size" option. I don't know what it does, but it shrinks most of my plots to about 30% of their original size, and I cannot detect any loss of detail even when zooming in several times. But it is a pain to do this with Acrobat when you generate many plots... And you need to buy Acrobat. Is this something the pdf device could do in a future version? I tried the million points example from the thread above and the 55 MB file was reduced to 6.9 MB, an even better shrinkage than I see on my usual plots. Denis Chabot

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +31302535773
Fax: +31302531145
http://intamap.geo.uu.nl/~paul
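In more recent R versions (this may not apply to the versions in use when this thread was written), `pdf()` has a `compress` argument that applies Flate compression to the page streams, and `tools::compactPDF()` can post-process existing files if qpdf or Ghostscript is installed. A minimal sketch, with random data standing in for Denis's plots, that writes the same 5000-point semi-transparent scatter plot with and without compression and compares the file sizes; savings on real maps will differ.

```r
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- rnorm(n)

# Write the same scatter plot to `file` and return its size in bytes
write_scatter <- function(file, compress) {
  pdf(file, compress = compress)
  plot(x, y, pch = 16, col = rgb(0, 0, 1, alpha = 0.2))  # alpha = transparency
  dev.off()
  file.info(file)$size
}

size_plain <- write_scatter(tempfile(fileext = ".pdf"), compress = FALSE)
size_flate <- write_scatter(tempfile(fileext = ".pdf"), compress = TRUE)
cat("uncompressed:", size_plain, "bytes; compressed:", size_flate, "bytes\n")
```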
Re: [R] dates in French format
library(chron)
library(gsubfn)
Loading required package: proto
french.months <- format(seq(as.Date("2000-01-01"), length = 12, by = "month"), "%b")

 *** caught bus error ***
address 0x8, cause 'non-existent physical address'

Traceback:
 1: strptime(x, f)
 2: fromchar(x)
 3: as.Date.character("2000-01-01")
 4: as.Date("2000-01-01")
 5: seq(as.Date("2000-01-01"), length = 12, by = "month")
 6: format(seq(as.Date("2000-01-01"), length = 12, by = "month"), "%b")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

However, if I replace that call with the following, the rest of Gabor's solution works.

library(chron)
library(gsubfn)
Loading required package: proto
french.months <- c("janv", "fév", "mars", "avr", "mai", "juin", "juil", "août", "sept", "oct", "nov", "déc")
dd <- c("7-déc-07", "11-déc-07", "14-déc-07", "18-déc-07", "21-déc-07",
        "24-déc-07", "26-déc-07", "28-déc-07", "31-déc-07", "2-janv-08",
        "4-janv-08", "7-janv-08", "9-janv-08", "11-janv-08", "14-janv-08",
        "16-janv-08", "18-janv-08")
f <- function(d, m, y) chron(paste(pmatch(m, french.months), d, y, sep = "/"))
strapply(dd, "(.*)-(.*)-(.*)", f, backref = -3, simplify = c)
 [1] 12/07/07 12/11/07 12/14/07 12/18/07 12/21/07 12/24/07 12/26/07 12/28/07
 [9] 12/31/07 01/02/08 01/04/08 01/07/08 01/09/08 01/11/08 01/14/08 01/16/08
[17] 01/18/08

So thanks again. I will try to reinstall R on my computer and see if I still get these errors. Denis
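The lookup-table idea in Gabor's workaround can also be done without chron and without relying on the locale's `%b` month names, which is where the crashes and the "jan" versus "janv" mismatch came from. A minimal sketch; `parse_french_date` is a hypothetical helper name, and the abbreviation table is an assumption that must match the spelling actually used in the file (a trailing "." as in Windows' "janv." is dropped).

```r
# Assumed abbreviations (same as Gabor's list); edit to match your file
french_months <- c("janv", "fév", "mars", "avr", "mai", "juin",
                   "juil", "août", "sept", "oct", "nov", "déc")

# Hypothetical helper: "7-déc-07" -> Date 2007-12-07, locale-independent
parse_french_date <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  day   <- vapply(parts, `[`, character(1), 1)
  mon   <- sub("\\.$", "", vapply(parts, `[`, character(1), 2))
  year  <- vapply(parts, `[`, character(1), 3)
  mon_num <- match(mon, french_months)        # NA if not in the table
  if (anyNA(mon_num)) warning("unrecognised month abbreviation(s)")
  # numeric months, so strptime never needs French month names
  as.Date(paste(day, mon_num, year, sep = "-"), format = "%d-%m-%y")
}

dd <- c("7-déc-07", "31-déc-07", "2-janv-08", "18-janv-08")
parse_french_date(dd)
```

The result is a base R Date vector; `chron::chron(as.numeric(d))` or simply keeping Dates are both options afterwards.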
Re: [R] dates in French format
Hi all, The crashes I reported earlier were caused by R 2.6.1 for Mac not liking the OS date setting "French (Canada)", an issue that has been solved (by Simon Urbanek). The crashes did not occur when the OS was set to use normal French formats for dates. With that setting, the suggestions by Prof Ripley and Gabor all worked nicely.

Now that my dates are a chron object, I do have a new problem. The formatting of the dates on the x axis leaves something to be desired. Instead of having day, month and year, or at the very least day and month, I only get month and year, so many tick labels are identical. I also get a warning which puzzles me. For instance:

start <- chron("12/01/2007")
other.dates <- seq(1, 60, 2)
Date <- start + other.dates
plot(1:length(Date) ~ Date)

6 ticks appear on the x axis. The first three are labeled 12/07 and the other three are labeled 01/08. I also get this:

Warning messages:
1: In v[[perm[1]]] : partial match of 'm' to 'month'
2: In v[[perm[2]]] : partial match of 'y' to 'year'

So m only partially matches month, and y only partially matches year. Yet Date here is a proper chron object, so I fail to see why the match is only partial. If I do

Date2 <- as.Date(Date)

and use this as my x axis, the six labels are more usable (déc 03, déc 13, déc 23, jan 02, jan 12, jan 22). I suppose I can plot without x labels and draw my own, but I had not expected it would be necessary.

sessionInfo()
R version 2.6.1 (2007-11-26)
i386-apple-darwin8.10.1
locale:
fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.4-1 chron_2.3-16
loaded via a namespace (and not attached):
[1] grid_2.6.1 lattice_0.17-2 tools_2.6.1

Denis

On 31 Jan 08 at 09:46, Denis Chabot wrote:

(I've put the R Mac list in cc because of the crashes I have experienced trying some of the suggestions below)

Hi Gabor and Prof Ripley,

On 31 Jan 08 at 02:11, Prof Brian Ripley wrote:

The output from sessionInfo() the posting guide asked for would have been very helpful here.

You are right, sorry about that:

library(chron)
sessionInfo()
R version 2.6.1 (2007-11-26)
i386-apple-darwin8.10.1
locale:
fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] chron_2.3-16

I think the problem is likely to be that these are not standard French abbreviations according to my systems.

I was ready to blame Excel for the use of non-standard abbreviations, but I would have been wrong: it seems that janv is a Mac OS X decision, from what I can see in my system settings. I am not sure what would be a bullet-proof authority on French abbreviations. My dictionary was of no help, but Wikipedia seems to endorse the Mac OS X and Windows use of janv: http://fr.wikipedia.org/wiki/Mois#Abr.C3.A9viations

On Linux I get

format(Sys.Date(), "%d-%b-%y")
[1] "31-jan-08"
format(Sys.Date()-50, "%d-%b-%y")
[1] "12-déc-07"

and on Windows

format(Sys.Date(), "%d-%b-%y")
[1] "31-janv.-08"
format(Sys.Date()-50, "%d-%b-%y")
[1] "12-déc.-07"

I tried this too:

format(Sys.Date(), "%d-%b-%y")
[1] "31-jan-08"
format(Sys.Date()-50, "%d-%b-%y")
[1] "12-déc-07"

I am lost here: since the OS uses janv, why did the above give jan???

And yes, chron is US-centric and so only allows English names. Assuming you know exactly what is meant by 'French short format', I think the simplest thing to do is to set up a table by

tr <- month.abb
names(tr)[1] <- c("janv") # complete it
x <- "9-janv-08"
x2 <- strsplit(x, "-")
x3 <- sapply(x2, function(x) {x[2] <- tr[x[2]]; paste(x, collapse = "-")})
as.Date(x3, format = "%d-%b-%y")

Thank you Prof Ripley, although I'll have to do my homework to fully understand what is happening in the function you wrote.
But I wonder why I cannot make this a Date object:

x <- "9-janv-08"
x2 <- strsplit(x, "-")
x3 <- sapply(x2, function(x) {x[2] <- tr[x[2]]; paste(x, collapse = "-")})
as.Date(x3, format = "%d-%b-%y")
[1] "2008-01-09"
class(x3)
[1] "character"
x4 <- as.Date(x3, format = "%d-%b-%y")

 *** caught bus error ***
address 0x8, cause 'non-existent physical address'

Traceback:
 1: strptime(x, format)
 2: as.Date.character(x3, format = "%d-%b-%y")
 3: as.Date(x3, format = "%d-%b-%y")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

The problem may be my system, as I get this error when trying Gabor's suggestions (below).

On 31 Jan 08 at 00:21, Gabor Grothendieck wrote:

Suppose we have:

dd <- c("7-déc-07", "11-déc-07", "14-déc-07", "18-déc-07", "21-déc-07", "24-déc-07", "26-déc-07", "28-déc-07", "31-déc-07", "2-janv-08", "4-janv-08", "7-janv-08", "9-janv-08", "11-janv-08", "14-janv
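For the axis-label problem Denis describes, here is a minimal sketch of the "plot without x labels and draw my own" route, using base R Date objects (the kind `as.Date()` returns from his chron dates) instead of chron. The 10-day tick spacing and the "%d/%m" label format are arbitrary choices for illustration; "%d %b" would instead give locale-dependent month abbreviations like the ones he saw.

```r
# Denis's example rebuilt with base R Date objects instead of chron
start <- as.Date("2007-12-01")
Date  <- start + seq(1, 60, 2)
y     <- seq_along(Date)

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(y ~ Date, xaxt = "n")                     # suppress the default x axis
ticks <- seq(start, by = "10 days", length.out = 7)
labs  <- format(ticks, "%d/%m")                # day and month on every tick
axis.Date(1, at = ticks, format = "%d/%m")     # draw the custom axis
dev.off()
labs
```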
[R] dates in French format
Hello R users, I have to import a file with one column containing dates written in French short format, such as:

7-déc-07 11-déc-07 14-déc-07 18-déc-07 21-déc-07 24-déc-07 26-déc-07 28-déc-07 31-déc-07 2-janv-08 4-janv-08 7-janv-08 9-janv-08 11-janv-08 14-janv-08 16-janv-08 18-janv-08

There are other columns for other (numeric) variables in the data file. In my read.csv2 statement, I indicate that the date column must be imported as.is to keep it as character. I would like to transform this into a date object in R. So far I've used chron for my date and time needs, but I am willing to change if another object/package will ease the task of importing these dates. My reading of the chron help led me to believe that the only month names it understands are English ones. Are there other formats I can use with chron, or must I somehow edit this character variable to replace French month names with English ones (or numbers from 1 to 12)? Thanks in advance, Denis

p.s. I read this in digest mode, so I'll get your replies faster if you cc my email.