Re: [R] mixtools? Fitting two-normal distributions to data where one of the two normal distributions (the one corresponding to lower values of x) is a left-truncated normal distribution.

2015-07-01 Thread Denis Chabot
Hi John,

I don't know how well it will handle your left-truncated distribution, but I 
use the function Mclust from package mclust to fit mixtures of normal 
distributions and it works very well. 
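Mclust fits mixtures of ordinary (untruncated) normal components. If the truncation point of the lower component is known, one option is to write the mixture likelihood directly and maximise it with optim(). This is a minimal sketch under stated assumptions: the truncation point (`trunc_at`) is known in advance, the helper names are hypothetical, and the data are simulated. It uses neither mixtools nor mclust.

```r
# Sketch only: maximum-likelihood fit of a two-component mixture in which
# component 1 is a normal left-truncated at a KNOWN point (trunc_at, an
# assumed input) and component 2 is an ordinary normal. Data are simulated.

# Log-density of a normal left-truncated at trunc_at (-Inf below it)
ldtnorm_left <- function(y, mean, sd, trunc_at) {
  ld <- dnorm(y, mean, sd, log = TRUE) -
    pnorm(trunc_at, mean, sd, lower.tail = FALSE, log.p = TRUE)
  ifelse(y >= trunc_at, ld, -Inf)
}

# Negative log-likelihood; weight and SDs kept in range via logit / log scales
nll_mix <- function(par, y, trunc_at) {
  p  <- plogis(par[1])
  s1 <- exp(par[3])
  s2 <- exp(par[5])
  dens <- p * exp(ldtnorm_left(y, par[2], s1, trunc_at)) +
    (1 - p) * dnorm(y, par[4], s2)
  nll <- -sum(log(dens))
  if (is.finite(nll)) nll else 1e10   # steer the optimiser away from bad regions
}

fit_trunc_mix <- function(y, trunc_at, start) {
  opt <- optim(start, nll_mix, y = y, trunc_at = trunc_at,
               method = "Nelder-Mead", control = list(maxit = 5000))
  c(p = plogis(opt$par[1]), mu1 = opt$par[2], sd1 = exp(opt$par[3]),
    mu2 = opt$par[4], sd2 = exp(opt$par[5]))
}

# Simulated data: upper half of N(0, 1) (weight 0.4) plus N(5, 1) (weight 0.6)
set.seed(1)
y <- c(abs(rnorm(800)), rnorm(1200, mean = 5, sd = 1))
est <- fit_trunc_mix(y, trunc_at = 0, start = c(0, 0.5, 0, 4, 0))
round(est, 2)
```

The truncation point is passed as `trunc_at` rather than `lower` on purpose: optim() has its own `lower` argument, which would capture it.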

Denis
 On 2015-06-30 at 22:22, John Sorkin jsor...@grecc.umaryland.edu wrote:
 
 I am trying to model the mixture of two normal distributions, where x values 
 are in the range of zero to some positive value. I know about mixtools and 
 would use it save for the fact that the y values from the normal 
 distribution corresponding to the lower values of x (i.e. from zero to x/n) 
 are from what appears to be a left-truncated normal distribution (i.e. the y 
 values are all from the upper half of a normal distribution). The y values 
 from higher values of x (i.e. from x/n to x) all appear to come from a normal 
 distribution. Can someone suggest how to fit two normal distributions where 
 one of the two distributions is left-truncated? Can this be done using 
 mixtools?
 Thank you,
 John 
 
 John David Sorkin M.D., Ph.D.
 Professor of Medicine
 Chief, Biostatistics and Informatics
 University of Maryland School of Medicine Division of Gerontology and 
 Geriatric Medicine
 Baltimore VA Medical Center
 10 North Greene Street
 GRECC (BT/18/GR)
 Baltimore, MD 21201-1524
 (Phone) 410-605-7119
 (Fax) 410-605-7913 (Please call phone number above prior to faxing) 
 
 

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] repeated measures: multiple comparisons with pairwise.t.test and multcomp disagree

2015-06-24 Thread Denis Chabot
Thank you, Thierry. And yes, Bert, it turns out that it is more of a 
statistical question after all, but again, since my question used specific R 
functions, R experts are well placed to help me.

As pairwise.t.test was recommended in a few tutorials about repeated-measures 
ANOVAs, I assumed it took into account the fact that the measures were indeed 
repeated, so thank you for pointing out that it does not.

But my reason for not accepting the result of multcomp went further than this. 
Before deciding to test 4 different durations, I had tested only two of them, 
corresponding to sets 1 and 2 of my example. I used a paired t test (as in t 
test for paired samples). I had a very significant effect, i.e. the mean of the 
differences calculated for each subject was significantly different from zero.
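A paired t test is the same test as a one-sample t test on the per-subject differences, which is why the result can be described either way. A quick check on simulated, hypothetical data (67 animals, two durations, log-scale values):

```r
# Hypothetical data: 67 animals measured at two durations, log scale
set.seed(42)
n <- 67
set1 <- rnorm(n, mean = 2, sd = 0.3)
set2 <- set1 + rnorm(n, mean = -0.006, sd = 0.02)

paired  <- t.test(set1, set2, paired = TRUE)   # paired t test
on_diff <- t.test(set1 - set2, mu = 0)         # one-sample test on differences

# Identical t statistics (and p values)
c(paired = unname(paired$statistic), on_diff = unname(on_diff$statistic))
```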

After adding two other durations and switching from my paired t test to a 
repeated measures design, these same 2 sets are no longer different. I think 
the explanation is lack of homogeneity of variances. I thought a log 
transformation of the raw data had been sufficient to fix this, and a Levene 
test on the variances of the 4 sets found no problem in this regard.

But maybe it is the variance of all the possible differences (set 1 vs 2, etc, 
for a total of 6 differences calculated for each subject) that matters.  I just 
calculated these and they range from 1.788502e-05 to 1.462171e-03. A Levene 
test on these 6 groups showed that their variances were heterogeneous.
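The six per-animal differences and their variances can be computed with combn(). The test at the end is a hand-written, median-centred (Brown-Forsythe) version of Levene's test, which is the default centring used by car::leveneTest(); the data below are simulated and purely hypothetical, with Set4 deliberately made noisier.

```r
# Hypothetical data: 67 animals x 4 durations (columns), log scale
set.seed(7)
n <- 67
base <- rnorm(n, mean = 2, sd = 0.3)
wide <- cbind(Set1 = base + rnorm(n, 0,      0.005),
              Set2 = base + rnorm(n, -0.006, 0.005),
              Set3 = base + rnorm(n, -0.003, 0.005),
              Set4 = base + rnorm(n, 0.015,  0.03))

# All 6 per-animal differences, one column per pair of durations
combos <- combn(colnames(wide), 2)
diffs <- apply(combos, 2, function(p) wide[, p[1]] - wide[, p[2]])
colnames(diffs) <- apply(combos, 2, paste, collapse = " - ")
apply(diffs, 2, var)

# Median-centred Levene test: one-way ANOVA of |d - group median| by pair
d <- as.vector(diffs)
g <- factor(rep(colnames(diffs), each = n))
z <- abs(d - ave(d, g, FUN = median))
lev <- anova(lm(z ~ g))
lev
```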

I think I'll stay away from the repeated-measures analysis followed by multiple 
comparisons and just report my 6 t tests for paired samples, correcting the 
significance level for the number of comparisons with, say, the Sidak method 
(the threshold for significance is then 0.0085).
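With m = 6 comparisons at a family-wise level of 0.05, the Sidak per-comparison threshold is 1 - (1 - 0.05)^(1/6), which rounds to the 0.0085 quoted. Equivalently, each raw p value can be adjusted as 1 - (1 - p)^m and compared to 0.05. p.adjust() has no Sidak option, so the helpers below are hand-written (their names are hypothetical) and the p values are made up for illustration.

```r
# Sidak per-comparison significance level for m tests at family-wise alpha
sidak_threshold <- function(alpha, m) 1 - (1 - alpha)^(1 / m)

# Sidak-adjusted p values (hand-written helper; not a p.adjust() method)
sidak_adjust <- function(p) 1 - (1 - p)^length(p)

sidak_threshold(0.05, 6)

# Made-up raw p values from 6 paired t tests
p_raw <- c(0.0001, 0.004, 0.009, 0.02, 0.3, 0.8)
data.frame(p_raw,
           p_sidak = sidak_adjust(p_raw),
           significant = p_raw < sidak_threshold(0.05, 6))
```

Both routes give the same decisions; the adjusted p values are simply easier to report.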

Thanks for your help. 

Denis

 On 2015-06-23 at 08:15, Thierry Onkelinx thierry.onkel...@inbo.be wrote:
 
 Dear Denis,
 
 It's not multcomp which is too conservative, it is the pairwise t-test
 which is too liberal. The pairwise t-test doesn't take the random
 effect of Case into account.
 
 Best regards,
 ir. Thierry Onkelinx
 Instituut voor natuur- en bosonderzoek / Research Institute for Nature
 and Forest
 team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
 Kliniekstraat 25
 1070 Anderlecht
 Belgium
 
 To call in the statistician after the experiment is done may be no
 more than asking him to perform a post-mortem examination: he may be
 able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
 The plural of anecdote is not data. ~ Roger Brinner
 The combination of some data and an aching desire for an answer does
 not ensure that a reasonable answer can be extracted from a given body
 of data. ~ John Tukey
 
 
 2015-06-23 5:17 GMT+02:00 Denis Chabot denis.cha...@me.com:
 ...

[R] repeated measures: multiple comparisons with pairwise.t.test and multcomp disagree

2015-06-22 Thread Denis Chabot
Hi,

I am working on a problem which I think can be handled as a repeated measures 
analysis, and I have read many tutorials about how to do this with R. This part 
goes well, but I get stuck with the multiple comparisons I'd like to run 
afterward. I tried two methods that I have seen in my readings, but their 
results are quite different and I don't know which one to trust.

The two approaches are pairwise.t.test() and multcomp, although the latter cannot 
be used after a repeated-measures aov model, only after an lme model. 

I have a physiological variable measured frequently on each of 67 animals. 
These are then summarized with a quantile for each animal. To check the effect 
of experiment duration, I recalculated the quantile for each animal 4 times, 
using different subsets of the data (so the shortest subset is part of all other 
subsets, the second subset is included in the 2 others, etc.). I handle this as 
4 repeated (non-independent) measurements for each animal, and want to see if 
the average value (for 67 animals) differs for the 4 different durations.

Because animals with high values for this physiological trait have larger 
differences between the 4 durations than animals with low values, the 
observations were log transformed.

I attach the small data set (Rda format), but it can also be downloaded here if 
the attachment gets stripped:
https://dl.dropboxusercontent.com/u/612902/RepMeasData.Rda

The data.frame is simply called Data.
My code is

library(reshape2)  # melt()
library(nlme)      # lme()
library(multcomp)  # glht(), mcp()

load("RepMeasData.Rda")
Data_Long = melt(Data, id = "Case")
names(Data_Long) = c("Case", "Duration", "SMR")
Data_Long$SMR = log10(Data_Long$SMR)

# I only show essential code to reproduce my opposing results
mixmod = lme(SMR ~ Duration, data = Data_Long, random = ~ 1 | Case)
anova(mixmod)
posthoc <- glht(mixmod, linfct = mcp(Duration = "Tukey"))
summary(posthoc)
 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lme.formula(fixed = SMR ~ Duration, data = Data_Long, random = ~1 | 
    Case)

Linear Hypotheses:
                  Estimate Std. Error z value Pr(>|z|)    
Set2 - Set1 == 0 -0.006135   0.003375  -1.818    0.265    
Set3 - Set1 == 0 -0.002871   0.003375  -0.851    0.830    
Set4 - Set1 == 0  0.015395   0.003375   4.561   <1e-04 ***
Set3 - Set2 == 0  0.003264   0.003375   0.967    0.768    
Set4 - Set2 == 0  0.021530   0.003375   6.379   <1e-04 ***
Set4 - Set3 == 0  0.018266   0.003375   5.412   <1e-04 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

with(Data_Long, pairwise.t.test(SMR, Duration, p.adjust.method = "holm", 
paired = TRUE))

 Pairwise comparisons using paired t tests 

data:  SMR and Duration 

     Set1    Set2    Set3   
Set2 < 2e-16 -       -      
Set3 0.8     0.10648 -      
Set4 0.00475 7.9e-05 0.00034

P value adjustment method: holm 

So the difference between sets 1 and 2 goes from non significant to very 
significant, depending on method.

I have other examples with essentially the same type of data and sometimes the 
two approaches differ in the opposite direction. In the example shown here, multcomp 
was more conservative; in some others it yielded a larger number of significant 
differences.

I admit not mastering all the intricacies of multcomp, but I have used multcomp 
and other methods of doing multiple comparisons many times before (but never 
with a repeated measures design), and always found the results very similar. 
When there were small differences, I trusted multcomp. This time, I get rather 
large differences and I am worried that I am doing something wrong.
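A possible reason the two methods can disagree in either direction: each paired t test uses the variance of its own set of differences, while the random-intercept model pools one residual variance over all comparisons (hence the identical standard error of 0.003375 on every glht contrast above). For a balanced design such as this one, that pooled standard error can be reproduced with an additive fixed-effects lm(SMR ~ Duration + Case), provided the estimated between-animal variance is positive. The sketch uses simulated, hypothetical data in the same long format as Data_Long; it is not a re-analysis of the posted data.

```r
# Hypothetical balanced data in the same long format as Data_Long:
# 67 animals (Case) x 4 durations, with Set4 given noisier deviations
set.seed(3)
n <- 67
case_id  <- rep(seq_len(n), times = 4)
Case     <- factor(case_id)
Duration <- factor(rep(paste0("Set", 1:4), each = n))
animal   <- rnorm(n, mean = 2, sd = 0.3)
shift    <- c(Set1 = 0, Set2 = -0.006, Set3 = -0.003, Set4 = 0.015)
noise_sd <- c(Set1 = 0.005, Set2 = 0.005, Set3 = 0.005, Set4 = 0.02)
dur      <- as.character(Duration)
long <- data.frame(Case, Duration,
                   SMR = unname(animal[case_id] + shift[dur] +
                                  rnorm(4 * n, sd = noise_sd[dur])))

# Pooled SE of any pairwise difference: one residual variance for all pairs
fit <- lm(SMR ~ Duration + Case, data = long)
pooled_se <- summary(fit)$sigma * sqrt(2 / n)

# Comparison-specific SEs, as used by the paired t tests
lev <- levels(long$Duration)
pair_se <- combn(lev, 2, FUN = function(p) {
  d <- long$SMR[long$Duration == p[1]] - long$SMR[long$Duration == p[2]]
  sd(d) / sqrt(n)
})
names(pair_se) <- combn(lev, 2, FUN = paste, collapse = " - ")

pooled_se
round(pair_se, 5)
```

When the variances of the differences are heterogeneous, the pooled SE is too large for some pairs and too small for others, so either method can look "more conservative" depending on the pair.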

Thanks in advance,

Denis Chabot
Fisheries & Oceans Canada

sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] multcomp_1.4-0  TH.data_1.0-6   survival_2.38-1 mvtnorm_1.0-2   
nlme_3.1-120 car_2.0-25  reshape2_1.4.1 

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.5  magrittr_1.5 splines_3.2.0 MASS_7.3-40  
lattice_0.20-31  minqa_1.2.4  stringr_1.0.0   
 [8] plyr_1.8.2   tools_3.2.0  nnet_7.3-9   pbkrtest_0.4-2   
parallel_3.2.0   grid_3.2.0   mgcv_1.8-6  
[15] quantreg_5.11 lme4_1.1-7   Matrix_1.2-0 nloptr_1.0.4 
codetools_0.2-11 sandwich_2.3-3   stringi_0.4-1   
[22] SparseM_1.6  zoo_1.7-12

[R] puzzled by time zone quirk

2014-09-21 Thread Denis Chabot
Hi,

I have to deal with time-stamped data coming from outside my own time zone, so 
the problem is likely poor knowledge of European time zones on my part. But I 
am puzzled just the same.

I thought that setting a time zone of Europe/Copenhagen would be the same as 
CET in winter and CEST in summer.

This test in winter works as expected:

> a = as.POSIXct("2013-02-25 01:00:00", tz="Europe/Copenhagen"); a
[1] "2013-02-25 01:00:00 CET"
> b = as.POSIXct("2013-02-25 01:00:00", tz="CET"); b
[1] "2013-02-25 01:00:00 CET"
> a-b
Time difference of 0 secs

But this one in summer does not work as I expected:

> c = as.POSIXct("2013-07-25 01:00:00", tz="Europe/Copenhagen"); c
[1] "2013-07-25 01:00:00 CEST"
> d = as.POSIXct("2013-07-25 01:00:00", tz="CEST"); d
[1] "2013-07-25 01:00:00 UTC"
> e = as.POSIXct("2013-07-25 01:00:00", tz="CET"); e
[1] "2013-07-25 01:00:00 CEST"
> c-d
Time difference of -2 hours
> c-e
Time difference of 0 secs

Setting tz to Europe/Copenhagen in summer in c first appears to be the same 
as setting it to CEST because the output is showing CEST.

But d should then be the same as c, and it is not.

What is happening?

Thanks in advance,

Denis Chabot



Re: [R] puzzled by time zone quirk

2014-09-21 Thread Denis Chabot
Sorry, I had not posted in a long time and I remembered this as I pushed the 
send button.

And I am not surprised that I thought wrong!

I'll start with the missing information:

 sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.0.2

Then I'll admit that some of the very useful details you provided had escaped 
me, but in my defense, I took to heart one element found in ?Sys.timezone:

It is not in general possible to retrieve the system's own name(s) for the 
current timezone, but Sys.timezone will retrieve the name it uses for the 
current time (and the name may differ depending on whether daylight saving time 
is in effect).

When I tell my computer that I am in Europe, I get 
> Sys.time()
[1] "2014-09-21 16:38:45 CEST"

As the output of my c also displayed CEST, I assumed this was the preferred 
way to refer to that time zone. Because of this, I had expected c and d to be 
the same. The output of c is deceiving. But at least I now know not to use 
CEST.
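The CEST printed for `c` is only the abbreviation that the Europe/Copenhagen zone uses in summer; it is not itself a zone name, which is why `tz = "CEST"` silently falls back to UTC on this platform. A quick check (results depend on the time-zone database installed on the machine):

```r
# Accepted zone names come from the Olson/IANA database on this platform;
# "CEST" is a display abbreviation, not a zone name
c("Europe/Copenhagen", "CEST") %in% OlsonNames()

summer <- as.POSIXct("2013-07-25 01:00:00", tz = "Europe/Copenhagen")
format(summer, "%Z")   # abbreviation used when printing

# Copenhagen is UTC+2 in summer, so 01:00 local is 23:00 UTC the day before
utc <- as.POSIXct("2013-07-24 23:00:00", tz = "UTC")
difftime(summer, utc, units = "hours")
```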

Denis

On 2014-09-21 at 10:00, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

 On 21/09/2014 14:11, Denis Chabot wrote:
 Hi,
 
 I have to deal with time-stamped data coming from outside my own time zone, 
 so the problem is likely poor knowledge of European time zones on my part. 
 But I am puzzled just the same.
 
 I thought that setting a time zone of Europe/Copenhagen would be the same 
 as CET in winter and CEST in summer.
 
 You thought wrong: CEST is not a valid timezone on most (maybe all) R 
 platforms.
 
 You failed to tell us the 'at a minimum' information required by the posting 
 guide.  ?Sys.timezone says OlsonNames() tells you the timezone names 
 supported on your unstated platform, and  ?as.POSIXct says
 
  tz: A time zone specification to be used for the conversion, _if
  one is required_.  System-specific (see time zones), but ‘""’
  is the current time zone, and ‘GMT’ is UTC (Universal Time,
  Coordinated). Invalid values are most commonly treated as
  UTC, on some platforms with a warning.
 
 
 As the posting guide asks, please do your own homework.
 
 
 ...
 
 
 
 -- 
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK
 


Re: [R] puzzled by time zone quirk

2014-09-21 Thread Denis Chabot
Hi again,

With the new installation:
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.1.1

I do get a warning that CEST is not a valid time zone, but c is still 
displayed with CEST as time zone, which remains confusing. 

> c = as.POSIXct("2013-07-25 01:00:00", tz="Europe/Copenhagen"); c
[1] "2013-07-25 01:00:00 CEST"
> d = as.POSIXct("2013-07-25 01:00:00", tz="CEST"); d
Warning messages:
1: In strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  unknown timezone 'CEST'
2: In as.POSIXct.POSIXlt(x) : unknown timezone 'CEST'
3: In strptime(x, f, tz = tz) : unknown timezone 'CEST'
4: In as.POSIXct.POSIXlt(as.POSIXlt(x, tz, ...), tz, ...) :
  unknown timezone 'CEST'
[1] "2013-07-25 01:00:00 GMT"
Warning message:
In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'CEST'

It is fine now that I am warned, but I wish CEST did not appear at all.

Denis

On 2014-09-21 at 10:44, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

 You neglected to update before posting as required by the posting guide.
 
 R 3.0.2 is far from current, and on OS X the timezone internals were replaced 
 in R 3.1.x (the previous version did not handle 64-bit time_t correctly, even 
 though that is what OS X uses).  And the documentation is different.
 
 
 ...

 -- 
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Emeritus Professor of Applied Statistics, University of Oxford
 1 South Parks Road, Oxford OX1 3TG, UK



[R] subsetting and Dates

2013-05-23 Thread Denis Chabot
Hi,

I am trying to understand why creating Date variables does not work if I subset 
to avoid NAs. 

I had problems creating these Date variables in my code and I thought that the 
presence of NAs was the cause. So I used a condition to avoid NAs.

It turns out that NAs are not a problem and I do not need to subset, but I'd 
like to understand why subsetting causes the problem.
The strange numbers I start with are what I get when I read an Excel sheet with 
the function read.xls() from package gdata.  

dat1 = c(41327, 41334, 41341, 41348, 41355, 41362, 41369, 41376, 41383, 41390, 
41397)
dat2 = dat1
dat2[c(5,9)]=NA
Data = data.frame(dat1,dat2)

keep1 = !is.na(Data$dat1)
keep2 = !is.na(Data$dat2)


Data$Dat1a = as.Date(Data[,"dat1"], origin="1899-12-30") 
Data$Dat1b[keep1] = as.Date(Data[keep1,"dat1"], origin="1899-12-30") 
Data$Dat2a = as.Date(Data[,"dat2"], origin="1899-12-30") 
Data$Dat2b[keep2] = as.Date(Data[keep2,"dat2"], origin="1899-12-30") 

Data
    dat1  dat2      Dat1a Dat1b      Dat2a Dat2b
1  41327 41327 2013-02-22 15758 2013-02-22 15758
2  41334 41334 2013-03-01 15765 2013-03-01 15765
3  41341 41341 2013-03-08 15772 2013-03-08 15772
4  41348 41348 2013-03-15 15779 2013-03-15 15779
5  41355    NA 2013-03-22 15786         NA    NA
6  41362 41362 2013-03-29 15793 2013-03-29 15793
7  41369 41369 2013-04-05 15800 2013-04-05 15800
8  41376 41376 2013-04-12 15807 2013-04-12 15807
9  41383    NA 2013-04-19 15814         NA    NA
10 41390 41390 2013-04-26 15821 2013-04-26 15821
11 41397 41397 2013-05-03 15828 2013-05-03 15828

So variables Dat1b and Dat2b are not converted to Date class.
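A likely explanation: `Data$Dat1b` does not exist before the assignment, so `Data$Dat1b[keep1] <- ...` sub-assigns into `NULL`. Sub-assignment dispatches on the object being modified, not on the value being inserted, so the default `[<-` is used: the day counts are kept but the Date class is dropped. If the column is first created as a Date vector, the Date method for `[<-` is used and the class survives. A minimal sketch with made-up Excel serial numbers:

```r
days <- c(41327, 41334, NA, 41348)   # Excel serial day numbers (made up)
keep <- !is.na(days)

# Target does not exist yet (NULL): the class of the value is lost
x <- NULL
x[keep] <- as.Date(days[keep], origin = "1899-12-30")
class(x)

# Target created first as an all-NA Date vector: the class is preserved
y <- rep(as.Date(NA), length(days))
y[keep] <- as.Date(days[keep], origin = "1899-12-30")
class(y)
y
```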


sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] gdata_2.12.0

loaded via a namespace (and not attached):
[1] gtools_2.7.0

Thanks in advance,

Denis


Re: [R] subsetting and Dates

2013-05-23 Thread Denis Chabot
Thank you for the 2 methods to make the columns class Date, but I would really 
like to know why these variables were not in Date class with my code. Do you 
know?

Denis


On 2013-05-23 at 21:44, arun smartpink...@yahoo.com wrote:

 You could convert those columns to Date class by:
 
 
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], as.Date, origin = "1970-01-01")
 #or
 Data[,c(4,6)] <- lapply(Data[,c(4,6)], function(x) structure(x, class = "Date"))
 
 
 #  dat1  dat2  Dat1a  Dat1b  Dat2a  Dat2b
 #1  41327 41327 2013-02-22 2013-02-22 2013-02-22 2013-02-22
 #2  41334 41334 2013-03-01 2013-03-01 2013-03-01 2013-03-01
 #3  41341 41341 2013-03-08 2013-03-08 2013-03-08 2013-03-08
 #4  41348 41348 2013-03-15 2013-03-15 2013-03-15 2013-03-15
 #5  41355    NA 2013-03-22 2013-03-22         NA         NA
 #6  41362 41362 2013-03-29 2013-03-29 2013-03-29 2013-03-29
 #7  41369 41369 2013-04-05 2013-04-05 2013-04-05 2013-04-05
 #8  41376 41376 2013-04-12 2013-04-12 2013-04-12 2013-04-12
 #9  41383    NA 2013-04-19 2013-04-19         NA         NA
 #10 41390 41390 2013-04-26 2013-04-26 2013-04-26 2013-04-26
 #11 41397 41397 2013-05-03 2013-05-03 2013-05-03 2013-05-03
 A.K.
 
 - Original Message -
 From: Denis Chabot chabot.de...@gmail.com
 To: R-help@r-project.org
 Cc: 
 Sent: Thursday, May 23, 2013 5:35 PM
 Subject: [R] subsetting and Dates
 
 ...
 



[R] puzzling Date math result

2012-04-17 Thread Denis Chabot
Hi,

I cannot make a reproducible example easily for my problem, so I'll
describe it as best as I can.

I merged 2 dataframes but was surprised when one line on the x
dataframe did not get a match in the y dataframe, because I knew such
a match existed. There was only one by variable in the merge, in
Date format:

in x:
 $ période: Date, format: "2009-06-09" "2009-07-09" ...

in y:
 $ date   : Date, format: "2009-05-12" "2009-06-09" …

I extracted the date that did not match into variables a (from x) and
b (from y):

> a=test1$période[21]
> b=test2$date[22]
> a
[1] "2011-04-06"
> b
[1] "2011-04-06"

and then this very puzzling situation:
> a==b
[1] FALSE
> as.integer(a)
[1] 15070
> as.integer(b)
[1] 15070
> as.integer(a)==as.integer(b)
[1] TRUE
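Date objects are stored as (possibly fractional) numbers of days since 1970-01-01. Printing and as.integer() both hide any fractional part, but `==` (and merge()) compare the full value. One plausible cause, which is an assumption since the post does not say how the two columns were built, is that one of them came from date-times or from an import that produced fractional days. A sketch with made-up values:

```r
a <- as.Date(15070.25, origin = "1970-01-01")   # fractional day (made up)
b <- as.Date(15070,    origin = "1970-01-01")

a; b                      # both print as the same calendar day
a == b                    # compares 15070.25 with 15070
unclass(a) - unclass(b)   # reveals the hidden fraction of a day

# Dropping the fractional part before merging makes the keys match
a_fixed <- as.Date(floor(as.numeric(a)), origin = "1970-01-01")
a_fixed == b
```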

Thanks in advance for an explanation or a suggestion to further study
this puzzle,

Denis
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/en_US.UTF-8/fr_CA.UTF-8/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
 [1] doBy_4.5.2   MASS_7.3-17  snow_0.3-8   lme4_0.999375-42
 [5] Matrix_1.0-6 lattice_0.20-6   multcomp_1.2-12  mvtnorm_0.9-9992
 [9] R2HTML_2.2   survival_2.36-12 gdata_2.8.2

loaded via a namespace (and not attached):
[1] grid_2.15.0   gtools_2.6.2  nlme_3.1-103  stats4_2.15.0 tools_2.15.0



[R] strange convention for time zone names

2011-08-19 Thread Denis Chabot
Hi,

My time zone in Montreal is UTC/GMT -5 hours in standard time (see 
http://www.timeanddate.com/worldclock/city.html?n=165).

Yet, in R (POSIXct objects) I must specify the opposite, i.e. UTC+5:

dateMontreal = as.POSIXct("2011-01-15 05:00:00", tz="EST")
dateMontreal2 = as.POSIXct("2011-01-15 05:00:00", tz="UTC+5")
wrongdateMontreal = as.POSIXct("2011-01-15 05:00:00", tz="UTC-5")

dateLondon = as.POSIXct("2011-01-15 10:00:00", tz="UTC0")
difftime(dateMontreal, dateLondon)
Time difference of 0 secs

difftime(dateMontreal2, dateLondon)
Time difference of 0 secs

difftime(wrongdateMontreal, dateLondon)
Time difference of -10 hours

Is there a reason for this counter-intuitive convention?
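The sign follows the POSIX `TZ` convention, in which the offset is the amount to add to local time to obtain UTC, so zones west of Greenwich get a positive number. The Olson/IANA `Etc/GMT+5` zone keeps the same inverted sign. A geographic zone name avoids the question altogether; `America/Toronto` is used below as an assumption, because it follows the same Eastern time as Montreal.

```r
london <- as.POSIXct("2011-01-15 10:00:00", tz = "UTC")

# POSIX-style sign: "Etc/GMT+5" means five hours *behind* UTC
montreal_etc <- as.POSIXct("2011-01-15 05:00:00", tz = "Etc/GMT+5")

# Geographic zone name: no sign to get wrong, and daylight saving is handled
montreal_geo <- as.POSIXct("2011-01-15 05:00:00", tz = "America/Toronto")

difftime(montreal_etc, london)
difftime(montreal_geo, london)
```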

Denis
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 


[R] ifelse strips POSIXct class from object

2011-06-05 Thread Denis Chabot
Hi,

I was losing my dates in a script and upon inspection, found that my recent 
switch from separate if and else to ifelse was the cause. But why?

my.date = as.POSIXct("2011-06-04 08:00:00")
default.date = seq(as.POSIXct("2011-01-01 08:00:00"),
                   as.POSIXct("2011-09-01 08:00:00"), length = 15)
x = 4 * 60 * 60
(my.date + x)
(min(default.date) + x)
(new.date = ifelse(!is.na(my.date), my.date + x, min(default.date) + x))

(if (!is.na(my.date)) new.date2 = my.date + x else new.date2 = min(default.date) + x)
On my machine, new.date is numeric whereas new.date2 is POSIXct and 
POSIXt, as desired.

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 
  

Thanks in advance,

Denis


Re: [R] ifelse strips POSIXct class from object

2011-06-05 Thread Denis Chabot
Thanks Duncan, I'll go back to if and else!

Denis
On 2011-06-05 at 08:39, Duncan Murdoch wrote:

 On 11-06-05 8:23 AM, Denis Chabot wrote:
 Hi,
 
 I was losing my dates in a script and upon inspection, found that my 
 recent switch from separate if and else to ifelse was the cause. But 
 why?
 
 See ?ifelse.  The class of the result is the same as the class of the test, 
 not the classes of the alternatives.  You need to manually attach the class 
 again, or use a different construction.
 
 Duncan Murdoch
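As described in ?ifelse, the result is built from the test (a plain logical here), so the POSIXct class of the alternatives is lost and only the underlying seconds since 1970 remain. A small illustration; the class can be put back with as.POSIXct(..., origin = "1970-01-01"), with the time zone set explicitly (UTC here, an arbitrary choice for the sketch):

```r
t1 <- as.POSIXct("2011-06-04 08:00:00", tz = "UTC")
t2 <- as.POSIXct("2011-01-01 08:00:00", tz = "UTC")
x  <- 4 * 60 * 60                      # four hours, in seconds

res <- ifelse(!is.na(t1), t1 + x, t2 + x)
class(res)                             # plain number: POSIXct class dropped

# Reattach the class (the numbers are seconds since 1970-01-01 UTC)
res_ct <- as.POSIXct(res, origin = "1970-01-01", tz = "UTC")
res_ct == t1 + x
```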
 
 
 ...
 



Re: [R] ifelse strips POSIXct class from object

2011-06-05 Thread Denis Chabot
I did not know this function, thanks a lot Gabor.

Denis
On 2011-06-05 at 08:48, Gabor Grothendieck wrote:

 On Sun, Jun 5, 2011 at 8:23 AM, Denis Chabot chabot.de...@gmail.com wrote:
 Hi,
 
 I was losing my dates in a script and upon inspection, found that my 
 recent switch from separate if and else to ifelse was the cause. But 
 why?
 
 ...
 (new.date = ifelse(!is.na(my.date), my.date + x, min(default.date) + x) )
 
 
 Try replace:
 
 new.date <- replace(my.date, is.na(my.date), min(default.date)) + x
 
 
 -- 
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


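Gabor's `replace()` suggestion can be tried on a vector that actually contains an `NA`. This is a sketch under assumptions: the two-element `my.date` is made up, and `tz = "UTC"` is used only to keep the output independent of the session time zone. It works because `replace()` assigns with `[<-`, whose POSIXct method keeps the class.

```r
# Sketch of the replace() idiom; the data and tz = "UTC" are assumptions.
default.date <- seq(as.POSIXct("2011-01-01 08:00:00", tz = "UTC"),
                    as.POSIXct("2011-09-01 08:00:00", tz = "UTC"),
                    length.out = 15)
my.date <- as.POSIXct(c("2011-06-04 08:00:00", NA), tz = "UTC")
x <- 4 * 60 * 60

# Fill the NAs first, then shift everything by x; the class survives
new.date <- replace(my.date, is.na(my.date), min(default.date)) + x
print(class(new.date))
print(format(new.date, "%Y-%m-%d %H:%M:%S"))
```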

Re: [R] ifelse strips POSIXct class from object

2011-06-05 Thread Denis Chabot
Hi Duncan,

In this case they all had length 1, but I'll be careful on other occasions.

Denis
On 2011-06-05 at 09:26, Duncan Murdoch wrote:

 On 11-06-05 8:49 AM, Denis Chabot wrote:
 Thanks Duncan, I'll go back to if and else!
 
 Be careful, it might not give you the same answer.
 
 I'd use this variation on the advice from ?ifelse:
 
 new.date <- my.date + x
 new.date[is.na(my.date)] <- min(default.date) + x
 
 The thing to watch out for in this construction is that the lengths of the 
 vectors come out right.  I'm assuming that my.date + x is the same length as 
 is.na(my.date), and that min(default.date) + x is length 1, but I haven't 
 tried your code to check.
 
 Duncan Murdoch
 
 
 Denis
 On 2011-06-05 at 08:39, Duncan Murdoch wrote:
 
 On 11-06-05 8:23 AM, Denis Chabot wrote:
 Hi,
 
 I was losing my dates in a script and upon inspection, found that my 
 recent switch from separate if and else to ifelse was the cause. But 
 why?
 
 See ?ifelse.  The class of the result is the same as the class of the test, 
 not the classes of the alternatives.  You need to manually attach the class 
 again, or use a different construction.
 
 Duncan Murdoch
 
 
 my.date = as.POSIXct("2011-06-04 08:00:00")
 default.date = seq(as.POSIXct("2011-01-01 08:00:00"),
                    as.POSIXct("2011-09-01 08:00:00"), length = 15)
 x = 4 * 60 * 60
 (my.date + x)
 (min(default.date) + x)
 (new.date = ifelse(!is.na(my.date), my.date + x, min(default.date) + x))
 
 (if (!is.na(my.date)) new.date2 = my.date + x else new.date2 = min(default.date) + x)
 
 On my machine, new.date is numeric whereas new.date2 is POSIXct and 
 POSIXt, as desired.
 
 sessionInfo()
 R version 2.13.0 (2011-04-13)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 
 Thanks in advance,
 
 Denis
 
 
 


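Duncan's variation can be sketched the same way. The three-element `my.date` below is hypothetical; the point is that each element is patched independently, whereas a single `if`/`else` only ever looks at one condition. As Duncan notes, the value assigned into `new.date[is.na(my.date)]` should have length 1 or exactly one value per `NA`.

```r
# Sketch of "compute, then patch the NAs"; the data and tz are assumptions.
my.date <- as.POSIXct(c("2011-06-04 08:00:00", NA, "2011-06-05 09:30:00"),
                      tz = "UTC")
default.date <- as.POSIXct(c("2011-01-01 08:00:00", "2011-09-01 08:00:00"),
                           tz = "UTC")
x <- 4 * 60 * 60

new.date <- my.date + x                             # the NA stays NA
new.date[is.na(my.date)] <- min(default.date) + x   # length-1 value is recycled

print(format(new.date, "%Y-%m-%d %H:%M:%S"))
```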

[R] unwanted switch to DST with POSIXct objects

2011-06-05 Thread Denis Chabot
Hi,

For a project I try to keep everything in normal time, not daylight saving 
time, to prevent problems when instruments collect data during the nights when 
we go from DST back to normal time.

But sometimes R tricks me and I do not know how to prevent it.

This is one example:

lights_on = as.POSIXct(c("2011-05-06 04:09:26", "2011-05-07 04:07:53",
"2011-05-08 04:06:21",
"2011-05-09 04:04:51", "2011-05-10 04:03:22", "2011-05-11 04:01:55",
"2011-05-12 04:00:30", "2011-05-13 03:59:06", "2011-05-14 03:57:45",
"2011-05-15 03:56:25", "2011-05-16 03:55:07"), tz = "EST") # not DST

lights_off = as.POSIXct(c("2011-05-05 18:56:54", "2011-05-06 18:58:19",
"2011-05-07 18:59:44",
"2011-05-08 19:01:08", "2011-05-09 19:02:32", "2011-05-10 19:03:55",
"2011-05-11 19:05:18", "2011-05-12 19:06:40", "2011-05-13 19:08:01",
"2011-05-14 19:09:22", "2011-05-15 19:10:42"), tz = "EST")   # not DST

(a = lights_on[c(1,5)]) # not DST
[1] "2011-05-06 04:09:26 EST" "2011-05-10 04:03:22 EST"

(b = lights_off[c(2,6)]) # not DST
[1] "2011-05-06 18:58:19 EST" "2011-05-10 19:03:55 EST"

(x = c(lights_off[2], lights_on[2])) # suddenly DST
[1] "2011-05-06 19:58:19 EDT" "2011-05-07 05:07:53 EDT"

Why did x end up in DST? How could I prevent it?

Thanks in advance,

Denis


Re: [R] unwanted switch to DST with POSIXct objects

2011-06-05 Thread Denis Chabot
Thanks Jeff and Spencer, I will probably set the time zone for my session, but 
I had forgotten the possibility of setting the time zone attribute of a POSIXct 
object, which would have solved my problem also.

Denis
On 2011-06-05 at 11:14, Spencer Graves wrote:

 On 6/5/2011 9:30 AM, Jeff Newmiller wrote:
 Sys.setenv(TZ = "Etc/GMT+5")
 
 Or:
 
 x <- as.POSIXct(as.Date("2011-01-15"))
 attr(x, "tzone") <- "Etc/GMT+5"
 x
 
 
  This version works without Sys.setenv, which may not work on some 
 platforms.  Unfortunately, I believe there are some copy operations that lose 
 attributes like tzone, so you need to check.
 
 
  For some of the most advanced and complicated time series problems, you 
 might consider what's available from the Rmetrics project, e.g., at 
 https://www.rmetrics.org/ebooks:  They are designed to deal with 
 coordinating trading data from financial markets all over the world, each of 
 which affects all the others but have different trading hours.
 
 
  Hope this helps.
  Spencer
 
 Make the timezone you prefer the default for that R session.
 
 FWIW: EST may or may not exist as a valid timezone on your system, but it is 
 an ambiguous notation anyway.
 ---
 Jeff Newmiller
 DCN: jdnew...@dcn.davis.ca.us
 Research Engineer (Solar/Batteries/Software/Embedded Controllers)
 ---
 
 Denis Chabot <chabot.de...@gmail.com> wrote:
 
 Hi,
 
 For a project I try to keep everything in normal time, not daylight saving 
 time, to prevent problems when instruments collect data during the nights 
 when we go from DST back to normal time.
 
 But sometimes R tricks me and I do not know how to prevent it.
 
 This is one example:
 
 lights_on = as.POSIXct(c("2011-05-06 04:09:26", "2011-05-07 04:07:53",
 "2011-05-08 04:06:21",
 "2011-05-09 04:04:51", "2011-05-10 04:03:22", "2011-05-11 04:01:55",
 "2011-05-12 04:00:30", "2011-05-13 03:59:06", "2011-05-14 03:57:45",
 "2011-05-15 03:56:25", "2011-05-16 03:55:07"), tz = "EST") # not DST
 
 lights_off = as.POSIXct(c("2011-05-05 18:56:54", "2011-05-06 18:58:19",
 "2011-05-07 18:59:44",
 "2011-05-08 19:01:08", "2011-05-09 19:02:32", "2011-05-10 19:03:55",
 "2011-05-11 19:05:18", "2011-05-12 19:06:40", "2011-05-13 19:08:01",
 "2011-05-14 19:09:22", "2011-05-15 19:10:42"), tz = "EST") # not DST
 
 (a = lights_on[c(1,5)])  # not DST
 [1] "2011-05-06 04:09:26 EST" "2011-05-10 04:03:22 EST"
 
 (b = lights_off[c(2,6)]) # not DST
 [1] "2011-05-06 18:58:19 EST" "2011-05-10 19:03:55 EST"
 
 (x = c(lights_off[2], lights_on[2])) # suddenly DST
 [1] "2011-05-06 19:58:19 EDT" "2011-05-07 05:07:53 EDT"
 
 Why did x end up in DST? How could I prevent it?
 
 Thanks in advance,
 
 Denis
 
 
 
 
 
 -- 
 Spencer Graves, PE, PhD
 President and Chief Operating Officer
 Structure Inspection and Monitoring, Inc.
 751 Emerson Ct.
 San José, CA 95126
 ph:  408-655-4567


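The suggestions above (a fixed-offset zone plus an explicit "tzone" attribute) might look like this. A minimal sketch: it reuses two of Denis's times and Jeff's "Etc/GMT+5", which in the POSIX sign convention means five hours behind UTC with no DST. In older R versions, such as the 2.13.0 used in the thread, `c()` dropped the "tzone" attribute, which is why the combined vector printed in the session's local zone (EDT); re-setting it after combining is cheap insurance.

```r
# Sketch: keep everything in fixed standard time (UTC-5, never DST).
# "Etc/GMT+5" means five hours *behind* UTC (POSIX sign convention).
tz.std <- "Etc/GMT+5"

lights_on  <- as.POSIXct(c("2011-05-06 04:09:26", "2011-05-07 04:07:53"), tz = tz.std)
lights_off <- as.POSIXct(c("2011-05-05 18:56:54", "2011-05-06 18:58:19"), tz = tz.std)

x <- c(lights_off[2], lights_on[2])
# Older versions of c() dropped "tzone"; set it again explicitly
attr(x, "tzone") <- tz.std
print(format(x, "%Y-%m-%d %H:%M:%S"))
```

Setting `Sys.setenv(TZ = "Etc/GMT+5")` at the start of a session, as Jeff suggests, changes the default used for printing instead; as Spencer notes, that may not behave the same on every platform.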

[R] reading fixed width format data with 2 types of lines

2010-08-12 Thread Denis Chabot
Hi,

I know how to read fixed width format data with read.fwf, but suddenly I need 
to read in a large number of old fwf files with 2 types of lines. Lines that 
begin with 3 in first column carry one set of variables, and lines that begin 
with 4 carry another set, like this:

…
3A00206546L07004901609004599  1015002  001001008010004002004007003   001
3A00206546L07004900609003099  1029001002001001006014002 
3A00206546L07004900229000499  1015001001
3A00206546L070049001692559049033  1015 018036024
3A00206546L07004900229000499  1001   002
4A00176546L06804709001011100060651640015001001501063   065914   
4A00176546L068047090010111000407616 1092   095614   
4A00196546L098000100010111001706214450151062   065914   
4A00176546L068047090010111000505913 1062   065914   
4A00196546L09800010001011100260472140002001000201042   046114   
4A00196546L0980001000101110025042214501200051042   046114   
4A00196546L09800010001011100290372140005001220501032   036214   
…

I have searched for tricks to do this but I must not have used the right 
keywords, I found nothing.

I suppose I could read the entire file as a single character variable for each 
line, then subset for lines that begin with 3 and save this in an ascii file 
that will then be reopened with a read.fwf call, and do the same with lines 
that begin with 4. But this does not appear to me to be very elegant nor 
efficient… Is there a better method?

Thanks in advance,


Denis Chabot

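One way to avoid the intermediate files is to keep the lines in memory: read everything with `readLines()`, split on the first character, and hand each subset to `read.fwf()` through a `textConnection()`. This is a sketch only; the three sample lines, the widths and the column names are hypothetical placeholders for the real record layouts, and `read_fwf_lines()` is a small helper defined here, not an existing function.

```r
# Hypothetical sample; with real data use e.g. raw <- readLines("yourfile.txt")
raw <- c("3A0020ABC", "4A0017XY1", "3A0021DEF")

# Hypothetical helper: run read.fwf() on a character vector of lines
read_fwf_lines <- function(lines, ...) {
  con <- textConnection(lines)
  on.exit(close(con))
  read.fwf(con, ...)
}

type <- substr(raw, 1, 1)  # record type is in column 1
# Widths and names below are placeholders for the two real layouts
rec3 <- read_fwf_lines(raw[type == "3"], widths = c(1, 5, 3),
                       col.names = c("type", "id", "code"),
                       colClasses = "character")
rec4 <- read_fwf_lines(raw[type == "4"], widths = c(1, 5, 2, 1),
                       col.names = c("type", "id", "code", "flag"),
                       colClasses = "character")
print(rec3)
print(rec4)
```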

[R] superfluous distribution found with mclust

2010-03-22 Thread Denis Chabot
Dear R users,

I use mclust to fit a mixture of normal distributions to many datasets. Usually 
the Mclust function finds one or two normal distributions, rarely three.

But I hit a strange case today.

my.data <- c(57.96920, 51.79415, 51.20538, 55.53637, 51.64291, 56.61476, 
51.28855, 55.56169, 51.85113, 54.03330, 51.37370, 49.48561, 52.41580, 53.51176, 
60.49293, 55.77012, 51.59270, 56.29660, 55.90048, 53.05432, 50.87498, 58.47613, 
54.60827, 54.16143, 52.94914, 58.89408, 51.17116, 54.16909, 51.94852, 53.29897, 
57.21962, 66.94420, 56.65536, 53.38147, 52.79163, 52.55879, 55.54395, 54.33984, 
51.79235, 52.93464, 50.03343, 59.04797, 51.85276, 53.16419, 53.27404, 60.08775, 
52.96493, 54.15129, 58.53050, 51.74431, 50.67817, 51.22570, 57.60541, 51.32998, 
56.73625, 55.99371, 50.41035, 52.79797, 59.75973, 52.03613, 56.59133, 51.66319, 
51.06316, 55.57699, 50.12779, 56.04503, 55.75857, 57.55347, 51.48167, 52.22395, 
54.96204, 59.58895, 55.49020, 50.50893, 49.97572, 53.26222, 57.10047, 51.25523, 
52.38768, 56.42965, 51.83258, 55.40537, 51.60564, 54.68883, 53.48098, 58.47231, 
70.15088, 51.68805, 52.82636, 52.97804, 51.90228, 53.49184, 52.24366, 52.36895, 
53.26520, 52.27327, 50.85403)

library(mclust)
cl <- mclustBIC(my.data)
myModel <- summary(cl, my.data)

Warning message:
In map(out$z) : no assignment to 1

I do not know why this happens, but this confirms that a first distribution was 
found but no data was assigned to it:

myModel$classification
 [1] 3 2 2 3 2 3 2 3 2 2 2 2 2 2 3 3 2 3 3 2 2 3 2 2 2 3 2 2 2 2 3 4 3 2 2 2 3 2 2 2
[41] 2 3 2 2 2 3 2 2 3 2 2 2 3 2 3 3 2 2 3 2 3 2 2 3 2 3 3 3 2 2 3 3 3 2 2 2 3 2 2 3
[81] 2 3 2 2 2 3 4 2 2 2 2 2 2 2 2 2 2


Furthermore, the first and second distributions have almost the same mean:

myModel$parameters$mean
       1        2        3        4 
52.33903 52.33948 57.14263 68.54754 



Graphically, I don't see a reason for the distribution with mean=52.33903 to be 
there:


hist(my.data, breaks = 99, freq = FALSE, main = "", border = grey(0.5))
rug(my.data, ticksize = 0.01, quiet = TRUE)

newx <- seq(from = min(my.data), to = max(my.data), length = 500)
Dens <- dens(modelName = myModel$modelName, data = newx,
             parameters = myModel$parameters)
lines(newx, Dens, col = "blue")


Do you know why I get this first distribution with no member?

Thanks in advance,

Denis Chabot


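One plausible mechanism (an assumption, not a diagnosis of this particular fit): mclust's `map()` gives each observation to the component with the largest posterior probability, so a component that sits almost on top of another but has a smaller weight can keep a non-zero mixing proportion and still win nowhere, which is what the "no assignment to 1" warning reports. The base-R sketch below uses made-up parameters to show the effect.

```r
# Hypothetical parameters mimicking two near-duplicate components;
# these are NOT the values mclust fitted to my.data.
pro   <- c(0.05, 0.95)      # mixing proportions
mu    <- c(52.339, 52.340)  # almost identical means
sigma <- c(1.5, 1.6)        # standard deviations

x <- seq(48, 58, by = 0.5)
# Weighted component densities, one column per component
wdens <- sapply(1:2, function(k) pro[k] * dnorm(x, mu[k], sigma[k]))
z <- wdens / rowSums(wdens)               # posterior membership probabilities
map <- max.col(z, ties.method = "first")  # hard (MAP) classification

print(table(factor(map, levels = 1:2)))  # component 1 gets no members
print(max(z[, 1]))                        # its best posterior stays far below 0.5
```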

Re: [R] alternative to rbind within a loop

2009-07-24 Thread Denis Chabot

Hi Greg,

Thanks, very encouraging: with my example, this is 10x more efficient  
than my loop:

   user  system elapsed 
 13.819   5.510  20.204

   user  system elapsed 
156.206  44.859 202.150


In real life, I did some work on each file before doing rbind. I'll  
see if this work can be put in a custom-built function that would go  
into the lapply call you suggested.


Denis

On 09-07-23 at 17:27, Greg Snow wrote:


Try something like (untested):


mylist <- lapply(all.files, function(i) read.csv(i))
mydf <- do.call('rbind', mylist)


If all the csv files are conformable so that rbind works on them (if  
the loop method works then that should be the case), then this will  
read in each file, store the data frames as a list, then rbind them  
all together.


It seems that this should be faster than the loop, but testing will  
be needed to be sure.


Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Denis Chabot
Sent: Thursday, July 23, 2009 1:54 PM
To: list R
Subject: [R] alternative to rbind within a loop

Hi,

I often have to do this:

select a folder (directory) containing a few hundred data files in
csv format (up to 1000 files, in fact)

open each file, transform some character variables into date-time
format

make into a dataframe (involves getting rid of a few variables I
don't need)

concatenate to the master dataframe that will eventually contain the
data from all the files in the folder.

I use a loop going from 1 to the number of files. I have added a
command to print an incrementing number to the R console each time the
loop completes one iteration, to judge the speed of the process.

At the beginning, 3-4 files are processed each second. After a few
hundred iterations it slows down to about 1 file per second. Before I
reach the last file (898 in the case at hand), it has become much
slower, about 1 file every 2-3 seconds.

This progressive slowing down suggests the problem is linked to the
size of the growing master dataframe that rbind combines with each
new file.

In fact, the small script below confirms this as nothing at all
happens within the loop but rbind. You can cut the size of this
example so as not to waste too much of your time:


# create a dummy data.frame and copy it in a large number of csv files

test <- file.path("test")
dir.create(test, showWarnings = FALSE)

a <- 1:350
b <- rnorm(350, 100, 10)
c <- runif(350, 0, 100)
d <- month.name[runif(350, 1, 12)]

the.data <- data.frame(a, b, c, d)

for (i in 1:850) {
  write.csv(the.data, file = paste(test, "/file_", i, ".csv", sep = ""))
}

# now let's make a single dataframe from all these csv files

all.files <- list.files(path = test, full.names = TRUE, pattern = "\\.csv$")

new.data <- NULL

system.time({
  for (i in all.files) {
    in.data <- read.csv(i)
    if (is.null(new.data)) {new.data = in.data} else {new.data =
      rbind(new.data, in.data)}
    cat(paste(i, ", ", sep = ""))
  } # end for
}) # end system.time

   user  system elapsed 
156.206  44.859 202.150
This is with

sessionInfo()
R version 2.9.1 Patched (2009-07-16 r48939)
x86_64-apple-darwin9.7.0

locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] doBy_3.7         chron_2.3-30     timeDate_290.84

loaded via a namespace (and not attached):
[1] cluster_1.12.0  grid_2.9.1  Hmisc_3.5-2 lattice_0.17-25
tools_2.9.1


Would it be better to somehow save all 850 files in one dataframe
each, and then rbind them all in a single operation?

Can I combine all my files without using a loop? I've never quite
mastered the apply family of functions and have not seen examples
that read files.

Thanks in advance,

Denis Chabot






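Greg's pattern also answers the follow-up about per-file work: put the processing in a function and call it from `lapply()`, so that `rbind` runs once at the end instead of on an ever-growing data frame. The sketch below writes five small demo files to `tempdir()`; the file contents, the `when` column and the body of `read_one()` (a hypothetical helper) stand in for the real transformations.

```r
# Demo files in a temporary folder (contents are made up)
demo.dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo.dir, showWarnings = FALSE)
the.data <- data.frame(a = 1:3, b = c(10.5, 20.5, 30.5),
                       when = c("2009-07-23 13:00:00", "2009-07-23 13:15:00",
                                "2009-07-23 13:30:00"))
for (i in 1:5) {
  write.csv(the.data, file.path(demo.dir, paste0("file_", i, ".csv")),
            row.names = FALSE)
}

# Hypothetical per-file processing: read, convert, drop unneeded columns
read_one <- function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  d$when <- as.POSIXct(d$when, tz = "UTC")  # character -> date-time
  d$b <- NULL                               # a variable we don't need
  d
}

all.files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)
# One data frame per file, then a single rbind at the end
mydf <- do.call(rbind, lapply(all.files, read_one))
str(mydf)
```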

[R] alternative to rbind within a loop

2009-07-23 Thread Denis Chabot

Hi,

I often have to do this:

select a folder (directory) containing a few hundred data files in csv  
format (up to 1000 files, in fact)


open each file, transform some character variables into date-time format

make into a dataframe (involves getting rid of a few variables I don't  
need)


concatenate to the master dataframe that will eventually contain the  
data from all the files in the folder.


I use a loop going from 1 to the number of files. I have added a  
command to print an incrementing number to the R console each time the  
loop completes one iteration, to judge the speed of the process.


At the beginning, 3-4 files are processed each second. After a few  
hundred iterations it slows down to about 1 file per second. Before I  
reach the last file (898 in the case at hand), it has become much  
slower, about 1 file every 2-3 seconds.


This progressive slowing down suggests the problem is linked to the  
size of the growing master dataframe that rbind combines with each  
new file.


In fact, the small script below confirms this as nothing at all  
happens within the loop but rbind. You can cut the size of this  
example so as not to waste too much of your time:



# create a dummy data.frame and copy it in a large number of csv files

test <- file.path("test")
dir.create(test, showWarnings = FALSE)

a <- 1:350
b <- rnorm(350, 100, 10)
c <- runif(350, 0, 100)
d <- month.name[runif(350, 1, 12)]

the.data <- data.frame(a, b, c, d)

for (i in 1:850) {
  write.csv(the.data, file = paste(test, "/file_", i, ".csv", sep = ""))
}

# now let's make a single dataframe from all these csv files

all.files <- list.files(path = test, full.names = TRUE, pattern = "\\.csv$")

new.data <- NULL

system.time({
  for (i in all.files) {
    in.data <- read.csv(i)
    if (is.null(new.data)) {new.data = in.data} else {new.data =
      rbind(new.data, in.data)}

    cat(paste(i, ", ", sep = ""))
  } # end for
}) # end system.time

   user  system elapsed 
156.206  44.859 202.150
This is with

sessionInfo()
R version 2.9.1 Patched (2009-07-16 r48939)
x86_64-apple-darwin9.7.0

locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] doBy_3.7         chron_2.3-30     timeDate_290.84

loaded via a namespace (and not attached):
[1] cluster_1.12.0  grid_2.9.1  Hmisc_3.5-2 lattice_0.17-25  
tools_2.9.1



Would it be better to somehow save all 850 files in one dataframe  
each, and then rbind them all in a single operation?


Can I combine all my files without using a loop? I've never quite  
mastered the apply family of functions and have not seen examples that  
read files.


Thanks in advance,

Denis Chabot



Re: [R] Problem with as.POSIXct on dates object

2009-07-20 Thread Denis Chabot

Hi,

if you look at my example in the recent thread "Re: [R] problem with  
as.POSIXct and daylight savings time", it appears that the tz argument is  
used by as.POSIXct.


Denis Chabot
On 09-07-20 at 00:00, Remko Duursma wrote:


as.POSIXct.dates does not make use of tz:


Ok, but it is supposed to, right? Or maybe the documentation can be
updated, because ?as.POSIXct does seem to imply the timezone is used
(as it is for other methods of as.POSIXct).

thanks,
Remko




-
Remko Duursma
Post-Doctoral Fellow

Centre for Plants and the Environment
University of Western Sydney
Hawkesbury Campus
Richmond NSW 2753

Dept of Biological Science
Macquarie University
North Ryde NSW 2109
Australia

Mobile: +61 (0)422 096908



On Mon, Jul 20, 2009 at 1:41 PM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

as.POSIXct.dates does not make use of tz:


as.POSIXct.dates

function (x, ...)
{
    if (inherits(x, "dates")) {
        z <- attr(x, "origin")
        x <- as.numeric(x) * 86400
        if (length(z) == 3L && is.numeric(z))
            x <- x + as.numeric(ISOdate(z[3L], z[1L], z[2L], 0))
        return(structure(x, class = c("POSIXt", "POSIXct")))
    }
    else stop(gettextf("'%s' is not a \"dates\" object",
        deparse(substitute(x))))
}
<environment: namespace:base>


On Sun, Jul 19, 2009 at 11:30 PM, Remko Duursma <remkoduur...@gmail.com>
wrote:

Dear R-helpers,


I have a problem converting an object made with the 'chron' function
to a POSIXct object:

# Make date based on DOY
library(chron)
dat <- chron(dates = 232, origin. = c(month = 1, day = 1, year = 2008))

dat
#[1] 08/20/08

# Converting to POSIXct uses current timezone (Sydney):
as.POSIXct(dat)
#[1] "2008-08-20 10:00:00 EST"

# Setting GMT timezone has no effect?
as.POSIXct(dat, tz = "GMT")
#[1] "2008-08-20 10:00:00 EST"

# But conversion to POSIXlt works fine:
as.POSIXlt(dat, tz = "GMT")
#[1] "2008-08-20 GMT"

Is this behavior expected? If so, can you explain why?

thanks for your help,
Remko












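If what is needed is simply a POSIXct for a day-of-year at midnight GMT, one workaround is to skip chron and build the date in base R. A sketch under that assumption: `as.Date()` with an `origin` counts 232 days after 1 January 2008, which gives the same 2008-08-20 as the `chron()` call above, and converting a `Date` to POSIXct gives midnight UTC.

```r
# Base-R sketch, bypassing chron: day 232 counted from 1 January 2008
d <- as.Date(232, origin = "2008-01-01")
p <- as.POSIXct(d)  # a Date converts to midnight UTC

print(format(d))
print(format(p, "%Y-%m-%d %H:%M:%S", tz = "GMT"))
```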

[R] problem with as.POSIXct and daylight savings time

2009-07-19 Thread Denis Chabot
[was   [R] end of daylight saving time]

Hi,

I got no reply with the previous subject line, probably a bad choice  
of subject on my part, so here it is again.

I read from the help on DateTimeClasses and various posts on this list  
that, quite logically, one needs to specify if DST is active or not  
when time is between 1 and 2 AM on the first Sunday in November (for  
North America in recent years).

This I can do for one date at a time:

# to get automatic use of DST:
a <- as.POSIXct("2008-11-02 01:30:00", tz = "EST5EDT")
# to tell R this is the second occurrence of 1:30 that day, in ST:
b <- as.POSIXct("2008-11-02 01:30:00", tz = "EST")
difftime(b, a)

Time difference of 1 hours

But why can't I do the following, which appears to be a typical R way
of doing things, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")

as.POSIXct(c, tz = tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  invalid 'tz' value

???

Thanks,

Denis Chabot

sessionInfo()
R version 2.9.1 Patched (2009-07-09 r48929)
x86_64-apple-darwin9.7.0

locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.9.1




Re: [R] problem with as.POSIXct and daylight savings time

2009-07-19 Thread Denis Chabot

Thank you very much Duncan.

I'll follow your suggestion.

Why do I want to do what the designer did not think anyone would want  
to do? I have data acquisition equipment taking measurements every 15  
min or so for days at a time, and I need to compile all such  
experiments in a master data set. The data acquisition equipment  
automatically switches to DST in spring and back to ST in autumn,  
which I did not disable because it is easier to work with while we are  
running the experiments.


I could use chron to ignore time zones and daylight savings time, but  
this would not be of much help as whether or not I use as.POSIXct or  
chron, there is one day of the year that has 25 h and I need to deal  
with that 25th hour or I'll lose one hour of data!


Denis
On 09-07-19 at 11:45, Duncan Murdoch wrote:


On 19/07/2009 11:23 AM, Denis Chabot wrote:

[was  [R] end of daylight saving time]
Hi,
I got no reply with the previous subject line, probably a bad
choice of subject on my part, so here it is again.
I read from the help on DateTimeClasses and various posts on this
list that, quite logically, one needs to specify if DST is active
or not when time is between 1 and 2 AM on the first Sunday in
November (for North America in recent years).

This I can do for one date at a time:
# to get automatic use of DST:
a <- as.POSIXct("2008-11-02 01:30:00", tz = "EST5EDT")
# to tell R this is the second occurrence of 1:30 that day, in ST:
b <- as.POSIXct("2008-11-02 01:30:00", tz = "EST")

difftime(b, a)
Time difference of 1 hours
But why can't I do the following, which appears to be a typical R
way of doing things, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")
as.POSIXct(c, tz = tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
 invalid 'tz' value
???


Objects of the POSIXlt and POSIXct classes don't support multiple  
time zones, so if you specified several time zones on input, how  
would the conversion functions decide which one to use for output?   
You'll need to write your own wrapper function to make this  
decision, and do the conversions separately for each input timezone.


Why don't those classes support a separate time zone for each entry?  
Presumably because their designer never thought anyone would want to  
do that.


Duncan Murdoch



Thanks,
Denis Chabot
sessionInfo()
R version 2.9.1 Patched (2009-07-09 r48929)
x86_64-apple-darwin9.7.0
locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
loaded via a namespace (and not attached):
[1] tools_2.9.1





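Duncan's suggested wrapper could look like the sketch below. `as_posixct_multi_tz()` is a hypothetical helper, not part of R: it parses each group of elements in its own time zone, keeps the underlying seconds since 1970-01-01 UTC, and returns a single POSIXct vector in one output zone (UTC here, an arbitrary choice). The example uses UTC and "Etc/GMT+5" because neither has DST, so the expected result is unambiguous.

```r
# Hypothetical helper: parse each element in its own time zone,
# return one POSIXct vector expressed in a single output zone.
as_posixct_multi_tz <- function(x, tz, out.tz = "UTC") {
  stopifnot(length(x) == length(tz))
  secs <- rep(NA_real_, length(x))
  for (z in unique(tz)) {
    i <- tz == z
    secs[i] <- as.numeric(as.POSIXct(x[i], tz = z))  # seconds since epoch
  }
  as.POSIXct(secs, origin = "1970-01-01", tz = out.tz)
}

# Unambiguous example: the same clock time read in UTC and in UTC-5
out <- as_posixct_multi_tz(rep("2008-11-02 01:30:00", 2),
                           c("UTC", "Etc/GMT+5"))
print(format(out, "%Y-%m-%d %H:%M:%S"))
print(diff(as.numeric(out)))  # 5 hours, in seconds
```

With the pair from the question, `as_posixct_multi_tz(rep("2008-11-02 01:30:00", 2), c("EST5EDT", "EST"))` should give two instants one hour apart, but how R resolves an ambiguous local time such as 01:30 in EST5EDT on that night can depend on the platform, so it is worth checking the result with `difftime()`.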

Re: [R] problem with as.POSIXct and daylight savings time

2009-07-19 Thread Denis Chabot
Thanks for the suggestion, Spencer. I will take a look and will report  
to the list if I find this a better solution for my situation. Might  
take a couple of days though.


Denis
On 09-07-19 at 12:42, spencerg wrote:


Have you considered the timeDate package?
Spencer

Denis Chabot wrote:

Thank you very much Duncan.

I'll follow your suggestion.

Why do I want to do what the designer did not think anyone would  
want to do? I have data acquisition equipment taking measurements  
every 15 min or so for days at a time, and I need to compile all  
such experiments in a master data set. The data acquisition  
equipment automatically switches to DST in spring and back to ST in  
autumn, which I did not disable because it is easier to work with  
while we are running the experiments.


I could use chron to ignore time zones and daylight savings time,  
but this would not be of much help as whether or not I use  
as.POSIXct or chron, there is one day of the year that has 25 h and  
I need to deal with that 25th hour or I'll lose one hour of data!


Denis
On 09-07-19 at 11:45, Duncan Murdoch wrote:


On 19/07/2009 11:23 AM, Denis Chabot wrote:

[was [R] end of daylight saving time]
Hi,
I got no reply with the previous subject line, probably a bad
choice of subject on my part, so here it is again.
I read from the help on DateTimeClasses and various posts on this
list that, quite logically, one needs to specify if DST is
active or not when time is between 1 and 2 AM on the first
Sunday in November (for North America in recent years).

This I can do for one date at a time:
# to get automatic use of DST:
a <- as.POSIXct("2008-11-02 01:30:00", tz = "EST5EDT")
# to tell R this is the second occurrence of 1:30 that day, in ST:
b <- as.POSIXct("2008-11-02 01:30:00", tz = "EST")

difftime(b, a)
Time difference of 1 hours
But why can't I do the following, which appears to be a typical R
way of doing things, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")
as.POSIXct(c, tz = tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
invalid 'tz' value
???


Objects of the POSIXlt and POSIXct classes don't support multiple  
time zones, so if you specified several time zones on input, how  
would the conversion functions decide which one to use for  
output?  You'll need to write your own wrapper function to make  
this decision, and do the conversions separately for each input  
timezone.


Why don't those classes support a separate time zone for each  
entry? Presumably because their designer never thought anyone  
would want to do that.


Duncan Murdoch



Thanks,
Denis Chabot
sessionInfo()
R version 2.9.1 Patched (2009-07-09 r48929)
x86_64-apple-darwin9.7.0
locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.9.1
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.











[R] end of daylight saving time

2009-07-15 Thread Denis Chabot

Hi,

I read from the help on DateTimeClasses and various posts on this list
that, quite logically, one needs to specify whether DST is in effect for
the first hour after the change from DST to ST in autumn.


Hence, in my time zone and on Mac OS X, I can do this:

a <- as.POSIXct("2008-11-02 01:30:00", tz="EST5EDT")  # to get automatic use of DST
b <- as.POSIXct("2008-11-02 01:30:00", tz="EST")  # to tell R this is the second occurrence of 1:30 that day, in ST

difftime(b,a)

But why can't I do this, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone = c("EST5EDT", "EST")

as.POSIXct(c, tz=tzone)
Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) :
  invalid 'tz' value

???

Thanks,

Denis Chabot

sessionInfo()
R version 2.9.1 Patched (2009-07-09 r48929)
x86_64-apple-darwin9.7.0

locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.9.1



Re: [R] puzzled by math on date-time objects

2009-03-11 Thread Denis Chabot

Hi Phil,

Well, thank you very much for this detailed explanation. It will help
me when summarizing information over periods of time using either
summarize (Hmisc) or summaryBy (doBy). Until now, doing so resulted in
the mean time for each group being transformed into a number of seconds,
as you explain below, and neither of these functions puts it back into a
POSIX date-time object. I tried to do so with as.POSIXct(), but this
failed because I did not provide a reference date (the origin). From
now on I'll try the structure() command you used below.
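This is a minimal base-R sketch of the group-summary case, using made-up timestamps and split()/vapply() in place of summarize or summaryBy. Group means are computed on the underlying seconds, then turned back into date-times with as.POSIXct() and an explicit origin. Older versions of R require that origin argument for numeric input.

```r
# Made-up timestamps in two groups, standing in for the real data.
stamps <- as.POSIXct(c("2009-03-10 11:00:00", "2009-03-10 11:30:00",
                       "2009-03-10 14:00:00", "2009-03-10 16:00:00"),
                     tz = "EST5EDT")
group <- c("a", "a", "b", "b")

# Group means of the underlying seconds since 1970-01-01 00:00:00 UTC.
mean_secs <- vapply(split(as.numeric(stamps), group), mean, numeric(1))

# as.POSIXct() needs that reference date (origin) to rebuild date-times.
mean_times <- as.POSIXct(mean_secs, origin = "1970-01-01", tz = "EST5EDT")
mean_times
```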


Denis
On 09-03-10 at 19:04, Phil Spector wrote:


Denis -
  If you look inside summary.POSIXct, you'll see the
following:

x <- summary.default(unclass(object), digits = digits, ...)[1:6]

In other words, summary accepts the POSIX object, unclasses it
(resulting in a numeric value representing the number of seconds
since January 1, 1970, 00:00 UTC), performs the operation, and then
reassigns the class.  You can do this basic trick yourself.  Suppose
we have a vector of dates and want the median:


dates = as.POSIXct(c('2009-3-15','2009-2-19','2009-3-20','2009-2-18'))

median(dates)

Error in Summary.POSIXct(c(1235030400, 1237100400), na.rm = FALSE) :
 'sum' not defined for POSIXt objects

res = median(as.numeric(dates))
structure(res,class='POSIXct')

[1] 2009-03-02 23:30:00 PST

  I think it's clear that you can do any arithmetic operation on
dates this way, even if it doesn't make sense:


sum(dates)

Error in Summary.POSIXct(c(1237100400, 1235030400, 1237532400, 1234944000 :
 'sum' not defined for POSIXt objects

res = sum(as.numeric(dates))
structure(res,class='POSIXct')

[1] 2126-09-08 23:00:00 PDT

  I'm quite certain that median.POSIXct will be fixed pretty quickly,
but you can always unclass and reclass to do what you need.

- Phil






On Tue, 10 Mar 2009, Denis Chabot wrote:


Thanks Phil,

but how does summary() find the median of the same type of object?
I would have thought the algorithm used when the vector has an even
length would also require the SUM of the POSIX vector. I am glad of
the solution you propose, but still a bit puzzled!


Denis
On 09-03-10 at 12:39, Phil Spector wrote:


Denis -
There is no median method for POSIX objects, although
there is a summary method.  Thus, when you pass a POSIX
object to median, it uses median.default, which contains
the following code:

  if (n %% 2L == 1L)
      sort(x, partial = half)[half]
  else sum(sort(x, partial = half + 0L:1L)[half + 0L:1L]) / 2

So when the length of your POSIX vector is odd, it works, but if
it's even, it would need to take the sum of a POSIX object.  Of
course, there is no sum method for POSIX objects, since it doesn't
make sense.

Right now, it looks like your best bet for a summary of POSIX
objects is
summary(a)['Median']
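This is a minimal sketch of a median that sidesteps the sum problem described above. The function name `median_posixct` is hypothetical, not an R function. It follows the same odd/even logic as the median.default code quoted above, but applies it to the underlying seconds and then restores the POSIXct class and time zone, which is the unclass/reclass trick.

```r
# Hypothetical helper: median of a POSIXct vector computed on the numeric
# seconds, mirroring median.default's odd/even branches.
median_posixct <- function(x, na.rm = FALSE) {
  tz <- attr(x, "tzone")
  secs <- as.numeric(x)
  if (na.rm) secs <- secs[!is.na(secs)]
  n <- length(secs)
  if (n == 0L || anyNA(secs)) return(.POSIXct(NA_real_, tz = tz))
  half <- (n + 1L) %/% 2L
  mid <- if (n %% 2L == 1L) {
    sort(secs, partial = half)[half]
  } else {
    # even length: average the two middle values (the step needing a sum)
    mean(sort(secs, partial = half + 0L:1L)[half + 0L:1L])
  }
  .POSIXct(mid, tz = tz)
}

dates <- as.POSIXct(c("2009-03-15", "2009-02-19", "2009-03-20", "2009-02-18"),
                    tz = "UTC")
median_posixct(dates)
```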

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu
On Tue, 10 Mar 2009, Denis Chabot wrote:

Hi,
I don't understand the following. When I create a small  
artificial set of date information in class POSIXct, I can  
calculate the mean and the median:

a = as.POSIXct(Sys.time())
a = a + 60*0:10; a
 [1] 2009-03-10 11:30:16 EDT 2009-03-10 11:31:16 EDT 2009-03-10 11:32:16 EDT
 [4] 2009-03-10 11:33:16 EDT 2009-03-10 11:34:16 EDT 2009-03-10 11:35:16 EDT
 [7] 2009-03-10 11:36:16 EDT 2009-03-10 11:37:16 EDT 2009-03-10 11:38:16 EDT
[10] 2009-03-10 11:39:16 EDT 2009-03-10 11:40:16 EDT
median(a)
[1] 2009-03-10 11:35:16 EDT
mean(a)
[1] 2009-03-10 11:35:16 EDT
But for real data (for this post, a short subset is in object c)   
that I have converted into a POSIXct object, I cannot calculate  
the median with median(), though I do get it with summary():

c
 [1] 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:19 EST
 [4] 2009-02-24 14:51:20 EST 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST
 [7] 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST 2009-02-24 14:51:22 EST
[10] 2009-02-24 14:51:22 EST
class(c)
[1] POSIXt  POSIXct
median(c)
Error in Summary.POSIXct(c(1235505080.6, 1235505081.1), na.rm = FALSE) :
  'sum' not defined for POSIXt objects
One difference is that in my own date-time series, some events  
are repeated (the original data contained fractions of seconds).  
But then, why can I get a median through summary()?

summary(c)
                   Min.                 1st Qu.                  Median
2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:20 EST
                   Mean                 3rd Qu.                    Max.
2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST

Thanks in advance,
Denis Chabot
sessionInfo()
R version 2.8.1 Patched (2009-01-19 r47650)
i386-apple-darwin9.6.0
locale

[R] puzzled by math on date-time objects

2009-03-10 Thread Denis Chabot

Hi,

I don't understand the following. When I create a small artificial set  
of date information in class POSIXct, I can calculate the mean and the  
median:


a = as.POSIXct(Sys.time())
a = a + 60*0:10; a

 [1] 2009-03-10 11:30:16 EDT 2009-03-10 11:31:16 EDT 2009-03-10 11:32:16 EDT
 [4] 2009-03-10 11:33:16 EDT 2009-03-10 11:34:16 EDT 2009-03-10 11:35:16 EDT
 [7] 2009-03-10 11:36:16 EDT 2009-03-10 11:37:16 EDT 2009-03-10 11:38:16 EDT
[10] 2009-03-10 11:39:16 EDT 2009-03-10 11:40:16 EDT

median(a)
[1] 2009-03-10 11:35:16 EDT
mean(a)
[1] 2009-03-10 11:35:16 EDT


But for real data (for this post, a short subset is in object c)  that  
I have converted into a POSIXct object, I cannot calculate the median  
with median(), though I do get it with summary():


c
 [1] 2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:19 EST
 [4] 2009-02-24 14:51:20 EST 2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST
 [7] 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST 2009-02-24 14:51:22 EST
[10] 2009-02-24 14:51:22 EST

class(c)
[1] POSIXt  POSIXct

median(c)
Error in Summary.POSIXct(c(1235505080.6, 1235505081.1), na.rm = FALSE) :
  'sum' not defined for POSIXt objects

One difference is that in my own date-time series, some events are  
repeated (the original data contained fractions of seconds). But then,  
why can I get a median through summary()?


summary(c)
                   Min.                 1st Qu.                  Median
2009-02-24 14:51:18 EST 2009-02-24 14:51:19 EST 2009-02-24 14:51:20 EST
                   Mean                 3rd Qu.                    Max.
2009-02-24 14:51:20 EST 2009-02-24 14:51:21 EST 2009-02-24 14:51:22 EST


Thanks in advance,


Denis Chabot

sessionInfo()
R version 2.8.1 Patched (2009-01-19 r47650)
i386-apple-darwin9.6.0

locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] doBy_3.7 chron_2.3-30

loaded via a namespace (and not attached):
[1] Hmisc_3.5-2 cluster_1.11.12 grid_2.8.1  lattice_0.17-20 tools_2.8.1




[R] handling the output of strsplit

2008-06-20 Thread Denis Chabot

Hi,

Simple question, but I did not figure out how to find the answer on my  
own (wrong choice of keywords on my part).


I have a character variable for time of day that has entries looking
like "6h30", "7h40", "12h25", "23h", etc. For the sake of this
message, say

h = c("3h30", "6h30", "9h40", "11h25", "14h00", "15h55", "23h")


I could not figure out how to use chron to import this into times, so  
I tried to extract the hours and minutes on my own.


I used strsplit and got a list:

h2 = strsplit(h, "h")
h2
[[1]]
[1] 3  30

[[2]]
[1] 6  30

[[3]]
[1] 9  40

[[4]]
[1] 11 25

[[5]]
[1] 14 00

[[6]]
[1] 15 55

[[7]]
[1] 23

This is where I am stuck. I would have liked to extract a vector of
hours from this list, and a vector of minutes, to reconstruct a
time of day.


But the only command I know, unlist, makes a long vector of h, min, h,  
min, h, min.


For this in particular, but lists in general, how can one extract the  
first item of each element in the list, then the second item of each  
element, etc.?
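Here is a base-R sketch for the general question, with no extra packages assumed. sapply() with the extraction function `[` pulls the i-th item out of every list element. Elements that are too short, such as the split of "23h", give NA, which is then treated as 0 minutes. The minutes-since-midnight column at the end is just one convenient numeric representation.

```r
h <- c("3h30", "6h30", "9h40", "11h25", "14h00", "15h55", "23h")
h2 <- strsplit(h, "h")

# `[` applied to each element: first item = hours, second item = minutes.
hours   <- as.numeric(sapply(h2, `[`, 1))
minutes <- as.numeric(sapply(h2, `[`, 2))   # NA where there is no 2nd item
minutes[is.na(minutes)] <- 0                # "23h" has no minutes part

# Minutes since midnight, a simple numeric time of day.
hours * 60 + minutes
```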


Thanks in advance,

Denis



Re: [R] handling the output of strsplit

2008-06-20 Thread Denis Chabot

Most helpful Gabor,

Many thanks,

Denis
On 08-06-20 at 18:58, Gabor Grothendieck wrote:


We construct a times object by replacing the letter "h" with ":"
and then pasting ":00" on the end.  Then we replace any occurrence
of "::" with ":00:".  It's now in the format that times recognizes,
so we can just convert that to times and apply hours() and minutes()
to get the components:


library(chron)
h2 <- times(sub("::", ":00:", paste(sub("h", ":", h), "00", sep = ":")))

hours(h2)

[1]  3  6  9 11 14 15 23

minutes(h2)

[1] 30 30 40 25  0 55  0

Another possibility is to use gsubfn in package gsubfn.  It matches
the string such that it captures the hour and minutes in the two
backreferences and then pastes them together with ":00" at the end.
It then replaces "::" with ":00:" and converts that to times.  hours()
and minutes() could be used, as before, to get the components.


library(gsubfn)
times(gsubfn("([^h]+)h(.*)", ~ sub("::", ":00:", paste(..., "00", sep = ":")), h, backref = -2))

[1] 03:30:00 06:30:00 09:40:00 11:25:00 14:00:00 15:55:00 23:00:00

Here is another approach using strapply in the gsubfn package.  We
use the same pattern, but this time convert each component to numeric:

times(strapply(h, "([^h]+)h(.*)", ~ as.numeric(x) / 24 +
  sum(as.numeric(y), na.rm = TRUE) / (24*60), backref = -2, simplify = c))

[1] 03:30:00 06:30:00 09:40:00 11:25:00 14:00:00 15:55:00 23:00:00



On Fri, Jun 20, 2008 at 6:14 PM, Denis Chabot [EMAIL PROTECTED] 
 wrote:

Hi,

Simple question, but I did not figure out how to find the answer on
my own (wrong choice of keywords on my part).

I have a character variable for time of day that has entries looking
like "6h30", "7h40", "12h25", "23h", etc. For the sake of this message,
say

h = c("3h30", "6h30", "9h40", "11h25", "14h00", "15h55", "23h")

I could not figure out how to use chron to import this into times, so I
tried to extract the hours and minutes on my own.

I used strsplit and got a list:

h2 = strsplit(h, "h")

h2

[[1]]
[1] 3  30

[[2]]
[1] 6  30

[[3]]
[1] 9  40

[[4]]
[1] 11 25

[[5]]
[1] 14 00

[[6]]
[1] 15 55

[[7]]
[1] 23

This is where I am stuck. I would have liked to extract a vector of
hours from this list, and a vector of minutes, to reconstruct a time
of day.

But the only command I know, unlist, makes a long vector of h, min,
h, min, h, min.

For this in particular, but lists in general, how can one extract the
first item of each element in the list, then the second item of each
element, etc.?

Thanks in advance,

Denis






Re: [R] Reducing the size of pdf graphics files produced with R

2008-02-07 Thread Denis Chabot
  
 support a range of compression options. I use cups-pdf and reduced
 an R output file from 3.6 MB to 0.9 MB. Much better if you want to
 include it in a LaTeX article.


 An alternative on Windows and Linux is GSView and Ghostscript:

 http://pages.cs.wisc.edu/~ghost/gsview/

 Using the convert option (File menu), one can use the pdfwrite driver
 and set (under properties) CompressPages to TRUE. You can tweak a lot
 of the PDF/Distiller preferences here as well.

 G


 cheers,
 Paul

 Chabot Denis wrote:

 Hi,

 Without trying to print 100 points (see
 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/42105.html), I often
 print maps for which I do not want to lose too much coastline detail,
 and/or plots with 1000-5000 points (yes, some are on top of each
 other, but with transparency, i.e. rgb colors with alpha information,
 this actually comes through as useful information).

 But the files are large (not as large as in the thread above, of
 course: 800 KB to about 2 MB), especially when included in a LaTeX
 document by the dozen.

 Acrobat (not the reader, the full program) has a "Reduce File Size"
 option. I don't know what it does, but it shrinks most of my plots to
 about 30% of their original size, and I cannot detect any loss of
 detail even when zooming several times. But it is a pain to do this
 with Acrobat when you generate many plots... And you need to buy
 Acrobat.

 Is this something the pdf device could do in a future version? I
 tried the million points example from the thread above, and the 55 MB
 file was reduced to 6.9 MB, an even better shrinking than I see on my
 usual plots.


 Denis Chabot





 -- 
 Drs. Paul Hiemstra
 Department of Physical Geography
 Faculty of Geosciences
 University of Utrecht
 Heidelberglaan 2
 P.O. Box 80.115
 3508 TC Utrecht
 Phone:+31302535773
 Fax:  +31302531145
 http://intamap.geo.uu.nl/~paul
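On the question of whether the pdf device itself could shrink the files, here is a minimal sketch, assuming a reasonably recent R. It uses the `compress` argument of pdf(), which controls Flate compression of the PDF content streams. The same made-up 5000-point semi-transparent scatter plot is written with and without compression, and the file sizes are compared. Later R versions also ship tools::compactPDF(), which post-processes existing files with qpdf or Ghostscript when those are installed. Neither approach is claimed to match Acrobat's "Reduce File Size" result.

```r
# Made-up data comparable to the 1000-5000 point plots described above.
set.seed(1)
x <- rnorm(5000)
y <- rnorm(5000)

# Write one semi-transparent scatter plot and return the file size in bytes.
write_plot <- function(file, compress) {
  pdf(file, compress = compress)
  plot(x, y, pch = 16, col = rgb(0, 0, 1, alpha = 0.2))
  dev.off()
  file.info(file)$size
}

size_raw <- write_plot(tempfile(fileext = ".pdf"), compress = FALSE)
size_zip <- write_plot(tempfile(fileext = ".pdf"), compress = TRUE)
c(uncompressed = size_raw, compressed = size_zip)
```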






Re: [R] dates in French format

2008-01-31 Thread Denis Chabot
:

  library(chron)
  library(gsubfn)
Loading required package: proto
  french.months <- format(seq(as.Date("2000-01-01"), length = 12, by = "month"), "%b")

  *** caught bus error ***
address 0x8, cause 'non-existent physical address'

Traceback:
  1: strptime(x, f)
  2: fromchar(x)
  3: as.Date.character("2000-01-01")
  4: as.Date("2000-01-01")
  5: seq(as.Date("2000-01-01"), length = 12, by = "month")
  6: format(seq(as.Date("2000-01-01"), length = 12, by = "month"), "%b")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

However, if I replace that call with the following, the rest of
Gabor's solution works.

  library(chron)
  library(gsubfn)
Loading required package: proto
  french.months <- c("janv", "fév", "mars", "avr", "mai", "juin",
+ "juil", "août", "sept", "oct", "nov", "déc")
  dd <- c("7-déc-07", "11-déc-07", "14-déc-07", "18-déc-07", "21-déc-07",
+ "24-déc-07", "26-déc-07", "28-déc-07", "31-déc-07", "2-janv-08",
+ "4-janv-08", "7-janv-08", "9-janv-08", "11-janv-08", "14-janv-08",
+ "16-janv-08", "18-janv-08")
  f <- function (d, m, y) chron(paste(pmatch(m, french.months), d, y, sep = "/"))
  strapply(dd, "(.*)-(.*)-(.*)", f, backref = -3, simplify = c)
 [1] 12/07/07 12/11/07 12/14/07 12/18/07 12/21/07 12/24/07 12/26/07 12/28/07
 [9] 12/31/07 01/02/08 01/04/08 01/07/08 01/09/08 01/11/08 01/14/08 01/16/08
[17] 01/18/08

So thanks again. I will try to reinstall R on my computer and see if I  
still get these errors.


Denis




 On Jan 30, 2008 11:29 PM, Denis Chabot [EMAIL PROTECTED]  
 wrote:
 Hello R users,

 I have to import a file with one column containing dates written in
 French short format, such as:

   7-déc-07
  11-déc-07
  14-déc-07
  18-déc-07
  21-déc-07
  24-déc-07
  26-déc-07
  28-déc-07
  31-déc-07
  2-janv-08
  4-janv-08
  7-janv-08
  9-janv-08
 11-janv-08
 14-janv-08
 16-janv-08
 18-janv-08

 There are other columns for other (numeric) variables in the data
 file. In my read.csv2 statement, I indicate that the date column must
 be imported as.is to keep it as character.

 I would like to transform this into a date object in R. So far I've
 used chron for my dates and times needs, but I am willing to change
 if another object/package will ease the task of importing these dates.

 My reading of the chron help led me to believe that the formats it
 understands are only month names in English.

 Are there other formats I can use with chron, or must I somehow edit
 this character variable to replace French month names by English ones
 (or numbers from 1 to 12)?

 Thanks in advance,

 Denis
 p.s. I read this in digest mode, so I'll get your replies faster if
 you cc to my email



Re: [R] dates in French format

2008-01-31 Thread Denis Chabot
Hi all,

The crashes I reported earlier were caused by R 2.6.1 for Mac not
liking the OS date setting "French (Canada)", an issue that has been
solved (by Simon Urbanek). The crashes did not occur when the OS was
set to use the standard French date formats. With that setting, the
suggestions by Prof Ripley and Gabor all worked nicely.

Now that my dates are a chron object, I do have a new problem. The
formatting of the dates on the x axis leaves something to be desired.
Instead of having day, month and year, or at the very least day and
month, I only get month and year, so that many tick labels are
identical. I also get a warning that puzzles me.

For instance:
  start <- chron("12/01/2007")
  other.dates <- seq(1,60,2)
  Date <- start + other.dates
  plot(1:length(Date)~Date)

6 ticks appear on the x axis. The first three are labeled 12/07 and  
the other three are labeled 01/08. I also get this:

Warning messages:
1: In v[[perm[1]]] : partial match of 'm' to 'month'
2: In v[[perm[2]]] : partial match of 'y' to 'year'

so there is only a partial match between m and month and between y
and year. Yet Date here is a proper chron object, so I fail to see
why the match is only partial.

If I do Date2 <- as.Date(Date) and use this as my x axis, the six
labels are more usable (déc 03, déc 13, déc 23, jan 02, jan 12, jan 22).

I suppose I can plot without x labels and draw my own, but I had not  
expected it would be necessary.
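Drawing the labels yourself is only a few lines. Here is a minimal sketch with base graphics. It assumes the dates have been converted to the base "Date" class, as with as.Date() above, and builds the same 30 dates directly so that chron is not needed. The default x axis is suppressed, then day-and-month labels are drawn every 10 days with axis.Date(). The tick spacing is an arbitrary choice, and "%b" follows the current locale, so the month names appear in French under a French locale.

```r
# The dates from the message, built directly as base "Date" objects
# (as.Date() on the chron object gives the same values).
dates <- as.Date("2007-12-01") + seq(1, 60, 2)
y <- seq_along(dates)

# Suppress the default x axis, then draw day-and-month labels every 10 days.
plot(dates, y, xaxt = "n", xlab = "Date", ylab = "Index")
ticks <- seq(min(dates), max(dates), by = "10 days")
axis.Date(1, at = ticks, format = "%d %b")
```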

  sessionInfo()
R version 2.6.1 (2007-11-26)
i386-apple-darwin8.10.1

locale:
fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] zoo_1.4-1chron_2.3-16

loaded via a namespace (and not attached):
[1] grid_2.6.1 lattice_0.17-2 tools_2.6.1


Denis

On 31 Jan 08 at 09:46, Denis Chabot wrote:

 (I've put the R Mac list in cc because of the crashes I have  
 experienced trying some of the suggestions below)

 Hi Gabor and Prof Ripley,

 On 31 Jan 08 at 02:11, Prof Brian Ripley wrote:

 The output from sessionInfo() the posting guide asked for would  
 have been very helpful here.

 You are right, sorry about that:


  library(chron)
  sessionInfo()
 R version 2.6.1 (2007-11-26)
 i386-apple-darwin8.10.1

 locale:
 fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] chron_2.3-16




 I think the problem is likely to be that these are not standard
 French abbreviations according to my systems.

 I was ready to blame Excel for the use of non-standard
 abbreviations, but I would have been wrong: from what I can see in my
 system settings, "janv" is a Mac OS X decision. I am not sure what
 would be a bullet-proof authority on French abbreviations. My
 dictionary was of no help, but Wikipedia seems to endorse the Mac OS X
 and Windows use of "janv":

 http://fr.wikipedia.org/wiki/Mois#Abr.C3.A9viations

 On Linux I get

 format(Sys.Date(), "%d-%b-%y")
 [1] 31-jan-08
 format(Sys.Date()-50, "%d-%b-%y")
 [1] 12-déc-07

 and on Windows

 format(Sys.Date(), "%d-%b-%y")
 [1] 31-janv.-08

 format(Sys.Date()-50, "%d-%b-%y")
 [1] 12-déc.-07

 I tried this too:
  format(Sys.Date(), "%d-%b-%y")
 [1] 31-jan-08
  format(Sys.Date()-50, "%d-%b-%y")
 [1] 12-déc-07

 I am lost here: since the OS uses "janv", why did the above give
 "jan"???



 And yes, chron is US-centric and so only allows English names.

 Assuming you know exactly what is meant by 'French short format', I  
 think the simplest thing to do is to set up a table by

 tr <- month.abb
 names(tr)[1] <- c("janv")  # complete it

 x <- "9-janv-08"
 x2 <- strsplit(x, "-")
 x3 <- sapply(x2, function(x) {x[2] <- tr[x[2]]; paste(x, collapse="-")})
 as.Date(x3, format = "%d-%b-%y")

 Thank you Prof Ripley, although I'll have to do my homework to fully  
 understand what is happening with the function you wrote.

 But I wonder why I cannot make this a Date object:

  x <- "9-janv-08"
  x2 <- strsplit(x, "-")
  x3 <- sapply(x2, function(x) {x[2] <- tr[x[2]]; paste(x, collapse="-")})
  as.Date(x3, format = "%d-%b-%y")
 [1] 2008-01-09
  class(x3)
 [1] character
  x4 <- as.Date(x3, format = "%d-%b-%y")

 *** caught bus error ***
 address 0x8, cause 'non-existent physical address'

 Traceback:
 1: strptime(x, format)
 2: as.Date.character(x3, format = "%d-%b-%y")
 3: as.Date(x3, format = "%d-%b-%y")

 Possible actions:
 1: abort (with core dump, if enabled)
 2: normal R exit
 3: exit R without saving workspace
 4: exit R saving workspace

 The problem may be my system as I get this error when trying Gabor's  
 suggestions (below).

 On 31 Jan 08 at 00:21, Gabor Grothendieck wrote:
 Suppose we have:

 dd <- c("7-déc-07", "11-déc-07", "14-déc-07", "18-déc-07", "21-déc-07",
 "24-déc-07", "26-déc-07", "28-déc-07", "31-déc-07", "2-janv-08",
 "4-janv-08", "7-janv-08", "9-janv-08", "11-janv-08", "14-janv

[R] dates in French format

2008-01-30 Thread Denis Chabot
Hello R users,

I have to import a file with one column containing dates written in  
French short format, such as:

7-déc-07
   11-déc-07
   14-déc-07
   18-déc-07
   21-déc-07
   24-déc-07
   26-déc-07
   28-déc-07
   31-déc-07
   2-janv-08
   4-janv-08
   7-janv-08
   9-janv-08
  11-janv-08
  14-janv-08
  16-janv-08
  18-janv-08

There are other columns for other (numeric) variables in the data  
file. In my read.csv2 statement, I indicate that the date column must  
be imported as.is to keep it as character.

I would like to transform this into a date object in R. So far I've  
used chron for my dates and times needs, but I am willing to change if  
another object/package will ease the task of importing these dates.

My reading of the chron help led me to believe that the formats it  
understands are only month names in English.

Are there other formats I can use with chron, or must I somehow edit
this character variable to replace French month names by English ones
(or numbers from 1 to 12)?

Thanks in advance,

Denis
p.s. I read this in digest mode, so I'll get your replies faster if  
you cc to my email
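The question above already names one route: replace the French month names with numbers from 1 to 12. Here is a minimal sketch of that route, assuming all years are 20xx. The function name `parse_french_date` is hypothetical. Only "déc" and "janv" are confirmed by the data; the other abbreviations in the table are assumptions to adjust to whatever the file actually contains. Because months are matched against this table rather than the locale's "%b" names, the result does not depend on the system's date settings.

```r
# Month abbreviations as they might appear in the file; only "déc" and
# "janv" are confirmed by the data, the rest are assumed.
french_months <- c("janv", "f\u00e9vr", "mars", "avr", "mai", "juin",
                   "juil", "ao\u00fbt", "sept", "oct", "nov", "d\u00e9c")

# Hypothetical helper: "7-déc-07" -> Date, via month numbers 1-12.
parse_french_date <- function(x, months = french_months) {
  parts <- strsplit(x, "-", fixed = TRUE)
  day   <- as.integer(vapply(parts, `[`, "", 1))
  month <- match(vapply(parts, `[`, "", 2), months)       # NA if unknown
  year  <- 2000L + as.integer(vapply(parts, `[`, "", 3))  # assumes 20xx
  as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
}

parse_french_date(c("7-d\u00e9c-07", "2-janv-08", "18-janv-08"))
```

Unknown month names come back as NA dates rather than stopping the import, which makes unexpected abbreviations (such as Windows' "janv.") easy to spot.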