Re: [R] Create new data frame with conditional sums
Sorry, I misstated that. It should (of course) read:

If one makes the reasonable assumption that Pct is much larger than Cutoff, sorting Pct is the expensive part, e.g. O(n log2(n)) for Quicksort (n = length of Pct). I believe looping is O(n^2). etc.

On Mon, Oct 16, 2023 at 7:48 AM Bert Gunter wrote:
>
> If one makes the reasonable assumption that Pct is much larger than
> Cutoff, sorting Cutoff is the expensive part, e.g. O(n log2(n)) for
> Quicksort (n = length of Cutoff). I believe looping is O(n^2). Jeff's
> approach using findInterval may be faster. Of course, implementation
> details matter.
>
> -- Bert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ynt: creating a time series
Then your data has extra data points: either duplicates or records with timestamps that are not on 15-minute intervals.

On October 16, 2023 7:29:25 AM PDT, "ahmet varlı" wrote:
> hello,
>
> because I have data between these times and it has 177647 elements
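Both possibilities can be checked directly. The following is a minimal sketch, assuming the parsed timestamps are held in a POSIXct vector; the name `stamps` and its values are made up for illustration and are not the poster's data:

```r
# Hypothetical timestamps: one exact duplicate and one record off the 15-minute grid.
stamps <- as.POSIXct(c("2017-11-02 13:30:00", "2017-11-02 13:45:00",
                       "2017-11-02 13:45:00", "2017-11-02 13:52:00",
                       "2017-11-02 14:00:00"), tz = "UTC")

# Repeated timestamps (the first occurrence is not flagged).
isDup <- duplicated(stamps)

# POSIXct stores seconds since 1970-01-01 UTC, so values on the
# 15-minute grid are multiples of 900 seconds.
offGrid <- as.numeric(stamps) %% (15 * 60) != 0

stamps[isDup]            # the duplicated records
stamps[offGrid]          # the records that are off the grid
sum(isDup | offGrid)     # how many records are suspect
```

The modulo test relies on the time zone's UTC offset being a multiple of 15 minutes, which holds for CET/CEST. If the count of suspect records matches the 5 surplus elements, that would support the explanation above.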
Re: [R] Create new data frame with conditional sums
If one makes the reasonable assumption that Pct is much larger than Cutoff, sorting Cutoff is the expensive part, e.g. O(n log2(n)) for Quicksort (n = length of Cutoff). I believe looping is O(n^2). Jeff's approach using findInterval may be faster. Of course, implementation details matter.

-- Bert
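For readers who want to see what a findInterval-based version might look like, here is a rough sketch. It is not Jeff's code (which is not shown in this message). It assumes the goal is, for each cutoff, the total Totpop over all tracts with Pct <= cutoff, and it reuses the dummy data from Leonard's message; the names `ord`, `pctSorted`, `nBelow`, `result` and `brute` are illustrative only.

```r
dummyData <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
cutoffs <- seq(0, 0.20, by = 0.03)

# Sort once by Pct and accumulate the population in that order.
ord       <- order(dummyData$Pct)
pctSorted <- dummyData$Pct[ord]
cs        <- c(0, cumsum(dummyData$Totpop[ord]))  # leading 0 = "no tract qualifies"

# findInterval() returns, for each cutoff, how many sorted Pct values are <= cutoff.
nBelow <- findInterval(cutoffs, pctSorted)

result <- data.frame(Cutoff = cutoffs, Total = cs[nBelow + 1])

# Cross-check against the straightforward loop-style computation.
brute <- sapply(cutoffs, function(k) sum(dummyData$Totpop[dummyData$Pct <= k]))
stopifnot(isTRUE(all.equal(result$Total, brute)))
result
```

The sort costs O(n log n) and each lookup is a binary search, whereas the sapply cross-check scans all of Pct once per cutoff.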
[R] Ynt: creating a time series
Hello,

Because I have data between these times, and it has 177647 elements.

From: R-help on behalf of Marc Girondot via R-help
Sent: Monday, 16 October 2023 13:43
To: r-help@r-project.org
Subject: Re: [R] creating a time series

> Why did you expect to have 177647 elements?
>
> I found that 177642 is the correct number
Re: [R] Create new data frame with conditional sums
Dear Jason,

The code could look something like:

dummyData = data.frame(Tract = seq(1, 10, by = 1),
    Pct = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
    Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800))

# Define the cutoffs
# - allow for duplicate entries;
by = 0.03; # by = 0.01;
cutoffs <- seq(0, 0.20, by = by)

# Create a new column with cutoffs
# - Pct values above the largest break (here 0.21) get Cutoff = NA;
dummyData$Cutoff <- cut(dummyData$Pct, breaks = cutoffs,
    labels = cutoffs[-1], ordered_result = TRUE)

# Sort data (rows with an NA cutoff are placed last)
# - we could actually order only the columns:
#   Totpop & Cutoff;
dummyData = dummyData[order(dummyData$Cutoff), ]

# Result
cs = cumsum(dummyData$Totpop)

# Only last entry:
# - I do not have a nice one-liner, but this should do it:
isLast = rev(! duplicated(rev(dummyData$Cutoff)))

data.frame(Total = cs[isLast],
    Cutoff = dummyData$Cutoff[isLast])

Sincerely,

Leonard

On 10/15/2023 7:41 PM, Leonard Mada wrote:
> Dear Jason,
>
> I do not think that the solution based on aggregate offered by GPT was
> correct. That quasi-solution only aggregates for every individual level.
>
> As I understand it, you want the cumulative sum. The idea was proposed by
> Bert; you need only to sort first based on the cutoff (e.g. using an
> ordered factor), and then extract only the last value for each level.
> If Pct is unique, then you can skip this last step and use the cumsum
> directly (but on the sorted data set).
>
> Alternatives: see the solutions with loops or with sapply.
>
> Sincerely,
>
> Leonard
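On the "last entry per level" step: base R's duplicated() accepts a fromLast argument, which may serve as the one-liner. A small sketch on a made-up ordered factor (the vector `x` is illustrative, not the thread's data), checking that both forms agree:

```r
# Hypothetical sorted cutoff labels, with an NA at the end as order() would place it.
x <- factor(c(0.03, 0.03, 0.06, 0.06, 0.06, 0.09, NA), ordered = TRUE)

# Form used in the message above: reverse, flag first occurrences, reverse back.
isLastRev <- rev(!duplicated(rev(x)))

# Same flags in one call: scan from the end instead of the start.
isLastFromLast <- !duplicated(x, fromLast = TRUE)

stopifnot(identical(isLastRev, isLastFromLast))
isLastFromLast
```

In the code above this would read isLast = !duplicated(dummyData$Cutoff, fromLast = TRUE).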
Re: [R] creating a time series
Why did you expect to have 177647 elements?

I found that 177642 is the correct number:

Marc

baslangic <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
bitis <- as.POSIXct("2022-11-26 23:45:00", tz = "CET") #
zaman_seti <- seq.POSIXt(from = baslangic, to = bitis, by = 60 * 15)

y2017_11_02 <- seq(from = as.POSIXct("2017-11-02 13:30:00", tz = "CET"),
                   to = as.POSIXct("2017-11-02 23:45:00", tz = "CET"), by = 60 * 15)
# Length 42 - OK
length(y2017_11_02)

y2017_11_12 <- seq(from = as.POSIXct("2017-11-03 00:00:00", tz = "CET"),
                   to = as.POSIXct("2017-12-31 23:45:00", tz = "CET"), by = 60 * 15)
# ((30-2)+31)*24*4=5664 - OK
length(y2017_11_12)

y2018 <- seq(from = as.POSIXct("2018-01-01 00:00:00", tz = "CET"),
             to = as.POSIXct("2018-12-31 23:45:00", tz = "CET"), by = 60 * 15)
# (365)*24*4=35040 - OK
length(y2018)

y2019 <- seq(from = as.POSIXct("2019-01-01 00:00:00", tz = "CET"),
             to = as.POSIXct("2019-12-31 23:45:00", tz = "CET"), by = 60 * 15)
# (365)*24*4=35040 - OK
length(y2019)

y2020 <- seq(from = as.POSIXct("2020-01-01 00:00:00", tz = "CET"),
             to = as.POSIXct("2020-12-31 23:45:00", tz = "CET"), by = 60 * 15)
# (366)*24*4=35136 - OK
length(y2020)

y2021 <- seq(from = as.POSIXct("2021-01-01 00:00:00", tz = "CET"),
             to = as.POSIXct("2021-12-31 23:45:00", tz = "CET"), by = 60 * 15)
# (365)*24*4=35040 - OK
length(y2021)

y2022 <- seq(from = as.POSIXct("2022-01-01 00:00:00", tz = "CET"),
             to = as.POSIXct("2022-11-26 23:45:00", tz = "CET"), by = 60 * 15)
# (365-31-4)*24*4=31680 - OK
length(y2022)

# Sum of the pieces versus the full sequence
length(y2017_11_02)+length(y2017_11_12)+length(y2018)+length(y2019)+length(y2020)+length(y2021)+length(y2022)
length(zaman_seti)

On 16/10/2023 at 12:12, ahmet varlı wrote:
> Hello everyone,
>
> I had 15 minutes of data from 2017-11-02 13:30:00 to 2022-11-26 23:45:00
> and number of data is 177647
>
> I would like to ask why my time series are less than my expectation.
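The same total can be cross-checked without building the yearly pieces, by dividing the elapsed time by the step and adding one for the starting point. A short sketch using the thread's own endpoints; `elapsedMin`, `nExpected` and `nSeq` are names introduced here for illustration:

```r
baslangic <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
bitis     <- as.POSIXct("2022-11-26 23:45:00", tz = "CET")

# Elapsed time in minutes between the two endpoints.
elapsedMin <- as.numeric(difftime(bitis, baslangic, units = "mins"))

# Number of 15-minute steps, plus one for the first timestamp.
nExpected <- elapsedMin / 15 + 1

# Length of the sequence actually generated.
nSeq <- length(seq(baslangic, bitis, by = 15 * 60))

c(expected = nExpected, generated = nSeq)
```

Both endpoints fall in winter time, so the daylight-saving changes in between cancel out and do not affect the count.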
Re: [R] creating a time series
At 11:12 on 16/10/2023, ahmet varlı wrote:
> and secondly I have times in this format ( 2.11.2017 13:30/DD-MM- HH:MM:SS)
>
> su_seviyeleri_data <- as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%Y-%m-%d %H:%M:%S")
>
> I am using this code to change the format but it gives result as NA
>
> How can I solve this problem?

Hello,

Given your date format, try

format = "%d.%m.%Y %H:%M"

Test with your date time:

x <- "2.11.2017 13:30"
as.POSIXct(x, format = "%d.%m.%Y %H:%M")
#> [1] "2017-11-02 13:30:00 WET"

as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%d.%m.%Y %H:%M")

Hope this helps,

Rui Barradas
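The WET in the printed result most likely reflects the session's local time zone, which as.POSIXct() uses when tz is not given. A small sketch that contrasts the two format strings and passes tz explicitly; it assumes the data are meant to be in CET, as in the seq() call from the original question, and the names `wrong` and `right` are illustrative:

```r
x <- "2.11.2017 13:30"

# The original format string does not match the input, so parsing yields NA.
wrong <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "CET")

# Day.month.year with hours and minutes matches; tz is set explicitly (assumed CET).
right <- as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "CET")

is.na(wrong)
format(right, "%Y-%m-%d %H:%M:%S")
```

After converting the whole column, sum(is.na(...)) on the result is a quick way to spot any rows that still fail to parse.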
[R] creating a time series
Hello everyone,

I have 15-minute data from 2017-11-02 13:30:00 to 2022-11-26 23:45:00, and the number of data points is 177647.

I would like to ask why my time series is shorter than I expected.

baslangic <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
bitis <- as.POSIXct("2022-11-26 23:45:00", tz = "CET") #
zaman_seti <- seq.POSIXt(from = baslangic, to = bitis, by = 60 * 15)

length(zaman_seti)
[1] 177642

but it has to be 177647.

Secondly, I have times in this format ( 2.11.2017 13:30/DD-MM- HH:MM:SS)

su_seviyeleri_data <- as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%Y-%m-%d %H:%M:%S")

I am using this code to change the format, but the result is NA.

How can I solve this problem?

Bests,