There may be a point to consider about the field containing dates in the 
request below. Yes, much code will "work" just fine if the column  are is seen 
as text as you can group by that too. The results will perhaps not be in the 
order by row that you expected but you can do your re-sorting perhaps even more 
efficiently after your summarise() either by converting the fewer remaining 
rows to a form of date or by transforming the text dates into an order of 
year/month/date that then sorts properly in forward or reverse order as needed. 

Converting lots of rows to date is not a cheap process and grouping by that 
more complex date data structure may be harder. Heck, it may even make sense to 
use the text form of dates organized as a factor as the grouping becomes sort 
of pre-done.

The above comments are not saying any other solutions offered are wrong but 
simply discussing whether, especially for larger data sets, there are ways that 
could be more efficient.

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rui Barradas
Sent: Saturday, November 4, 2023 12:56 PM
To: roslinazairimah zakaria <roslina...@gmail.com>; jim holtman 
<jholt...@gmail.com>
Cc: r-help mailing list <r-help@r-project.org>
Subject: Re: [R] Sum data according to date in sequence

Às 01:49 de 03/11/2023, roslinazairimah zakaria escreveu:
> Hi all,
> 
> This is the data:
> 
>> dput(head(dt1,20))structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE 
>> #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> 20L), class = "data.frame")
> 
> 
> I would like to sum EnergykW data by the date. E.g. all values for
> EnergykWh on 1/14/2016
> 
> 
> On Fri, Nov 3, 2023 at 8:10 AM jim holtman <jholt...@gmail.com> wrote:
> 
>> How about send a 'dput' of some sample data.  My guess is that your date
>> is 'character' and not 'Date'.
>>
>> Thanks
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve?Tell me what you want to
>> do, not how you want to do it.*
>>
>>
>> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
>> roslina...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I have this set of data. I would like to sum the EnergykWh according date
>>> sequences.
>>>
>>>> head(dt1,20)                   StationName      date  time EnergykWh
>>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
>>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
>>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
>>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
>>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
>>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
>>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
>>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
>>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
>>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
>>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
>>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
>>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
>>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
>>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
>>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
>>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
>>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
>>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>>>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>>    group_by(date) %>%
>>>    summarise(EnergykWh = sum(EnergykWh))
>>>
>>> head(sums,20)
>>>
>>> The date is not by daily sequence but by year sequence.
>>>
>>>> head(sums,20)# A tibble: 20 × 2
>>>     date      EnergykWh
>>>     <chr>         <dbl> 1 1/1/2017     25.3   2 1/1/2018     61.0   3
>>> 1/1/2019      0.627 4 1/1/2020     10.7   5 1/10/2017    69.4   6
>>> 1/10/2018    54.5   7 1/10/2019    49.1   8 1/10/2020    45.9   9
>>> 1/11/2017    73.9  10 1/11/2018    53.3  11 1/11/2019    93.5  12
>>> 1/11/2020    66.7  13 1/12/2017    78.6  14 1/12/2018    42.2  15
>>> 1/12/2019    22.7  16 1/12/2020    80.9  17 1/13/2017    85.6  18
>>> 1/13/2018    46.4  19 1/13/2019    40.0  20 1/13/2020   121.
>>>
>>>
>>>
>>> Thank you very much for any help given.
>>>
>>>
>>> --
>>> *Roslinazairimah Zakaria*
>>> *Tel: +609-5492370; Fax. No.+609-5492766*
>>>
>>> *Email: roslinazairi...@ump.edu.my <roslinazairi...@ump.edu.my>;
>>> roslina...@gmail.com <roslina...@gmail.com>*
>>> Faculty of Industrial Sciences & Technology
>>> University Malaysia Pahang
>>> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
> 
Hello,

Here are two solutions.

1. Base R

Though I don't coerce the date column to class "Date", it seems to work.


aggregate(EnergykWh ~ date, dt1, sum)
#>        date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750


2. Package dplyr.
First column date is coerced from class "character" to class "Date".
Then the grouped sums are computed.


suppressPackageStartupMessages(
   library(dplyr)
)

dt1 %>%
   mutate(date = as.Date(date, "%m/%d/%Y")) %>%
   summarise(EnergykWh = sum(EnergykWh), .by = date)
#>         date EnergykWh
#> 1 2016-01-14  11.98569
#> 2 2016-01-15  32.56938
#> 3 2016-01-16  21.29181
#> 4 2016-01-17  22.88083
#> 5 2016-01-18   9.05750


As you can see, the results are the same.

Also, this exact problem is one of the most asked on StackOverflow. 
Maybe you could try searching there for a solution. My code above is 
also exactly the code in [1], though I had already this answer written. 
I only checked after :(.


[1] 
https://stackoverflow.com/questions/61548758/r-how-sum-values-by-group-by-date


Hope this helps,

Rui Barradas



-- 
Este e-mail foi analisado pelo software antivírus AVG para verificar a presença 
de vírus.
www.avg.com

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to