Rich,

I reproduced your problem after re-arranging the code the mailer mangled. I 
tried variations, such as not using pipes or changing the grouping variables, 
and they all reproduce your results on the abbreviated data, with this message:

`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.

I think I fixed summarise(), but it makes me wonder whether an inconsistency 
was introduced along the way, since what you used is supposed to work and has 
worked for me in the past.

I note that the man page for summarise() describes the .groups argument as 
experimental, and I find it a tad confusing.

I changed your code to tell it to keep the same grouping in the output:

vel_by_month = vel %>%
  group_by(year, month) %>%
  summarise(flow = mean(fps, na.rm = TRUE), .groups="keep")

The only change from your code is the .groups="keep" argument added at the 
very end.

Since I used your limited data, this is all I get:

> vel_by_month
# A tibble: 1 x 3
# Groups:   year, month [1]
   year month  flow
  <int> <int> <dbl>
1  2016     3  1.77

For now, all I did was shut summarise() up.
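To see what the .groups options actually do, here is a minimal sketch that compares the three most common choices side by side. It assumes dplyr 1.0.0 or later (where .groups was introduced), and the small data frame is a made-up toy, not your file. group_vars() reports which grouping variables remain on each result.

```r
library(dplyr)

# Made-up toy data: two months in one year, two readings per month.
vel <- data.frame(
  year  = c(2016L, 2016L, 2016L, 2016L),
  month = c(3L, 3L, 4L, 4L),
  fps   = c(1.74, 1.75, 3.00, 3.10)
)

by_month <- vel %>% group_by(year, month)

# "drop_last" (the default in this case) removes the last grouping variable,
# so the result stays grouped by year; writing it out explicitly also
# silences the informational message.
out_drop_last <- by_month %>% summarise(flow = mean(fps), .groups = "drop_last")
print(group_vars(out_drop_last))

# "keep" preserves the input grouping (year and month).
out_keep <- by_month %>% summarise(flow = mean(fps), .groups = "keep")
print(group_vars(out_keep))

# "drop" returns an ungrouped tibble.
out_drop <- by_month %>% summarise(flow = mean(fps), .groups = "drop")
print(group_vars(out_drop))
```

Any of the three silences the message; which one to pick depends only on whether later steps should still operate per year, per year and month, or on the whole table.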

Without the rest of your data, the question is where your NA and NaN values 
are introduced. If the change I made above does not resolve it, then, as others 
suggested, begin by looking at your data more carefully, perhaps starting 
with the .CSV file and then the data structures in R, along the lines of what 
you were shown. I find the table() function useful for categorical data with 
limited choices, because an anomaly shows up as a value that occurs only once.
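A minimal sketch of that kind of check follows. It uses a made-up data frame in which the third row is assumed, purely for illustration, to be a stray non-numeric line; raw and bad_rows are hypothetical names, not anything from your script. It shows table() exposing a one-off value, which() plus is.na() locating the rows that fail as.integer(), and why a group whose fps values are all NA yields NaN rather than NA.

```r
# Hypothetical stand-in for the raw CSV columns, read as character.
# Row 3 is an assumed stray line, not taken from the real vel.dat.
raw <- data.frame(
  year  = c("2016", "2016", "year", "2016"),
  month = c("03", "03", "month", "04"),
  fps   = c("1.74", "1.75", "fps", "3.00"),
  stringsAsFactors = FALSE
)

# table() lists each distinct value with its count; a one-off oddity
# such as "year" stands out next to the many "2016" entries.
print(table(raw$year))

# Find the rows whose year will not convert to an integer.
# suppressWarnings() hides "NAs introduced by coercion" so that
# is.na() can report which rows were affected.
bad_rows <- which(is.na(suppressWarnings(as.integer(raw$year))))
print(raw[bad_rows, ])

# If every fps value in a group is NA, na.rm = TRUE leaves an empty
# vector, and the mean of an empty vector is NaN.
print(mean(NA_real_, na.rm = TRUE))
```

Running the same which(is.na(...)) check on each column of the real data, before the as.integer() conversions, should point to the rows behind the coercion warnings.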

I see your point about needing fresh eyes. My eyes do not see what you did 
wrong; I am just following clues you may be overlooking.


-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Rich Shepard
Sent: Tuesday, September 14, 2021 11:21 AM
To: r-help@r-project.org
Subject: [R] Need fresh eyes to see what I'm missing

The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81

The script to process it:
library('tidyverse')
vel <- read.csv('../data/water/vel.dat', header = TRUE, sep = ',',
                stringsAsFactors = FALSE)
vel$year <- as.integer(vel$year)
vel$month <- as.integer(vel$month)
vel$day <- as.integer(vel$day)
vel$hour <- as.integer(vel$hour)
vel$min <- as.integer(vel$min)
vel$fps <- as.double(vel$fps, length = 6)

# use dplyr to filter() by year, month, day; summarize() to get monthly
# means
vel_by_month = vel %>%
     group_by(year, month) %>%
     summarize(flow = mean(fps, na.rm = TRUE))

R's display after running the script:
> source('vel.R')
`summarise()` has grouped output by 'year'. You can override using the 
`.groups` argument.
Warning messages:
1: In eval(ei, envir) : NAs introduced by coercion
2: In eval(ei, envir) : NAs introduced by coercion
3: In eval(ei, envir) : NAs introduced by coercion

The dataframe created by the read.csv() command:
> head(vel)
   year month day hour min  fps
1 2016     3   3   12   0 1.74
2 2016     3   3   12  10 1.75
3 2016     3   3   12  20 1.76
4 2016     3   3   12  30 1.81
5 2016     3   3   12  40 1.79
6 2016     3   3   12  50 1.75

and the resulting grouping:
> vel_by_month
# A tibble: 67 × 3
# Groups:   year [8]
     year month   flow
    <int> <int>  <dbl>
  1     0    NA NaN
  2  2016     3   2.40
  3  2016     4   3.00
  4  2016     5   2.86
  5  2016     6   2.51
  6  2016     7   2.18
  7  2016     8   1.89
  8  2016     9   1.38
  9  2016    10   1.73
10  2016    11   2.01
# … with 57 more rows

I cannot find why line 1 is there. Other data sets don't produce this result.

TIA,

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
