On Tue, 9 Apr 2024, Ivan Krylov wrote:
That's fine, R will run straight from the build directory. It has to do
so in order to compile the vignettes.
Ivan,
That's good to know. Thanks.
But let's skip this step. Here's reshape.tex from R-4.3.3:
https://0x0.st/XidU.tex/reshape.tex
(Feel free
On Tue, 9 Apr 2024, Ivan Krylov wrote:
At this point in the build, R already exists, is quite operable and
even has all the recommended packages installed. The build system then
uses this freshly compiled R to run Sweave on the vignettes. Let me
break the build in a similar manner and see what h
On Mon, 8 Apr 2024, Ivan Krylov wrote:
A Web search suggests that texi2dvi may output this message by mistake
when the TeX installation is subject to a different problem:
https://web.archive.org/web/20191006123002/https://lists.gnu.org/r/bug-texinfo/2016-10/msg00036.html
Ivan,
That thread is
On Mon, 8 Apr 2024, Ivan Krylov wrote:
Questions about building R do get asked here and R-devel. Since you're
compiling a released version of R and we don't have an R-SIG-Slackware
mailing list, R-help sounds like the right place.
Ivan,
Okay:
What are the last lines of the build log, contai
I've been building R versions for years with no issues. Now I'm trying to
build R-4.3.3 on Slackware64-15.0 (fully patched) with TeXLive2024 (fully
patched) installed. The error occurs building a vignette.
Is this mailing list the appropriate place to ask for help or should I post the
request on sta
On Tue, 30 Jan 2024, Leo Mada wrote:
It depends how the data is generated.
Although I am not an expert in ecology, I can explain it based on a
biomedical example.
Leo,
My data are environmental, observational concentrations of water
constituents. It's not only how data are generated, what que
On Mon, 22 Jan 2024, Rich Shepard wrote:
As an aquatic ecologist I see regulators apply the geometric mean to
geochemical concentrations rather than using the arithmetic mean. I want to
know whether the geometric mean of a set of chemical concentrations (e.g.,
in mg/L) is an appropriate
On Mon, 22 Jan 2024, Martin Maechler wrote:
I think it is a good question, not really only about geo-chemistry, but
about statistics in applied sciences (and engineering for that matter).
John W Tukey (and several other of the grands of the time) had the log
transform among the "First aid tra
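[The arithmetic/geometric contrast under discussion can be sketched in a few lines of base R; the concentration values below are purely illustrative, not from the poster's data.]

```r
# Illustrative (made-up) right-skewed concentrations in mg/L
conc <- c(0.5, 1.2, 3.8, 0.9, 12.0)

arith <- mean(conc)            # arithmetic mean, pulled up by the largest value
geo   <- exp(mean(log(conc)))  # geometric mean via the log transform

arith   # 3.68
geo     # about 1.90, closer to a "typical" concentration
```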
On Mon, 22 Jan 2024, Ben Bolker wrote:
I think https://stats.stackexchange.com would be best: r-sig-ecology is
pretty quiet these days
Okay, Ben.
Thanks,
Rich
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.
On Mon, 22 Jan 2024, Bert Gunter wrote:
better posted on r-sig-ecology? -- or maybe even stack exchange?
Bert,
Okay.
Regards,
Rich
A statistical question, not specific to R.
I'm asking for a pointer for a source of definitive descriptions of what
types of data are best summarized by the arithmetic, geometric, and harmonic
means.
As an aquatic ecologist I see regulators apply the geometric mean to
geochemical concentrations
On Wed, 1 Apr 2015, Prof Brian Ripley wrote:
> I would start by trying LANGUAGE=en , e.g.
More specifically, you can use en_US or en_GB.
Rich [...]
On Fri, 24 Dec 2021, Jeff Newmiller wrote:
The qsave/qread functions from the qs package are functionally
interchangeable with saveRDS/readRDS, but faster and create smaller files.
A simple
if (file.exists("obj1.qs")) {
  obj1 <- qread("obj1.qs")
} else {
  obj1 <- compute_obj1()
  qsave(obj1, "obj1.qs")
}
On Fri, 24 Dec 2021, Rui Barradas wrote:
Can section Adoption of this Library of Congress link
https://www.loc.gov/preservation/digital/formats/fdd/fdd000470.shtml
help? I'm really not sure.
Rui,
After reading that interesting page I don't see how it helps me. I think
that re-running all my s
On Fri, 24 Dec 2021, Adrian Dușa wrote:
Package admisc has a function called obj.rda(), which returns the names of
the objects from an .Rdata file. Not sure how it handles corrupt .Rdata
files, but should generally give an idea about what's inside.
Adrian,
Thank you. I know what dataframes an
On Fri, 24 Dec 2021, Rasmus Liland wrote:
If you want to look at Rdata-files in a quick way in the
terminal, use this little gem in your .zshrc.local:
readrdata() { Rscript -e "options(width=$COLUMNS); load('$1'); sapply(ls(), get,
simplify=F)" | less }
Rasmus,
I use bash, not zsh. And runn
On Thu, 23 Dec 2021, Bill Dunlap wrote:
Three things you might try using R (and show the results in this email
thread):
Bill,
* load(verbose=TRUE, ".RData") # see how far it gets before stopping
load(verbose=TRUE, ".RData")
Loading objects:
all_turb_plot
disc_all
Error in load(verbose =
On Thu, 23 Dec 2021, Jeff Newmiller wrote:
This practice (saving and resuming from Rdata files) often ends badly this
way. Objects that are "shared" in memory get saved as separate data and
may not "fit" when re-loaded. This is why re-running from scratch should
always be part of your workflow.
Each time I finish with a session I save the image. Today the saved image
did not load, and manually running load('.RData') fails:
load('.RData')
Error in load(".RData") :
ReadItem: unknown type 0, perhaps written by later version of R
This has not happened before.
Installed is R-4.1.2-x8
On Wed, 15 Dec 2021, Avi Gross wrote:
I still do not see what you want to do, sorry.
Avi,
Backing up to my original post on this thread I've realized that no one
addressed my main question: do variable measurement intervals affect
analyses of the data. And, if so, how and how to compensate fo
On Thu, 16 Dec 2021, Chris Evans wrote:
What you said earlier was:
For me the next step, in tidyverse pseudocode, might be something like:
tibData %>%
arrange(nbr, datetime) %>% # just in case things are not ordered nicely
group_by(site_nbr) %>% # as you want to get changes within site I
On Thu, 16 Dec 2021, Jim Lemon wrote:
From what you sent, it seems like you want to find where the change in
_measurement interval_ occurred. That looks to me as though it is the
first datetime in each row. In the first row, there is a week gap between
the ten and fifteen minute intervals. This
On Wed, 15 Dec 2021, jim holtman wrote:
At least show a sample of the data and then what you would like as output.
Jim,
There are 813,694 rows of data. As I wrote,
A 33-year set of river discharge data at one gauge location has recording
intervals of 5, 10, and 30 minutes over the period of
A 33-year set of river discharge data at one gauge location has recording
intervals of 5, 10, and 30 minutes over the period of record.
The data.frame/tibble has columns for year, month, day, hour, minute, and
datetime.
Would difftime() allow me to find the dates when the changes occurred?
TIA,
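[A base-R sketch of the difftime idea asked about above; the datetime series is fabricated to switch from 5- to 10-minute intervals, and 'sampdt' stands in for the poster's datetime column.]

```r
# Fabricated series: 5-minute steps, then 10-minute steps
sampdt <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
  60 * c(seq(0, 50, by = 5), seq(60, 120, by = 10))

step_min <- as.numeric(diff(sampdt), units = "mins")  # gap to the previous sample
change   <- which(diff(step_min) != 0) + 2            # first sample at the new interval
sampdt[change]                                        # the datetime where the interval changed
```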
On Fri, 3 Dec 2021, Rich Shepard wrote:
they apparently do. For example, 99.9000 cubic feet per second is reached
99,900
Rich
On Fri, 3 Dec 2021, Bert Gunter wrote:
Perhaps you meant to point this out, but the cfs[which.max(cfs)] and
cfs == ... are not the same:
x <- rep(1:2,3)
x
[1] 1 2 1 2 1 2
x[which.max(x)]
[1] 2
x[x==max(x)]
[1] 2 2 2
So maybe your point is: which does the OP want (in case there are
repeat
On Fri, 3 Dec 2021, Rui Barradas wrote:
which.max(pdx_disc$cfs)
[1] 8054
This is the *index* for which cfs is the first maximum, not the maximum
value itself.
Rui,
Mea culpa! I completely forgot this.
Therefore, you probably want any of
filter(pdx_disc, cfs == cfs[8054])
filter(pdx_disc,
On Fri, 3 Dec 2021, Rich Shepard wrote:
I find solutions when the data_frame is grouped, but none when it's not.
Thanks, Bert. ?which.max confirmed that's all I need to find the maximum
value.
Now I need to read more than ?filter to learn why I'm not getting the
relevant row w
On Fri, 3 Dec 2021, Jeff Newmiller wrote:
cfs is not a function. Don't put parentheses next to it. Use square
brackets for indexing.
Jeff,
Thanks.
Rich
I find solutions when the data_frame is grouped, but none when it's not.
The data:
# A tibble: 813,693 × 9
  site_nbr year  mon  day  hr  min tz     cfs sampdt
1 14211720 1988   10    1   0   10 PDT  16800 1988-10-01 00:10:00
2 14211720 1988   10    1   0   2
On Wed, 1 Dec 2021, ani jaya wrote:
one of my solution :
text(x,y,"\u00B0C", cex=1.1, font=2)
it will produce "°C"
Ani,
That's what I did.
Thank you,
Rich
On Tue, 30 Nov 2021, David Winsemius wrote:
There's nothing special about following a digit. You can have it follow
anything. Since you were going to need to quote the parentheses anyway,
then have it superscripted above the level of the paren:
plot(1,1, ylab = expression(Temperature~"("^degree*C
On Tue, 30 Nov 2021, David Winsemius wrote:
Really? What was wrong with this?
plot(1, 1, xlab=expression(32^degree) ) # the example given on ?plotmath
David,
Absolutely nothing. When there's no specific degree value in the label
because the axis represents a range of values there's no digit
On Tue, 30 Nov 2021, Bill Dunlap wrote:
The following makes degree signs appropriately, as shown in ?plotmath:
plot(68, 20, xlab=expression(degree*F), ylab=expression(degree*C))
If you want the word "degree" spelled out, put it in quotes.
Bill,
I missed that last point; thought it was alwa
On Tue, 30 Nov 2021, Rich Shepard wrote:
Thanks, Andrew. I will.
plotmath didn't have the solution; the use of the LaTeX ^ for a superscript
had a character or number preceding it. Using 'degree' prints that string
on the axis.
What does work is using the unicode for the d
On Tue, 30 Nov 2021, Andrew Simmons wrote:
Excuse my brevity, but take a look at ?plotmath
It has tons of tips for making pretty labels
Thanks, Andrew. I will.
Rich
I want to present the temperature on the Y-axis label as 'Water Temperature
(oC)' with the degree symbol as a superscript.
My web search found a couple of methods; one put the entire example string
in the axis label, the other is close, but still incorrect.
Source example:
#define expression wit
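[Two approaches that surface later in this thread, sketched side by side; the label text and plotted data are illustrative only.]

```r
# plotmath: 'degree' draws the symbol and '*' juxtaposes pieces without spaces
lab_plotmath <- expression(Water~Temperature~"("*degree*C*")")
# Unicode escape, which the poster reports ultimately worked
lab_unicode  <- "Water Temperature (\u00B0C)"

plot(1:10, seq(5, 15, length.out = 10), ylab = lab_unicode)
</imports>
```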
On Tue, 30 Nov 2021, Sarah Goslee wrote:
Andrew is right - it's a typo.
Sarah,
Thanks,
Rich
On Tue, 30 Nov 2021, Andrew Simmons wrote:
It seems like the headers are misnamed, that should be a comma between
sampdate and param, not a period
Andrew,
I completely missed seeing this, probably because I expected a comma and
didn't look closely enough.
Thanks very much for catching this,
A short data file:
site_nbr,sampdate.param,quant,unit
31731,2005-07-12,temp,19.7,oC
31731,2007-03-28,temp,9,oC
31731,2007-06-27,temp,18.3,oC
31731,2007-09-26,temp,15.8,oC
31731,2008-01-17,temp,5.4,oC
31731,2008-03-27,temp,7.4,oC
31731,2010-04-05,temp,8.1,oC
31731,2010-07-26,temp,20.5,oC
31731,2010
On Wed, 24 Nov 2021, Bill Dunlap wrote:
Did the 3 warnings come from three separate calls to read_csv? If so, can
you identify which files caused the warnings? E.g., change the likes of
lapply(files, function(file) read_csv(file, ...)) to
options(warn=1) # report warnings immediately
lap
On Wed, 24 Nov 2021, Ivan Krylov wrote:
This typically happens when you leave a trailing comma at the end of a
list() call:
Ivan,
Yes. I figured that out yesterday but didn't change the draft message. There
no longer are any extraneous commas in the script.
Thanks,
Rich
Applying read_csv() on certain data files produce this error:
Error in list(site_nbr = col_character(), sampdate = col_date(), param =
col_character(), :
argument 6 is empty
In addition: Warning messages:
1: The following named parsers don't match the column names: param, unit
2: The followin
On Fri, 12 Nov 2021, Avi Gross wrote:
Type colors()
Avi,
That's really helpful. Names are more easily grokked than are hex numbers.
Thanks,
Rich
I have an R color chart downloaded from the Web March 4, 2015. Using it to
set ggplot2 colors R responds with 'Error: Unknown colour name: FF3030' for
a number of these colors.
There are several different color charts for R when I look for a new one but
I cannot tell which one has all colors reco
On Fri, 12 Nov 2021, Chris Evans wrote:
I think it's your xlab. Should be:
Chris,
Ah, I should know better, but I didn't relate the error message to that
line.
Thank you.
Rich
I'm writing a script to plot data distributions. It worked in a basic form
and I'm now adding features and tweaking the presentation. When I sourced
the file this error appeared:
Error in p1 <- ggplot(data = pdx_disc, aes(x = NULL, y = cfs)) + geom_boxplot(color =
"#8B", (from all_disc_boxp
On Thu, 11 Nov 2021, John Dougherty wrote:
You should probably glance at the R Graphics Cookbook. That was my gateway
to ggplot. I believe ggplot is a part of the tidyverse so there should
be good information.
John,
I have Hadley's ggplot2 book and keep referring to it but haven't used it
before
On Thu, 11 Nov 2021, Avi Gross via R-help wrote:
This is not a place designed for using packages but since this discussion
persists, ...
Avi,
I'll find a cowplot help site.
# Create and save two ggplots, or more in your case:
p1 <- ggplot(data=df1, aes(x=NULL, y=cfs)) +
geom_boxplot(color
On Thu, 11 Nov 2021, Avi Gross wrote:
Your message was just to me so the reply is also just to you.
Avi,
Oops! Mea culpa.
Yes, large data sets can be handled if your machine has the memory and one
big one takes up the same amount as four smaller ones if combined.
This desktop has 32G RAM
On Thu, 11 Nov 2021, Bert Gunter wrote:
These days, 3e6 rows x 3 columns is small, unless large objects are in
each cell.
I think R would handle this with ease.
Thanks, Bert. See my last post showing data set structure and suggested
collection for use by grouping.
Regards,
Rich
On Thu, 11 Nov 2021, Avi Gross via R-help wrote:
Say I have a data.frame with columns called PLACE and MEASURE others. The
one I call PLACE would be a factor containing the locations you are
measuring at. I mean it would be character strings of your N places but
the factors would be made in the
On Thu, 11 Nov 2021, Bert Gunter wrote:
You can always create a graphics layout and then plot different
ggplot objects in the separate regions of the layout. See ?grid.layout
(since ggplots are grobs) and ?plot.ggplot . This also **may** be
useful by showing examples using grid.arrange()
htt
On Thu, 11 Nov 2021, Avi Gross wrote:
Boxplots like many other things in ggplot can be grouped in various ways.
I often do something like this:
Avi,
I've designed and used multiple boxplots in many projects. They might show
geochemical concentrations at two locations or in two (or three) sepa
On Thu, 11 Nov 2021, Jeff Newmiller wrote:
I strongly recommend that you change your way of thinking when it comes to
ggplot: if your data are not yet in one data frame then your data are not
yet ready for plotting.
It is possible to specify separate data frames for different layers of the
pl
On Wed, 10 Nov 2021, Rich Shepard wrote:
I have the code to create ggplot2 boxplots using two attributes (e.g.,
chemical concentration and month) from the same tibble. Is there an
example from which I can learn how to make boxplots from different
tibbles/dataframes (e.g., chemical
On Thu, 11 Nov 2021, Bill Dunlap wrote:
I googled for "ggplot2 boxplots by group" and the first hit was
https://www.r-graph-gallery.com/265-grouped-boxplot-with-ggplot2.html
which displays lots of variants along with the code to produce them. It
has links to ungrouped boxplots and shows how vi
On Thu, 11 Nov 2021, Bill Dunlap wrote:
I googled for "ggplot2 boxplots by group" and the first hit was
https://www.r-graph-gallery.com/265-grouped-boxplot-with-ggplot2.html
which displays lots of variants along with the code to produce them. It
has links to ungrouped boxplots and shows how vi
On Wed, 10 Nov 2021, Avi Gross via R-help wrote:
I think many here may not quite have enough info to help you.
Avi,
Actually, you've reflected my thinking.
But the subject of multiple plots has come up. There are a slew of ways,
especially in the ggplot paradigm, to make multiple smaller pl
On Wed, 10 Nov 2021, Bert Gunter wrote:
As always, online search (on "ggplot2 help") seemed to bring up useful
resources. I suggest you look here (suggested tutorials and resources are
farther down the page): https://ggplot2.tidyverse.org/
Bert,
My web search was for multiple boxplots and I d
On Wed, 10 Nov 2021, Richard M. Heiberger wrote:
I don't understand your question. It looks like the example in
?lattice::panel.bwplot does exactly what you want (modulo using ggplot
instead of lattice). Therefore it looks like creating a single column of y
from the y in each data.frame, and als
On Wed, 10 Nov 2021, Rich Shepard wrote:
I have the code to create ggplot2 boxplots using two attributes (e.g.,
chemical concentration and month) from the same tibble. Is there an
example from which I can learn how to make boxplots from different
tibbles/dataframes (e.g., chemical
I have the code to create ggplot2 boxplots using two attributes (e.g.,
chemical concentration and month) from the same tibble. Is there an example
from which I can learn how to make boxplots from different
tibbles/dataframes (e.g., chemical concentrations and monitoring location)?
TIA,
Rich
I want to thank all of you for your help the past few days. I now have all
data sets imported, datetime columns added, and distribution stats
calculated for each. No errors.
My searches on the web for what to do when problems() produces no results,
and the few comments on my stackexchange post, w
On Thu, 4 Nov 2021, Rui Barradas wrote:
Maybe
which(is.na(pdx_stage$ft))
Have you tried na.rm = TRUE?
mean(pdx_stage$ft, na.rm = TRUE)
Rui,
I just scrolled through the data file.
Yes, there are several NAs when the equipment was down and I hadn't put
na.rm = TRUE in the read_csv() import co
I'm not seeing what's different about this tibble so that mean() returns NA
on a column of doubles:
head(pdx_stage)
# A tibble: 6 × 8
  site_nbr year  mon  day  hr  min tz    ft
1 14211720 2007   10    1   1    0 PDT 3.21
2 14211720 2007   10    1   1   30 PD
On Thu, 4 Nov 2021, Micha Silver wrote:
Why are you importing the last "ft" column as an integer when it's clearly
decimal data?
Micha,
Probably because I was still thinking of the discharge data which are
integers. That explains all the issues.
Mea culpa!
Many thanks for seeing what I kept
On Thu, 4 Nov 2021, Ben Tupper wrote:
The help for problems() shows that the expected argument has a default
value of .Last.value. If you don't provide the input argument, it just
uses the last thing your R session evaluated. That's great if you run
problems() right after your issue arises. But
On Wed, 3 Nov 2021, Rui Barradas wrote:
You do not assign the pipe output, so put the print statement as the last
instruction of the pipe. The following works.
# file: rhelp.R
library(dplyr)
mtcars %>%
select(mpg, cyl, disp, hp, am) %>%
mutate(
sampdt = c("automatic", "manual")[am + 1L]
On Wed, 3 Nov 2021, Ivan Krylov wrote:
instead. When you source() a script, auto-printing is not performed. This
is explained in the first paragraph of ?source, but not ?sink. If you want
to source() scripts and rely on their output (including sink()), you'll
need to print() results explicitly.
On Wed, 3 Nov 2021, Bert Gunter wrote:
More to the point, the tidyverse galaxy tries to largely replace R's
standard functionality and has its own help forum. So I think you should
post there, rather than here, for questions about it:
https://www.tidyverse.org/help/
Bert,
Thank you very much.
When I source the import_data.R script now I get errors that tell me to look
at problems(). I enter that function name but there's no return.
Reading ?problems I learned that stop_for_problems(x) should stop the
process when a problem occurs, so I added that function to each data file;
for exampl
From krylov.r...@gmail.com Tue Nov 2 14:22:05 2021
instead. When you source() a script, auto-printing is not performed. This
is explained in the first paragraph of ?source, but not ?sink. If you want
to source() scripts and rely on their output (including sink()), you'll
need to print() result
On Tue, 2 Nov 2021, Bert Gunter wrote:
What do you think these 2 lines are doing?
cat ('corvalis discharge summary\n')
print(cat)
Bert,
If I used them in Linux, cat would display the file (as do more and less),
and print() would be replaced by lpr
Please consult ?cat . You might also spend
On Tue, 2 Nov 2021, Andrew Simmons wrote:
You probably want to use cat and print for these lines. These things won't
print when not run at the top level, so if you want them to print, you must
specify that.
Andrew,
I modified the file to this:
sink('data-summaries.txt')
cat ('corvalis dischar
I've read ?sink and several web pages about it but it's not working properly
when I have the commands in a script and source() them.
The file:
library(tidyverse)
library(lubridate)
sink('data-summaries.txt')
'corvalis discharge summary\n'
summary(cor_disc)
sd(cor_disc$cfs)
'-\n'
On Tue, 2 Nov 2021, Ivan Krylov wrote:
That's because mutate() doesn't, well, mutate its argument. It _returns_
its changes, but it doesn't save them in the original variable. It's your
responsibility to assign the result somewhere:
Ivan,
I realized this after thinking more about the issue.
On Mon, 1 Nov 2021, jim holtman wrote:
drop the select, or put tz in the select
Jim,
Thinking more about the process after logging out for the evening it
occurred to me that I need to select all columns to retain them in the
tibble. I just tried that and, sure enough, that did the job.
Thank
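[Jim's first suggestion (drop the select) can be sketched in base R: adding a column keeps every existing column, so no select() is needed. ISOdatetime stands in here for lubridate's make_datetime, and the one-row data frame is illustrative.]

```r
# Illustrative one-row version of the poster's data
cor_disc <- data.frame(site_nbr = "14171600", year = 2009, mon = 10,
                       day = 23, hr = 5, min = 15, tz = "PDT", disc = 8710)

# Adding the datetime column retains all eight original columns
# (tz = "UTC" is a placeholder; "PDT" is not a valid time-zone name here)
cor_disc$sampdt <- ISOdatetime(cor_disc$year, cor_disc$mon, cor_disc$day,
                               cor_disc$hr, cor_disc$min, 0, tz = "UTC")
names(cor_disc)   # the eight originals plus sampdt
```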
On Mon, 1 Nov 2021, CALUM POLWART wrote:
Mutate. Probably.
Calum,
I thought that I had it working, but I'm still missing a piece.
For example,
cor_disc %>%
+ select(year, mon, day, hr, min) %>%
+ mutate(
+ sampdt = make_datetime(year, mon, day, hr, min)
+ )
# A tibble: 415,263 × 6
yea
On Mon, 1 Nov 2021, CALUM POLWART wrote:
Also - what is the RAW CSV like?
"12345678",2019,"10"
Calum,
No parentheses. Example:
14171600,2009,10,23,05,15,PDT,8710
14171600,2009,10,23,05,30,PDT,8710
Which causes another question. If I work within the tidyverse should I
mutate the year, month,
On Mon, 1 Nov 2021, CALUM POLWART wrote:
You specified 7 column types for 8 columns...
Thanks, Calum. I didn't see that I mis-counted when adding types.
But, that doesn't change the tibble types for year and disc from doubles to
character and int, respectively.
Regards,
Rich
On Mon, 1 Nov 2021, Kevin Thorpe wrote:
Is there a leading space on those variables for that row?
Kevin,
No:
site_nbr,year,mon,day,hr,min,tz,disc
14171600,2009,10,23,00,00,PDT,8750
...
Thanks,
Rich
On Mon, 1 Nov 2021, Jeff Newmiller wrote:
Sorry... untested code... use which... not where.
Jeff,
That problem's resolved; problems() found the lines.
Question:
cor_disc
# A tibble: 415,263 × 8
  site_nbr year  mon  day  hr  min tz   disc
1 14171600 2009   10   23
On Mon, 1 Nov 2021, Bill Dunlap wrote:
Use the col_type argument to specify your column types. [Why would you
expect '2009' to be read as a string instead of a number?]. It looks like
an initial zero causes an otherwise numeric looking entry to be considered
a string (handy for zip codes in the
On Mon, 1 Nov 2021, Jeff Newmiller wrote:
More explicitly... look at rows past the first row. If your csv has 300
rows and column 1 has something non-numeric in row 299 then the whole
column gets imported as character data. Try
cor_disc[[ 1 ]] |> as.numeric() |> is.na() |> where()
to find suspec
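[With Jeff's later correction folded in (which(), not where()), the idea looks like this on a fabricated column; the "n/a" entry is invented to show the failure mode.]

```r
# Fabricated column: one entry fails numeric conversion
col1 <- c("14171600", "14171600", "n/a", "14171600")

suspect <- which(is.na(suppressWarnings(as.numeric(col1))))
suspect   # 3: the row to inspect in the CSV
```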
On Mon, 1 Nov 2021, Kevin Thorpe wrote:
I do not have a specific answer to your particular problem. All I can say
is when a CSV import doesn’t work, it can mean there is something in the
CSV file that is unexpected. When read_csv() fails, I will try read.csv()
to compare the results.
Kevin,
I
On Mon, 1 Nov 2021, Kevin Thorpe wrote:
I do not have a specific answer to your particular problem. All I can say
is when a CSV import doesn’t work, it can mean there is something in the
CSV file that is unexpected. When read_csv() fails, I will try read.csv()
to compare the results.
Kevin,
T
The data file, cor-disc.csv begins with:
site_nbr,year,mon,day,hr,min,tz,disc
14171600,2009,10,23,00,00,PDT,8750
The first 7 columns are character strings; the 8th column is an integer.
After loading library(tidyverse) I ran read_csv() with this result:
cor_disc <- read_csv("../data/cor-disc.cs
On Tue, 14 Sep 2021, Bert Gunter wrote:
**Don't do this.*** You will make errors. Use fit-for-purpose tools.
That's what R is for. Also, be careful **how** you "download", as that
already may bake in problems.
Bert,
Haven't had downloading errors saving displayed files.
The problem with the
On Tue, 14 Sep 2021, Bert Gunter wrote:
Input problems of this sort are often caused by stray or extra characters
(commas, dashes, etc.) in the input files, which then can trigger
automatic conversion to character. Excel files are somewhat notorious for
this.
Bert,
Large volume of missing dat
On Tue, 14 Sep 2021, Eric Berger wrote:
My suggestion was not 'to make a difference'. It was to determine whether
the NAs or NaNs appear before the dplyr commands. You confirmed that they
do. There are 2321 NAs in vel. Bert suggested some ways that an NA might
appear.
Eric,
Yes, you're all co
On Tue, 14 Sep 2021, Bert Gunter wrote:
Input problems of this sort are often caused by stray or extra characters
(commas, dashes, etc.) in the input files, which then can trigger
automatic conversion to character. Excel files are somewhat notorious for
this.
Bert,
Yes, I'm going to closely r
On Tue, 14 Sep 2021, Bert Gunter wrote:
Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your
problems.
Bert,
Are all columns but the fps factors?
R
On Tue, 14 Sep 2021, Bert Gunter wrote:
Remove all your as.integer() and as.double() coercions. They are
unnecessary (unless you are preparing input for C code; also, all R
non-integers are double precision) and may be the source of your problems.
Bert,
When I remove coercions the script prod
On Tue, 14 Sep 2021, Eric Berger wrote:
Before you create vel_by_month you can check vel for NAs and NaNs by
sum(is.na(vel))
sum(unlist(lapply(vel,is.nan)))
Eric,
There should not be any missing values in the data file. Regardless, I added
those lines to the script and it made no difference.
The data file begins this way:
year,month,day,hour,min,fps
2016,03,03,12,00,1.74
2016,03,03,12,10,1.75
2016,03,03,12,20,1.76
2016,03,03,12,30,1.81
2016,03,03,12,40,1.79
2016,03,03,12,50,1.75
2016,03,03,13,00,1.78
2016,03,03,13,10,1.81
The script to process it:
library('tidyverse')
vel <- read.csv
On Mon, 13 Sep 2021, Bert Gunter wrote:
If you are interested in extracting seasonal patterns from time series,
you might wish to check out ?stl (in the stats package). Of course, there
are all sorts of ways in many packages to fit seasonality in time series
that are more sophisticated, but prob
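[Bert's stl() pointer, sketched on the monthly co2 series that ships with R rather than the poster's own data.]

```r
# Seasonal-trend decomposition by loess of the built-in monthly co2 series
fit <- stl(co2, s.window = "periodic")
head(fit$time.series)   # seasonal, trend, and remainder components
plot(fit)               # four stacked panels: data plus the three components
```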
On Mon, 13 Sep 2021, Avi Gross via R-help wrote:
Just FYI, Rich, the way the pipeline idiom works allows but does not
require the method you used:
...
But equally valid are forms that assign the result at the end:
Avi,
I'll read more about tidyverse and summarize() in R and not just i
On Mon, 13 Sep 2021, Avi Gross via R-help wrote:
As Eric has pointed out, perhaps Rich is not thinking pipelined. Summarize()
takes a first argument as:
summarise(.data=whatever, ...)
But in a pipeline, you OMIT the first argument and let the pipeline supply an
argument silently.
Av
On Tue, 14 Sep 2021, Eric Berger wrote:
This code is not correct:
disc_by_month %>%
group_by(year, month) %>%
summarize(disc_by_month, vol = mean(cfs, na.rm = TRUE))
It should be:
disc %>% group_by(year, month) %>% summarize(vol = mean(cfs, na.rm = TRUE))
Eric/Avi:
That makes no difference: