[R] Subtracting Data Frame With a Different Number of Rows
I have two small data frames of baseball data. The first one is the mean number of runs that will score in each half inning for the 2018 Arizona Diamondbacks. The second data frame has the same information but for only one player. As you will see, the individual player did not come up to bat at any time during the season in two situations:

  with the bases loaded and no outs
  with runners on first and third and one out

Overall

   RunnerCode         Outs MeanRuns
1  Bases Empty        0    0.5137615
2  Runner:1st         0    0.8967391
3  Runner:2nd         0    1.3018868
4  Runners:1st & 2nd  0    1.6551724
5  Runner:3rd         0    1.9545455
6  Runners:1st & 3rd  0    2.0571429
7  Runners:2nd & 3rd  0    2.1578947
8  Bases Loaded       0    3.2173913
9  Bases Empty        1    0.3963801
10 Runner:1st         1    0.6952596
11 Runner:2nd         1    0.9580838
12 Runners:1st & 2nd  1    1.4397163
13 Runner:3rd         1    1.5352113
14 Runners:1st & 3rd  1    1.5882353
15 Runners:2nd & 3rd  1    1.9215686
16 Bases Loaded       1    1.9193548
17 Bases Empty        2    0.4191011
18 Runner:1st         2    0.5531915
19 Runner:2nd         2    0.8777293
20 Runners:1st & 2nd  2    0.9553073
21 Runner:3rd         2    1.2783505
22 Runners:1st & 3rd  2    1.5851064
23 Runners:2nd & 3rd  2    1.2794118
24 Bases Loaded       2    1.388235

Individual Player

   RunnerCode         Outs MeanRuns
1  Bases Empty        0    0.4262295
2  Runner:1st         0    1.320
3  Runner:2nd         0    1.2857143
4  Runners:1st & 2nd  0    0.5714286
5  Runner:3rd         0    2.000
6  Runners:1st & 3rd  0    3.500
7  Runners:2nd & 3rd  0    1.000
8  Bases Empty        1    0.5238095
9  Runner:1st         1    0.6578947
10 Runner:2nd         1    0.375
11 Runners:1st & 2nd  1    1.4285714
12 Runner:3rd         1    1.4285714
13 Runners:2nd & 3rd  1    0.667
14 Bases Loaded       1    3.000
15 Bases Empty        2    0.3469388
16 Runner:1st         2    0.1363636
17 Runner:2nd         2    0.7142857
18 Runners:1st & 2nd  2    1.667
19 Runner:3rd         2    1.250
20 Runners:1st & 3rd  2    2.1428571
21 Runners:2nd & 3rd  2    1.500
22 Bases Loaded       2    2.200

RunnerCode is a factor
Outs are integers
MeanRuns is numerical data

I would like to subtract the second from the first as a way to evaluate the player's ability to produce runs.
As part of this analysis I would like to fill the two missing cells for the individual player (bases loaded with no outs, and runners on first and third with one out) with the mean number of runs from the overall data frame. Can anyone give me some advice?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
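One possible way to approach the question above, sketched in base R: join the two data frames on RunnerCode and Outs, fill the player's missing cells from the overall means, then subtract. The toy data frames below hold only four rows copied from the post, and the names overall, player, both and is_missing are assumptions made for illustration.

```r
# Toy subsets of the two data frames from the post (values copied from it).
overall <- data.frame(
  RunnerCode = c("Bases Empty", "Bases Loaded", "Runner:1st", "Runners:1st & 3rd"),
  Outs       = c(0L, 0L, 1L, 1L),
  MeanRuns   = c(0.5137615, 3.2173913, 0.6952596, 1.5882353)
)
player <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st"),
  Outs       = c(0L, 1L),
  MeanRuns   = c(0.4262295, 0.6578947)
)

# Left join on the two key columns; situations the player never faced get NA.
both <- merge(overall, player, by = c("RunnerCode", "Outs"),
              all.x = TRUE, suffixes = c(".overall", ".player"))

# Fill the player's missing cells with the overall mean.
is_missing <- is.na(both$MeanRuns.player)
both$MeanRuns.player[is_missing] <- both$MeanRuns.overall[is_missing]

# Subtract the second (player) from the first (overall).
both$Diff <- both$MeanRuns.overall - both$MeanRuns.player
print(both)
```

Because the filled cells equal the overall mean, their difference is zero by construction, so those two situations neither help nor hurt the player in the comparison.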
[R] Tidyverse Question
Can someone out there run the following code from the book Analyzing Baseball Data with R (Chapter 7, page 164)?

    library(tidyverse)
    db <- src_sqlite("data/pitchrx.sqlite", create = TRUE)

Over the past two weeks this code has run correctly twice, but I have gotten the following error dozens of times:

    Error: Could not connect to database: unable to open database file

I’m trying to figure out whether the problem is with my computer or whether the tidyverse package has been revised since this book was written. I got the same error when I loaded R onto my wife’s Mac. The file pitchrx.sqlite loaded into my directory C:/Users/Owner/Documents. The data file db contains four xml files used later in the analysis.

Thanks.
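One thing that may be worth checking, offered as a guess rather than a diagnosis: the path "data/pitchrx.sqlite" is relative to the working directory, and SQLite can create a new database file but not a missing folder to put it in. The base R sketch below uses a hypothetical helper, ensure_parent_dir(), and runs in a temporary directory so nothing in Documents is touched.

```r
# Hypothetical helper: make sure the folder that should hold a file exists.
ensure_parent_dir <- function(path) {
  parent <- dirname(path)
  if (!dir.exists(parent)) {
    dir.create(parent, recursive = TRUE)
  }
  invisible(dir.exists(parent))
}

# Demonstrated under a temporary directory (an assumption for the demo only).
demo_root <- tempfile("sqlite_demo_")
db_path   <- file.path(demo_root, "data", "pitchrx.sqlite")

before <- dir.exists(dirname(db_path))   # the "data" folder is not there yet
ensure_parent_dir(db_path)
after  <- dir.exists(dirname(db_path))   # now it is
print(c(before = before, after = after))
```

With the real project, the equivalent check would be dir.exists("data") run from the working directory reported by getwd().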
[R] Analyzing Baseball Data With R
Can’t get past the first step of Chapter 7, page 164. Opened a new RStudio window and loaded tidyverse by keying in library(tidyverse), which of course includes dplyr. The working directory is C:/Users/Owner/Documents. Then keyed in:

    db <- src_sqlite("data/pitchrx.sqlite", create = TRUE)

And got the following error:

    Error: Could not connect to database: unable to open database file

Googled everything I could think of to find the sqlite function and the empty pitchrx.sqlite database. Can someone give me some direction? I’m wondering if I have configured RStudio incorrectly. Why doesn’t my RStudio point to the correct data file?
[R] Accessing Data From packages
I am continuing to have problems downloading data as prescribed in books about R such as “Analyzing Baseball Data with R”. In chapter 3 (page 67) the instructions to download baseball Hall of Fame data from the package tidyverse are:

    library(tidyverse)
    -- Attaching packages --- tidyverse 1.3.0 --
    v ggplot2 3.2.1     v purrr   0.3.3
    v tibble  2.1.3     v dplyr   0.8.3
    v tidyr   1.0.0     v stringr 1.4.0
    v readr   1.3.1     v forcats 0.4.0
    -- Conflicts -- tidyverse_conflicts() --
    x dplyr::filter() masks stats::filter()
    x dplyr::lag()    masks stats::lag()
    Warning messages:
    1: package ‘tidyverse’ was built under R version 3.6.2
    2: package ‘purrr’ was built under R version 3.6.2

The package seems to load correctly, but when I try to call up the data I get an error message.

    hof <- read_csv("data/hofbatting.csv")
    Error: 'data/hofbatting.csv' does not exist in current working directory ('C:/Users/Owner/Documents').

I have no idea where the data is hiding. Can someone give me some directions?
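A small base R sketch that may help locate the file. It only reports where the relative path in the error message points and whether anything is there; it assumes nothing about where the book's data files were actually saved. The helper name where_is() is hypothetical.

```r
# Hypothetical helper: report where a path points and whether the file exists.
where_is <- function(path) {
  data.frame(
    requested = path,
    resolved  = normalizePath(path, winslash = "/", mustWork = FALSE),
    exists    = file.exists(path)
  )
}

# The path from the error message, resolved against the working directory.
print(where_is("data/hofbatting.csv"))

# Any CSV files at or below the working directory, whatever folder they are in.
csv_files <- list.files(pattern = "\\.csv$", recursive = TRUE, ignore.case = TRUE)
print(csv_files)
```

If the file turns up under a different folder, either that path can be passed to read_csv() or the file can be moved into a "data" folder under the working directory.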
[R] Data Carpentry - Creating a New SQLite Database
Working my way through a tutorial named Data Carpentry (https://datacarpentry.org/R-ecology-lesson/). For the most part it is excellent, but I’m stuck on the very last section (https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html).

First, below are the packages I have loaded:

     [1] "forcats"   "stringr"   "purrr"     "readr"     "tidyr"     "tibble"    "ggplot2"   "tidyverse" "dbplyr"    "RMySQL"    "DBI"
    [12] "dplyr"     "RSQLite"   "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"   "base"

Second, below is the text of the last section of the last chapter, titled “Creating a New SQLite Database”. The text and the suggested R code are from the tutorial; my comments are marked [Comment].

Creating a new SQLite database

So far, we have used a previously prepared SQLite database. But we can also use R to create a new database, e.g. from existing csv files. Let’s recreate the mammals database that we’ve been working with, in R. First let’s download and read in the csv files. We’ll import tidyverse to gain access to the read_csv() function.

    download.file("https://ndownloader.figshare.com/files/3299483",
                  "data_raw/species.csv")
    download.file("https://ndownloader.figshare.com/files/10717177",
                  "data_raw/surveys.csv")
    download.file("https://ndownloader.figshare.com/files/3299474",
                  "data_raw/plots.csv")
    library(tidyverse)
    species <- read_csv("data_raw/species.csv")

[Comment] No problem here. I’m pulling three databases from the Web and saving them to a folder on my hard drive (...data_raw/species.csv, etc.).

    surveys <- read_csv("data_raw/surveys.csv")
    plots <- read_csv("data_raw/plots.csv")

[Comment] Again no problem. I’m just creating R data files. But here is where I lose it. I’m creating something named my_db_file from another file named portal-database-output with an sqlite extension, and then creating my_db from my_db_file. Not sure where the file with the sqlite extension came from.

Creating a new SQLite database with dplyr is easy. You can re-use the same command we used above to open an existing .sqlite file. The create = TRUE argument instructs R to create a new, empty database instead. Caution: When create = TRUE is added, any existing database at the same location is overwritten without warning.

    my_db_file <- "data/portal-database-output.sqlite"
    my_db <- src_sqlite(my_db_file, create = TRUE)

Currently, our new database is empty, it doesn’t contain any tables:

    my_db
    #> src: sqlite 3.29.0 [data/portal-database-output.sqlite]
    #> tbls:

To add tables, we copy the existing data.frames into the database one by one:

    copy_to(my_db, surveys)
    copy_to(my_db, plots)
    my_db

[Comment] I can follow the directions to fill in my_db but I have no idea how to access the tables. The text from the tutorial below says to check the location of our database. Huh! Can someone give me some direction? Thanks.

If you check the location of our database you’ll see that data is automatically being written to disk. R and dplyr not only provide easy ways to query existing databases, they also allow you to easily create your own databases from flat files!

[Comment] Here is where I lose it.
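On the question of how to get at the tables once they are in the database, here is a minimal sketch using the DBI and RSQLite packages, both of which appear in the package list above. It is not the tutorial's own code: it uses an in-memory database and a made-up three-row plots table so that it runs on its own, and the names con, plots_back and one_row are assumptions.

```r
library(DBI)

# In-memory database so the sketch leaves no file behind; a file path such as
# "data/portal-database-output.sqlite" could be used instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Stand-in for the tutorial's plots table (made-up rows, for illustration only).
plots <- data.frame(plot_id = 1:3, plot_type = c("a", "b", "c"))
dbWriteTable(con, "plots", plots)

# Names of the tables currently stored in the database.
print(dbListTables(con))

# Read a whole table back as a data frame, or pull rows with an SQL query.
plots_back <- dbReadTable(con, "plots")
one_row    <- dbGetQuery(con, "SELECT * FROM plots WHERE plot_id = 2")
print(one_row)

dbDisconnect(con)
```

With the dplyr connection from the tutorial, tbl(my_db, "surveys") should give a comparable handle on a table, which can then be filtered and collected like any other dplyr source.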
[R] If Loop I Think
Row Outs RunnerFirst RunnerSecond RunnerThird R1 R2 R3
1 0
2 1
3 1
4 1 arenn001
5 2 arenn001
6 0
7 0 perad001
8 0 polla001 perad001
9 0 goldp001 polla001 perad001
10 0 lambj001 goldp001
11 1 lambj001 goldp001
12 2 lambj001
13 0
14 1

With the above data (Arizona Diamondbacks baseball) I’m trying to put a zero into the R1 column if the RunnerFirst column is blank and a one if the column has a coded entry, such as rows 4, 5, 7, 8 and 9. Similarly, I want zeros in R2 and R3 if RunnerSecond and RunnerThird respectively are blank, and ones if there is an entry.

I’ve tried everything I know how to do, such as “If Loops”, “If-Then loops”, “apply”, “sapply”, etc. I wrote the function below and it ran without errors, but I have no idea what to do with it to accomplish my goal:

    R1 <- function(x) {
      if (ari18.test3$RunnerFirst == " "){
        ari18.test3$R1 <- 0
        return(R1)
      }else{
        R1 <- ari18.test3$R1 <- 1
        return(R1)
      }
    }

The name of the data frame is ari18.test3.

On a more philosophical note, data handling in R seems to be made up of thousands of details with no overriding principles. I’ve read two books on R and a number of tutorials and watched several videos, but I don’t seem to be making any progress. Can anyone suggest videos, tutorials, or books that might help? Database stuff has never been my strong point but I’m determined to learn.

Thanks,
Philip Heinrich
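A possible vectorized approach to the 0/1 columns, sketched in base R. No loop is needed because the test for "blank" can be applied to a whole column at once. The toy data frame reuses a few runner codes from the post; which base each runner occupies here is made up, and the helper name has_runner() is hypothetical. It assumes blanks are stored as empty strings, spaces, or NA.

```r
# Toy version of the data frame (runner placement is illustrative only).
ari18.test3 <- data.frame(
  Outs         = c(0, 1, 2, 0, 0),
  RunnerFirst  = c("", "arenn001", " ", "polla001", NA),
  RunnerSecond = c("", "", "", "perad001", ""),
  RunnerThird  = c("", "", "", "", "perad001"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 where the cell holds a code, 0 where it is blank or NA.
has_runner <- function(x) {
  x <- trimws(as.character(x))      # also handles factors and stray spaces
  as.integer(!is.na(x) & nzchar(x))
}

ari18.test3$R1 <- has_runner(ari18.test3$RunnerFirst)
ari18.test3$R2 <- has_runner(ari18.test3$RunnerSecond)
ari18.test3$R3 <- has_runner(ari18.test3$RunnerThird)
print(ari18.test3)
```

The same result could be written with ifelse(), but converting the logical test directly to an integer keeps the helper to two lines.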
[R] Another Real Basic Question
In the Source window of RStudio (upper left) I save my code (File/Save) but cannot reload it. There is a file labeled RECode.R, but neither File/Open File nor File/Recent Files gets me anywhere. Any ideas what I’m doing wrong?
[R] mapple
With the snippet of data below I’m trying to do an if/then type of thing:

  row 1: if all five variables equal 0 then code equals 1
  row 3: if v1 = 1 and v2 = 1 then code = 5
  row 7: if v1 = 0 and v2 = 1 and v3 = 2 then code = 10

There are 24 codes in the complete database.

   v1 v2 v3 v4 v5 code
1   0  0  0  0  0    1
2   1  4  0  0  0    1
3   1  1  0  0  0    1
4   1  0  1  0  0    1
5   2  0  1  0  0    1
6   0  1  0  0  0    1
7   0  1  2  0  0    1
8   0  1  2  3  0    1
9   0  2  3  4  4    1
10  0  0  0  2  3    1

I understand that the mapply function can do things like this, but I have been reading documentation and poking around with Google and am getting nowhere. Any advice would be greatly appreciated.
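A sketch of one way to do this without mapply, using logical indexing in base R. Only the three rules stated in the post are implemented; the other 21 codes are unknown here, so rows that match none of the three rules are left as NA. The data frame name dat is an assumption.

```r
# The ten rows from the post, without the placeholder code column.
dat <- data.frame(
  v1 = c(0, 1, 1, 1, 2, 0, 0, 0, 0, 0),
  v2 = c(0, 4, 1, 0, 0, 1, 1, 1, 2, 0),
  v3 = c(0, 0, 0, 1, 1, 0, 2, 2, 3, 0),
  v4 = c(0, 0, 0, 0, 0, 0, 0, 3, 4, 2),
  v5 = c(0, 0, 0, 0, 0, 0, 0, 0, 4, 3)
)

# Start with "no code assigned" and fill in one rule at a time.
dat$code <- NA_integer_

# Rule from row 1: all five variables equal 0.
all_zero <- rowSums(dat[, c("v1", "v2", "v3", "v4", "v5")] != 0) == 0
dat$code[all_zero] <- 1L

# Rule from row 3: v1 = 1 and v2 = 1.
dat$code[dat$v1 == 1 & dat$v2 == 1] <- 5L

# Rule from row 7: v1 = 0, v2 = 1 and v3 = 2.
dat$code[dat$v1 == 0 & dat$v2 == 1 & dat$v3 == 2] <- 10L

print(dat)
```

Each rule is a vectorized condition over whole columns, so later rules overwrite earlier ones wherever they overlap; the order of the assignments would matter if the full set of 24 rules is not mutually exclusive.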
[R] Real Basic Question
Just when I think I’m starting to get the hang of R I run into something that sends me back to Go without collecting $200. The working directory seems to be correct when I load an .rda file, but the object is not there and it does not appear in the Global Environment in the upper right-hand window in RStudio.

    > getwd()
    [1] "C:/Users/Owner/Documents/Baseball/RetroSheetDocumentation"
    > load("~/Baseball/RetroSheetDocumentation/ari18.test2.rda")
    > ari18.test2
    Error: object 'ari18.test2' not found
    > ls()
     [1] "ari18.test3"       "array1"            "array2"            "BaseballArticles"  "BaseballArticles2"
     [6] "BaseballArticles3" "BBCorpus"          "BBtdm"             "firstfunction"     "folder"
    [11] "h"                 "matrix"            "matrix2"           "matrix3"           "n"
    [16] "seq"               "testvector"        "u"                 "vec"               "x"
    [21] "y"                 "yourname"

Somehow ari18.test3 loaded but ari18.test2 will not. What am I missing here?

Thanks.
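One possibility, stated as an assumption since the file itself is not available: load() restores objects under the names they had when they were saved, which need not match the file name. The sketch below reproduces that situation with a toy object in a temporary file and shows that load() returns the names of what it restored.

```r
# Toy object saved into a file whose name differs from the object's name.
ari18.test3 <- data.frame(x = 1:3)
rda_file <- file.path(tempdir(), "ari18.test2.rda")
save(ari18.test3, file = rda_file)
rm(ari18.test3)

# load() invisibly returns a character vector of the object names it restored.
loaded <- load(rda_file)
print(loaded)

# The restored object is available under its saved name, not the file name.
print(exists("ari18.test3"))
```

Running loaded <- load("~/Baseball/RetroSheetDocumentation/ari18.test2.rda") and printing loaded would show which names that file actually contains.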
[R] Creating a Date Field
The date is embedded in the GameID character field, so I created a date vector with the following code:

    ari18.test3$date <- substring(ari18.test3$GameID,4,11)

And then created a new data frame with just the GameID and date vectors. The date field is a character, as shown by the str() command.

    > str(test)
    'data.frame': 3 obs. of 2 variables:
     $ GameID: Factor w/ 3 levels "ARI201803290",..: 1 2 3
     $ date  : chr "20180329" "20180330" "20180331"

              GameID     date
    1   ARI201803290 20180329
    81  ARI201803300 20180330
    165 ARI201803310 20180331

My notes from about a week ago say that the following code will turn “date” into a date field:

    test$date <- as.Date(test$date,format="%Y %M %D")

date becomes a date field, but the data disappears into NA:

    > str(test)
    'data.frame': 3 obs. of 2 variables:
     $ GameID: Factor w/ 3 levels "ARI201803290",..: 1 2 3
     $ date  : Date, format: NA NA NA

What am I missing here?
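A sketch of a format string that matches the data shown. In the conversion codes used by as.Date(), month and day of month are the lowercase %m and %d (uppercase %M means minutes), and the format has to mirror the string exactly, so there are no spaces for "20180329". The vector names below are assumptions.

```r
# GameID values from the post; characters 4 to 11 hold the date.
game_id  <- c("ARI201803290", "ARI201803300", "ARI201803310")
date_chr <- substring(game_id, 4, 11)

# Four-digit year, two-digit month, two-digit day, with no separators.
game_date <- as.Date(date_chr, format = "%Y%m%d")
print(game_date)
str(game_date)
```

The full list of conversion codes is on the ?strptime help page.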
[R] & statement within an ifelse Loop
Still putzing around trying to increment a count vector when the date changes.

              Date count
    1   2018-03-29     1
    2   2018-03-29     1
    3   2018-03-29     1
    81  2018-03-30     1
    82  2018-03-30     1
    83  2018-03-30     1
    165 2018-03-31     1
    166 2018-03-31     1
    167 2018-03-31     1

I can get count to change when the date changes (lines 81 and 165) by comparing the date to the date on the previous line (lag(Date,1)), but then the count returns to 1 on line 82 and line 166.

    > test2 <- transform(test2,
    +   count = ifelse(Date == lag(Date,1),count,count+1))
    > test2
              Date count
    1   2018-03-29    NA
    2   2018-03-29     1
    3   2018-03-29     1
    81  2018-03-30     2
    82  2018-03-30     1
    83  2018-03-30     1
    165 2018-03-31     2
    166 2018-03-31     1
    167 2018-03-31     1

    > test2 <- transform(test2,
    +   count = ifelse(Date == lag(Date,1),(lag(count,1)),(lag(count,1)+1)))

With the code above I get the same results. It seems to me that line 82 should have count = 2, since the dates on lines 81 and 82 are the same, so the count on line 82 should be the same as on line 81 (lag(count,1)). Similarly, if line 83 had count = 2 then line 165 should be equal to 3. What am I missing here?

Is there a way to add an & clause to either the if or the else clause, such as:

    ((-2:2) >= 0) & ((-2:2) <= 0)

I’ve tried this several different ways, such as (lag(count,1)) & (count = count+1), with no success.

Thanks,
Philip
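A likely reason the ifelse() version resets, with a sketch of an alternative. ifelse() works on whole vectors at once, so count and lag(count,1) refer to the original column, in which every value is 1; nothing computed for line 81 is visible when line 82 is evaluated. Carrying a value forward row by row needs a sequential step, for example the plain loop below. The toy data frame repeats the nine rows from the post.

```r
# The nine rows from the post, with count starting at 1 everywhere.
test2 <- data.frame(
  Date  = as.Date(rep(c("2018-03-29", "2018-03-30", "2018-03-31"), each = 3)),
  count = 1
)

# Walk down the rows; each row builds on the count already stored just above it.
for (i in seq_len(nrow(test2))[-1]) {
  same_day <- test2$Date[i] == test2$Date[i - 1]
  test2$count[i] <- test2$count[i - 1] + if (same_day) 0 else 1
}
print(test2)
```

The loop is written for clarity rather than speed; for a full season of at bats a vectorized running total would usually be preferred.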
[R] Loop With Dates
With the data snippet below I’m trying to increment the “count” vector by one each time the date changes.

              Date count
    1   2018-03-29     1
    2   2018-03-29     1
    3   2018-03-29     1
    81  2018-03-30     1
    82  2018-03-30     1
    83  2018-03-30     1
    165 2018-03-31     1
    166 2018-03-31     1
    167 2018-03-31     1

I can get count to change when the date changes with the following code:

    > test2 <- transform(test2,
    +   count = ifelse(Date == lag(Date,1),count,count+1))
    > test2
              Date count
    1   2018-03-29    NA
    2   2018-03-29     1
    3   2018-03-29     1
    81  2018-03-30     2
    82  2018-03-30     1
    83  2018-03-30     1
    165 2018-03-31     2
    166 2018-03-31     1
    167 2018-03-31     1

...but I want all three March 30 rows to have a count of 2 and the March 31 rows to be equal to 3. Any suggestions?

Thanks.
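One vectorized way to get a counter that steps up at each date change, sketched in base R: flag the rows where the date differs from the row above, then take a running total of the flags. It assumes the rows are already in date order; the names Date, changed and count mirror the post but the toy vector is rebuilt here.

```r
# The nine dates from the post.
Date <- as.Date(rep(c("2018-03-29", "2018-03-30", "2018-03-31"), each = 3))

# TRUE on the first row and on every row whose date differs from the previous one.
changed <- c(TRUE, Date[-1] != Date[-length(Date)])

# Running total of the change flags gives 1 for the first date, 2 for the next, ...
count <- cumsum(changed)
print(data.frame(Date, count))
```

Because the first row is flagged TRUE, the counter starts at 1 and there is no NA in the first position.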
[R] If Loop With Lagged Variable
Attached is every at bat for the Arizona Diamondbacks’ first three games of 2018 (BBdata1.rda). I added the Date and DHCode variables by parsing the first variable, labeled GameID. BBdata2 is a reduced dataset with five variables, as shown by the str() command.

    'data.frame': 234 obs. of 5 variables:
     $ GameID : Factor w/ 3 levels "ARI201803290",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ Date   : Date, format: "2018-03-29" "2018-03-29" "2018-03-29" "2018-03-29" ...
     $ DHCode : Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
     $ GameNum: num 1 1 1 1 1 1 1 1 1 1 ...
     $ Date2  : Date, format: NA "2018-03-29" "2018-03-29" "2018-03-29" ...

I’m trying to increment GameNum (game number) to game 2 when the date changes from 2018-03-29 to 2018-03-30 in row 81, and to game 3 in row 165. According to my R for Dummies book the following code should work, but it doesn’t. I keep getting the following warning. Any suggestions?

    if(ari18.test3$Date > lag(ari18.test3$Date)) {ari18.test3$gameNum <- ari18.tesm3$GameNum + 1}
    Warning message:
    In if (ari18.test3$Date > lag(ari18.test3$Date)) { :
      the condition has length > 1 and only the first element will be used

Thanks.
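The warning arises because if() expects a single TRUE or FALSE, while comparing two columns yields one value per row (recent versions of R treat this as an error rather than a warning). Since GameID already identifies each game, one possible route that avoids if() altogether is to number the games by order of first appearance. The five-row data frame below is a toy stand-in for BBdata2.

```r
# Toy stand-in for BBdata2: three games, a few at bats each.
BBdata2 <- data.frame(
  GameID = factor(c("ARI201803290", "ARI201803290", "ARI201803300",
                    "ARI201803310", "ARI201803310"))
)

# Position of each row's GameID among the distinct GameIDs, in order of appearance.
BBdata2$GameNum <- match(BBdata2$GameID, unique(BBdata2$GameID))
print(BBdata2)
```

This relies on the rows being in chronological order, as they appear to be in the str() output above.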