[R] Subtracting Data Frame With a Different Number of Rows

2020-04-21 Thread Phillip Heinrich
I have two small data frames of baseball data.  The first one is the mean 
number of runs that will score in each half inning for the 2018 Arizona 
Diamondbacks.  The second data frame is the same information but for only 
one player.  As you will see the individual player did not come up to bat 
any time during the season:

   with the bases loaded and no outs
   runners on first and third with one out

Overall

   RunnerCode         Outs  MeanRuns
1  Bases Empty           0  0.5137615
2  Runner:1st            0  0.8967391
3  Runner:2nd            0  1.3018868
4  Runners:1st & 2nd     0  1.6551724
5  Runner:3rd            0  1.9545455
6  Runners:1st & 3rd     0  2.0571429
7  Runners:2nd & 3rd     0  2.1578947
8  Bases Loaded          0  3.2173913
9  Bases Empty           1  0.3963801
10 Runner:1st            1  0.6952596
11 Runner:2nd            1  0.9580838
12 Runners:1st & 2nd     1  1.4397163
13 Runner:3rd            1  1.5352113
14 Runners:1st & 3rd     1  1.5882353
15 Runners:2nd & 3rd     1  1.9215686
16 Bases Loaded          1  1.9193548
17 Bases Empty           2  0.4191011
18 Runner:1st            2  0.5531915
19 Runner:2nd            2  0.8777293
20 Runners:1st & 2nd     2  0.9553073
21 Runner:3rd            2  1.2783505
22 Runners:1st & 3rd     2  1.5851064
23 Runners:2nd & 3rd     2  1.2794118
24 Bases Loaded          2  1.388235

Individual Player

   RunnerCode         Outs  MeanRuns
1  Bases Empty           0  0.4262295
2  Runner:1st            0  1.320
3  Runner:2nd            0  1.2857143
4  Runners:1st & 2nd     0  0.5714286
5  Runner:3rd            0  2.000
6  Runners:1st & 3rd     0  3.500
7  Runners:2nd & 3rd     0  1.000
8  Bases Empty           1  0.5238095
9  Runner:1st            1  0.6578947
10 Runner:2nd            1  0.375
11 Runners:1st & 2nd     1  1.4285714
12 Runner:3rd            1  1.4285714
13 Runners:2nd & 3rd     1  0.667
14 Bases Loaded          1  3.000
15 Bases Empty           2  0.3469388
16 Runner:1st            2  0.1363636
17 Runner:2nd            2  0.7142857
18 Runners:1st & 2nd     2  1.667
19 Runner:3rd            2  1.250
20 Runners:1st & 3rd     2  2.1428571
21 Runners:2nd & 3rd     2  1.500
22 Bases Loaded          2  2.200

RunnerCode is a factor
Outs are integers
MeanRuns is numerical data

I would like to subtract the second from the first as a way to evaluate the 
player's ability to produce runs. As part of this analysis I would like to 
input the mean number of runs from the overall data frame into the two 
missing cells for the individual player: Bases Loaded with no outs, and 1st 
and 3rd with one out.


Can anyone give me some advice?
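
One way to approach this, sketched in base R under the assumption that both data frames have the columns RunnerCode, Outs and MeanRuns: merge() with all.x = TRUE keeps every row of the overall data frame and leaves NA where the player has no data, and those NAs can then be filled from the overall column before subtracting. The names overall, player, no_player_data and Diff are placeholders, and only a few rows from the tables above are typed in.

```r
# A few rows copied from the "Overall" table above.
overall <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st", "Bases Loaded", "Runners:1st & 3rd"),
  Outs       = c(0L, 0L, 0L, 1L),
  MeanRuns   = c(0.5137615, 0.8967391, 3.2173913, 1.5882353)
)

# The matching rows from the "Individual Player" table; two situations are absent.
player <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st"),
  Outs       = c(0L, 0L),
  MeanRuns   = c(0.4262295, 1.320)
)

# Left join on situation and outs; unmatched player rows become NA.
both <- merge(overall, player, by = c("RunnerCode", "Outs"),
              all.x = TRUE, suffixes = c(".overall", ".player"))

# Fill the player's missing cells with the overall mean for that situation.
no_player_data <- is.na(both$MeanRuns.player)
both$MeanRuns.player[no_player_data] <- both$MeanRuns.overall[no_player_data]

# Subtract the second (player) from the first (overall).
both$Diff <- both$MeanRuns.overall - both$MeanRuns.player
both
```

With this fill rule the difference is exactly zero for the two situations the player never faced, which may or may not be the desired treatment for the evaluation.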

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Tidyverse Question

2020-03-23 Thread Phillip Heinrich
Can someone out there run the following code from the book Analyzing Baseball 
Data with R – Chapter 7 page 164?

library(tidyverse)
db <- src_sqlite("data/pitchrx.sqlite",create=TRUE)

Over the past two weeks this code has run correctly twice but I have gotten the 
following error dozens of times:

Error: Could not connect to database:
unable to open database file

I’m trying to figure out if the problem is with my computer or if the tidyverse 
package has been revised since this book was written.  I got the same error 
when I loaded R onto my wife’s Mac.

The file pitchrx.sqlite loaded into my directory C:/Users/Owner/Documents.  The 
data file db contains four xml files used later in the analysis.

Thanks.
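
One possible cause, offered as an assumption rather than a diagnosis: the path "data/pitchrx.sqlite" is relative to the working directory, and SQLite reports "unable to open database file" when the folder in the path does not exist. The sketch below uses the DBI and RSQLite packages directly and a temporary folder as a stand-in for the working directory; base_dir, db_path, folder_existed and tables are illustrative names.

```r
library(DBI)

# A temporary folder stands in for the working directory (hypothetical location).
base_dir <- file.path(tempdir(), "sqlite_demo")
dir.create(base_dir, showWarnings = FALSE)
db_path <- file.path(base_dir, "data", "pitchrx.sqlite")

# The "data" subfolder has not been created yet, so a connection attempt
# at this point would have nowhere to put the file.
folder_existed <- dir.exists(dirname(db_path))

# Create the folder first, then connect.
dir.create(dirname(db_path), recursive = TRUE, showWarnings = FALSE)
con <- dbConnect(RSQLite::SQLite(), db_path)

tables <- dbListTables(con)   # a brand-new database has no tables
dbDisconnect(con)
```

If the folder is the issue, either creating a data folder inside the working directory or pointing the path at the folder that actually holds pitchrx.sqlite should behave the same way.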


[R] Analyzing Baseball Data With R

2020-03-16 Thread Phillip Heinrich
I can’t get past the first step of Chapter 7, page 164.

Opened a new RStudio window.  Loaded tidyverse and keyed in library(tidyverse) 
which of course includes dplyr.  The working directory is: 
C:/Users/Owner/Documents.

Then keyed in: db <- src_sqlite("data/pitchrx.sqlite",create=TRUE)

And got the following error: Error: Could not connect to database:
unable to open database file

I Googled everything I could think of to find the sqlite function and the 
empty pitchrx.sqlite database.  Can someone give me some direction?

I’m wondering if I have configured RStudio incorrectly.  Why doesn’t my 
RStudio point to the correct data file?



 


[R] Accessing Data From packages

2020-02-27 Thread Phillip Heinrich
I am continuing to have problems downloading data as prescribed in books 
about R such as “Analyzing Baseball Data with R”.


In chapter 3 (page 67) the instructions to download baseball Hall of Fame 
data from the package tidyverse are:


library(tidyverse)
-- Attaching packages ------------------------------ tidyverse 1.3.0 --
v ggplot2 3.2.1     v purrr   0.3.3
v tibble  2.1.3     v dplyr   0.8.3
v tidyr   1.0.0     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts --------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
Warning messages:
1: package ‘tidyverse’ was built under R version 3.6.2
2: package ‘purrr’ was built under R version 3.6.2

The package seems to load correctly but when I try to call up the data I get 
an error message.

***


hof <- read_csv("data/hofbatting.csv")


Error: 'data/hofbatting.csv' does not exist in current working directory 
('C:/Users/Owner/Documents').


***
I have no idea where the data is hiding.  Can someone give me some 
direction?
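
For context, a reader of a CSV file such as read_csv() or read.csv() loads a file from disk rather than from a package, so "data/hofbatting.csv" has to exist in a folder named data inside the working directory. The sketch below shows how the path can be checked before reading. It assumes a temporary folder as a stand-in for the working directory and writes a made-up two-line file in place of the book's real data; project_dir, csv_path, exists_before and exists_after are illustrative names.

```r
# A temporary folder stands in for the working directory (hypothetical).
project_dir <- file.path(tempdir(), "book_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
csv_path <- file.path(project_dir, "data", "hofbatting.csv")

# Before the file has been placed there, reading it would fail.
exists_before <- file.exists(csv_path)

# Placeholder content only; this is NOT the book's Hall of Fame data.
writeLines(c("Name,HR", "Example Player,1"), csv_path)

# Once the file is in the expected folder, the same path can be read.
exists_after <- file.exists(csv_path)
hof <- read.csv(csv_path)

# Listing the folder shows what R can actually see at that location.
list.files(file.path(project_dir, "data"))
```

In an interactive session, getwd() and list.files("data") give the same kind of check against the real working directory.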




[R] Data Carpentry - Creating a New SQLite Database

2020-01-10 Thread Phillip Heinrich
I am working my way through a tutorial named Data Carpentry 
(https://datacarpentry.org/R-ecology-lesson/).  For the most part it is 
excellent but I’m stuck on the very last section 
(https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html).

First, below are the packages I have loaded:
 [1] "forcats"   "stringr"   "purrr"     "readr"     "tidyr"     "tibble"
     "ggplot2"   "tidyverse" "dbplyr"    "RMySQL"    "DBI"
[12] "dplyr"     "RSQLite"   "stats"     "graphics"  "grDevices" "utils"
     "datasets"  "methods"   "base"

Second, below is the text of the last section of the last chapter, titled 
“Creating a New SQLite Database”, including the suggested R code.  My 
comments are set off in square brackets.

Creating a new SQLite database

So far, we have used a previously prepared SQLite database. But we can also use 
R to create a new database, e.g. from existing csv files. Let’s recreate the 
mammals database that we’ve been working with, in R. First let’s download and 
read in the csv files. We’ll import tidyverse to gain access to the read_csv() 
function.

download.file("https://ndownloader.figshare.com/files/3299483",
  "data_raw/species.csv")
download.file("https://ndownloader.figshare.com/files/10717177",
  "data_raw/surveys.csv")
download.file("https://ndownloader.figshare.com/files/3299474",
  "data_raw/plots.csv")
library(tidyverse)
species <- read_csv("data_raw/species.csv")

[No problem here.  I’m pulling three databases from the Web and saving them 
to a folder on my hard drive (...data_raw/species.csv), etc.]

surveys <- read_csv("data_raw/surveys.csv")
plots <- read_csv("data_raw/plots.csv")

[Again no problem.  I’m just creating R data files.  But here is where I 
lose it.  I’m creating something named my_db_file from another file named 
portal-database-output with an sqlite extension and then creating my_db from 
my_db_file.  Not sure where the sqlite extension file came from.]

Creating a new SQLite database with dplyr is easy. You can re-use the same 
command we used above to open an existing .sqlite file. The create = TRUE 
argument instructs R to create a new, empty database instead.

Caution: When create = TRUE is added, any existing database at the same 
location is overwritten without warning.

my_db_file <- "data/portal-database-output.sqlite"
my_db <- src_sqlite(my_db_file, create = TRUE)

Currently, our new database is empty, it doesn’t contain any tables:

my_db
#> src:  sqlite 3.29.0 [data/portal-database-output.sqlite]
#> tbls:

To add tables, we copy the existing data.frames into the database one by one:

copy_to(my_db, surveys)
copy_to(my_db, plots)
my_db

[I can follow the directions to fill in my_db but I have no idea how to 
access the tables.  The text from the tutorial below says to check the 
location of our database.  Huh!  Can someone give me some direction?  Thanks.]
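
As a rough illustration of where the .sqlite file comes from and how its tables can be reached afterwards: the file is simply whatever path is passed when the connection is created, and once tables have been written they can be listed and read back. This sketch uses the DBI and RSQLite packages directly rather than the tutorial's src_sqlite() and copy_to(); the temporary file location and the two made-up rows in plots are assumptions for the example.

```r
library(DBI)

# The database is just a file at this path (temporary location for the example).
db_file <- file.path(tempdir(), "portal-database-output.sqlite")
con <- dbConnect(RSQLite::SQLite(), db_file)

# Made-up rows standing in for the tutorial's plots.csv.
plots <- data.frame(plot_id   = 1:2,
                    plot_type = c("Control", "Spectab exclosure"))

# Copy the data frame into the database as a table named "plots".
dbWriteTable(con, "plots", plots, overwrite = TRUE)

# See which tables exist, then pull one back into R.
table_names <- dbListTables(con)
plots_back  <- dbReadTable(con, "plots")

# Or run a query and get only the matching rows.
control <- dbGetQuery(con, "SELECT * FROM plots WHERE plot_type = 'Control'")

dbDisconnect(con)
```

"Checking the location of the database" then amounts to looking in the folder given by the path, where the .sqlite file sits as an ordinary file on disk.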





If you check the location of our database you’ll see that data is automatically 
being written to disk. R and dplyr not only provide easy ways to query existing 
databases, they also allows you to easily create your own databases from flat 
files!





[R] If Loop I Think

2019-10-22 Thread Phillip Heinrich
  Row Outs RunnerFirst RunnerSecond RunnerThird R1 R2 R3
    1    0
    2    1
    3    1
    4    1 arenn001
    5    2 arenn001
    6    0
    7    0 perad001
    8    0 polla001    perad001
    9    0 goldp001    polla001     perad001
   10    0             lambj001     goldp001
   11    1             lambj001     goldp001
   12    2                          lambj001
   13    0
   14    1



With the above data, Arizona Diamondbacks baseball, I’m trying to put a zero 
into the R1 column if the RunnerFirst column is blank and a one if the column 
has a coded entry, as in rows 4, 5, 7, 8, and 9.  Similarly I want zeros in R2 
and R3 if RunnerSecond and RunnerThird respectively are blank and ones if 
there is an entry.  

I’ve tried everything I know how to do such as “If Loops”, “If-Then loops”, 
“apply”, “sapply”, etc.  I wrote the function below and it ran without errors 
but I have no idea what to do with it to accomplish my goal:

R1 <- function(x) {  
  if (ari18.test3$RunnerFirst == " "){
   ari18.test3$R1 <- 0
   return(R1)
 }else{
   R1 <- ari18.test3$R1 <- 1
   return(R1)
 }
   }

The name of the data frame is ari18.test3
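
A possible vectorized approach, sketched on made-up rows shaped like the snippet above: if() tests a single condition, whereas a comparison on a whole column yields one TRUE/FALSE per row, which as.integer() turns into 1/0. The helper name occupied is an assumption for the example, as is the idea that an empty base is stored as an empty or blank string (or NA).

```r
# Made-up rows in the same shape as the data above; "" means the base is empty.
ari18.test3 <- data.frame(
  Outs         = c(1, 2, 0, 0, 0),
  RunnerFirst  = c("arenn001", "arenn001", "", "polla001", ""),
  RunnerSecond = c("", "", "", "perad001", "lambj001"),
  RunnerThird  = c("", "", "", "", "goldp001"),
  stringsAsFactors = FALSE
)

# 1 where the column holds a player code, 0 where it is blank, spaces only, or NA.
occupied <- function(x) {
  x <- trimws(as.character(x))
  as.integer(!is.na(x) & nzchar(x))
}

ari18.test3$R1 <- occupied(ari18.test3$RunnerFirst)
ari18.test3$R2 <- occupied(ari18.test3$RunnerSecond)
ari18.test3$R3 <- occupied(ari18.test3$RunnerThird)
ari18.test3
```

The as.character() call is there so the same helper also works if the runner columns were read in as factors.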

On a more philosophical note, data handling in R seems to be made up of 
thousands of details with no overriding principles.  I’ve read two books on R 
and a number of tutorials and watched several videos, but I don’t seem to be 
making any progress.  Can anyone suggest videos, tutorials, or books that 
might help?  Database stuff has never been my strong point but I’m determined 
to learn.

Thanks,
Philip Heinrich


[R] Another Real Basic Question

2019-10-16 Thread Phillip Heinrich
In the Source window of RStudio (upper left) I save my code (File/Save) but 
cannot reload it.  There is a file labeled RECode.R, but neither File/Open 
File nor File/Recent Files gets me anywhere.

Any ideas what I’m doing wrong?


[R] mapple

2019-10-01 Thread Phillip Heinrich
With the snippet of data below I’m trying to do an if/then type of thing:
row 1 – if all five variables equal 0 then code equals 1;
row 3 – if v1 = 1 and v2 = 1 then code = 5;
row 7 – if v1 = 0 and v2 = 1 and v3 = 2 then code = 10

There are 24 codes in the complete database.


   v1 v2 v3 v4 v5 code
1   0  0  0  0  0    1
2   1  4  0  0  0    1
3   1  1  0  0  0    1
4   1  0  1  0  0    1
5   2  0  1  0  0    1
6   0  1  0  0  0    1
7   0  1  2  0  0    1
8   0  1  2  3  0    1
9   0  2  3  4  4    1
10  0  0  0  2  3    1

I understand that the mapply function can do things like this, but I have 
been reading documentation and poking around with Google and am getting 
nowhere.  Any advice would be greatly appreciated.  
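
The three rules listed above can be written as logical conditions on whole columns and used to assign codes by indexing, which may be simpler than mapply for this kind of recoding. The sketch below is base R on the ten rows shown; the data frame name dat and the choice to leave unmatched rows as NA are assumptions, and the remaining 21 codes are not attempted.

```r
# The snippet above, without the placeholder code column.
dat <- data.frame(
  v1 = c(0, 1, 1, 1, 2, 0, 0, 0, 0, 0),
  v2 = c(0, 4, 1, 0, 0, 1, 1, 1, 2, 0),
  v3 = c(0, 0, 0, 1, 1, 0, 2, 2, 3, 0),
  v4 = c(0, 0, 0, 0, 0, 0, 0, 3, 4, 2),
  v5 = c(0, 0, 0, 0, 0, 0, 0, 0, 4, 3)
)

# Start with no code assigned; rows that match no rule stay NA.
dat$code <- NA_integer_

# Rule for row 1: all five variables equal 0.
all_zero <- rowSums(dat[, c("v1", "v2", "v3", "v4", "v5")] != 0) == 0
dat$code[all_zero] <- 1L

# Rule for row 3: v1 = 1 and v2 = 1.
dat$code[dat$v1 == 1 & dat$v2 == 1] <- 5L

# Rule for row 7: v1 = 0, v2 = 1 and v3 = 2.
dat$code[dat$v1 == 0 & dat$v2 == 1 & dat$v3 == 2] <- 10L

dat
```

As the rules are stated, row 8 also satisfies the third condition, so a rule that should apply only to row 7 would need an extra condition on v4.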


[R] Real Basic Question

2019-09-26 Thread Phillip Heinrich
Just when I think I’m starting to get the hang of R I run into something that 
sends me back to Go without collecting $200.

The working directory seems to be correct when I load an .rda file, but the 
object is not there afterward, and it is not in the Global Environment in the 
upper right hand window in RStudio.

> getwd()
[1] "C:/Users/Owner/Documents/Baseball/RetroSheetDocumentation"
> load("~/Baseball/RetroSheetDocumentation/ari18.test2.rda")
> ari18.test2
Error: object 'ari18.test2' not found
> ls()
 [1] "ari18.test3"       "array1"            "array2"
     "BaseballArticles"  "BaseballArticles2"
 [6] "BaseballArticles3" "BBCorpus"          "BBtdm"
     "firstfunction"     "folder"
[11] "h"                 "matrix"            "matrix2"           "matrix3"
     "n"
[16] "seq"               "testvector"        "u"                 "vec"
     "x"
[21] "y"                 "yourname"

Somehow ari18.test3 loaded but ari18.test2 will not.

What am I missing here?
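
One possible explanation, stated as an assumption about how the file was saved: load() restores objects under the names they had when save() was called, not under the file name. If the object was called ari18.test3 when ari18.test2.rda was written, loading that file recreates ari18.test3. The sketch below reproduces that situation with a temporary file and a made-up data frame; rda_path and loaded_names are illustrative names.

```r
# A made-up object, saved to a file whose name differs from the object name.
rda_path <- file.path(tempdir(), "ari18.test2.rda")
ari18.test3 <- data.frame(x = 1:3)
save(ari18.test3, file = rda_path)
rm(ari18.test3)

# load() returns (invisibly) the names of the objects it restored.
loaded_names <- load(rda_path)
loaded_names
# [1] "ari18.test3"
```

Capturing the return value of load(), or calling load(..., verbose = TRUE), shows which names a given .rda file actually contains.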

Thanks.


[R] Creating a Date Field

2019-09-24 Thread Phillip Heinrich
The date is embedded in the GameID character field, so I created a date 
vector with the following code:

ari18.test3$date <- substring(ari18.test3$GameID,4,11)

And then created a new data frame with just the GameID and date vectors.  The 
date field is a character as shown by the str() command.

str(test)
'data.frame':   3 obs. of  2 variables:
 $ GameID: Factor w/ 3 levels "ARI201803290",..: 1 2 3
 $ date  : chr  "20180329" "20180330" "20180331"

          GameID     date
1   ARI201803290 20180329
81  ARI201803300 20180330
165 ARI201803310 20180331

My notes from about a week ago say that the following code will turn “date” 
into a date field:

test$date <- as.Date(test$date,format="%Y %M %D")

date becomes a date field but the data disappears into NA:

str(test)
'data.frame':   3 obs. of  2 variables:
 $ GameID: Factor w/ 3 levels "ARI201803290",..: 1 2 3
 $ date  : Date, format: NA NA NA

What am I missing here?
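
For comparison, a small sketch of a format string that matches these strings. In the conversion codes used by as.Date(), %m is the month and %d the day of the month, while uppercase %M means minutes; the format also has to match the text exactly, so it should contain no spaces when the data has none. The vector name date_chr is made up for the example.

```r
# The three character dates shown above.
date_chr <- c("20180329", "20180330", "20180331")

# Four-digit year, two-digit month, two-digit day, with no separators.
parsed <- as.Date(date_chr, format = "%Y%m%d")
parsed
# [1] "2018-03-29" "2018-03-30" "2018-03-31"
```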


[R] & statement within an ifelse Loop

2019-09-21 Thread Phillip Heinrich
Still putzing around trying to increment a count vector when the date changes.  

          Date count
1   2018-03-29     1
2   2018-03-29     1
3   2018-03-29     1
81  2018-03-30     1
82  2018-03-30     1
83  2018-03-30     1
165 2018-03-31     1
166 2018-03-31     1
167 2018-03-31     1

I can get count to change when the date changes  - lines 81 and 165 - by 
comparing the date to the date on the previous line (lag(Date,1)) but then the 
count returns to 1 on line 82 and line 166.

> test2 <- transform(test2,
+   count = ifelse(Date == lag(Date,1),count,count+1))
> test2
          Date count
1   2018-03-29    NA
2   2018-03-29     1
3   2018-03-29     1
81  2018-03-30     2
82  2018-03-30     1
83  2018-03-30     1
165 2018-03-31     2
166 2018-03-31     1
167 2018-03-31     1

test2 <- transform(test2,
+   count = ifelse(Date == lag(Date,1),(lag(count,1)),(lag(count,1)+1)))



With the code above I get the same results.  It seems to me that line 82 should 
have count = 2 since the dates on line 81 and 82 are the same so the count from 
line 82 should be the same as 81 -  (lag(count,1)).  Similarly, if line 83 were 
count = 2 then line 165 should be equal to 3.

What am I missing here?  Is there a way to add an & clause to either the if or 
the else clause such as:

((-2:2) >= 0) & ((-2:2) <= 0)

I’ve tried this several different ways, such as:

(lag(count,1)) &(count = count+1)

with no success.

Thanks,
Philip
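
A note on why the count falls back to 1: ifelse() works on whole columns in one pass, so every row is computed from the original count column (all 1s), and row 82 never sees the updated value on row 81. A running counter needs something cumulative instead. The sketch below is one base R option using rle(), which finds runs of identical consecutive values; the nine dates mirror the snippet above and the variable name runs is illustrative.

```r
test2 <- data.frame(
  Date = as.Date(rep(c("2018-03-29", "2018-03-30", "2018-03-31"), each = 3))
)

# rle() needs a plain vector, so the dates are converted to numbers first.
runs <- rle(as.numeric(test2$Date))

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run.
test2$count <- rep(seq_along(runs$lengths), runs$lengths)
test2$count
# [1] 1 1 1 2 2 2 3 3 3
```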



[R] Loop With Dates

2019-09-20 Thread Phillip Heinrich
With the data snippet below I’m trying to increment the “count” vector by one 
each time the date changes.  

          Date count
1   2018-03-29     1
2   2018-03-29     1
3   2018-03-29     1
81  2018-03-30     1
82  2018-03-30     1
83  2018-03-30     1
165 2018-03-31     1
166 2018-03-31     1
167 2018-03-31     1


I can get count to change when the date changes with the following code:

> test2 <- transform(test2,
+   count = ifelse(Date == lag(Date,1),count,count+1))
> test2
          Date count
1   2018-03-29    NA
2   2018-03-29     1
3   2018-03-29     1
81  2018-03-30     2
82  2018-03-30     1
83  2018-03-30     1
165 2018-03-31     2
166 2018-03-31     1
167 2018-03-31     1


...but I want all three March 30 rows to have a count of 2 and the March 31 
rows to be equal to 3.  Any suggestions?

Thanks.
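
If each date appears in one contiguous block, as in the snippet above, one short base R possibility is to number the distinct dates in order of first appearance with match() and unique(). This is a sketch on the nine rows shown, not on the full data set.

```r
test2 <- data.frame(
  Date = as.Date(rep(c("2018-03-29", "2018-03-30", "2018-03-31"), each = 3))
)

# unique() keeps the dates in order of first appearance;
# match() returns the position of each row's date in that list.
test2$count <- match(test2$Date, unique(test2$Date))
test2$count
# [1] 1 1 1 2 2 2 3 3 3
```

Unlike a counter that increases at every change, this gives the same number to a date even if it reappears later in the data.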


[R] If Loop With Lagged Variable

2019-09-19 Thread Phillip Heinrich
Attached is every at bat for the Arizona Diamondbacks’ first three games of 
2018 – BBdata1.rda.  I added the Date and DHCode variables by parsing the first 
variable labeled GameID.

BBdata2 is a reduced dataset with five variables as shown in the str() command.

'data.frame':   234 obs. of  5 variables:
 $ GameID : Factor w/ 3 levels "ARI201803290",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Date   : Date, format: "2018-03-29" "2018-03-29" "2018-03-29" "2018-03-29" ...
 $ DHCode : Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
 $ GameNum: num  1 1 1 1 1 1 1 1 1 1 ...
 $ Date2  : Date, format: NA "2018-03-29" "2018-03-29" "2018-03-29" ...

I’m trying to increment the GameNum (game number) to game 2 when the date 
changes from 2018-03-29 to 2018-03-30 in row 81 and to game 3 in row 165.

According to my R for Dummies book the following code should work but it 
doesn’t.  I keep getting the following error.  Any suggestions?

if(ari18.test3$Date > lag(ari18.test3$Date)) {ari18.test3$GameNum <- 
ari18.test3$GameNum + 1}
Warning message:
In if (ari18.test3$Date > lag(ari18.test3$Date)) { :
  the condition has length > 1 and only the first element will be used


Thanks.
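
The warning arises because if() expects a single TRUE or FALSE, while comparing two date columns produces one value per row. A sketch of a vectorized alternative in base R follows: compare each date with the one before it, then take a running total of the changes. The five dates and the names dates, changed and game_num are made up for the example.

```r
dates <- as.Date(c("2018-03-29", "2018-03-29",
                   "2018-03-30", "2018-03-30",
                   "2018-03-31"))

# TRUE wherever a row's date differs from the previous row's date.
# The first row has no predecessor, so it is set to FALSE.
changed <- c(FALSE, dates[-1] != dates[-length(dates)])

# Start at game 1 and add one at every change of date.
game_num <- 1 + cumsum(changed)
game_num
# [1] 1 1 2 2 3
```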