Re: [R] Inquiry About R Packages for Specific Research Areas
Hello,

There is a CRAN Task View, Epidemiology, that should be or have what you are looking for. [1]

[1] https://CRAN.R-project.org/view=Epidemiology

Hope this helps,

Rui Barradas

On 19/09/2024 at 06:29, Aleena Shaji wrote:

Dear R Support Team,

I hope this email finds you well. I am writing to inquire about the specific R packages that would best suit our academic research project, which involves analyses in various fields. We are particularly interested in the following areas:

Epidemiology Analysis: We are aware that packages like epiR, survival, and epitools exist for epidemiological analysis. Could you please confirm which of these (or others) would be most suitable for our needs?

Dietary Intake/Analysis: We are considering packages like foodfreq and Dietary for dietary intake analysis. Are these the best options, or do you recommend other packages for this purpose?

Pedigree Analysis: We are exploring the kinship2 and pedigree packages for pedigree data analysis. Is there a package you would suggest for more comprehensive analysis?

Migration-Related Study: We are interested in migration-related studies and have identified the migrant and spatstat packages. Would these be the most appropriate, or are there others we should consider?

We would appreciate your guidance in selecting the best packages that align with our research interests. Additionally, are there any resources or documentation that you recommend for getting started with these packages?

Thank you for your support, and we look forward to your response.

Best regards,
Aleena

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
This e-mail has been checked for viruses by AVG antivirus software.
Re: [R] (no subject)
On 16/09/2024 at 15:23, Francesca wrote:

Sorry for posting unreadable code. On my screen the dataset looked correct. I recreated my dataset, following your example:

# bind the four vectors as the columns of the data frame
test <- data.frame(cbind(
  c(8, 8, 5, 5, NA, NA, 1, 15, 20, 5, NA, 17, 2, 5, 5, 2, 5, NA, 5, 10, 10, 5, 12, NA),
  c(18, 5, 5, 5, NA, 9, 2, 2, 10, 7, 5, 19, NA, 10, NA, 4, NA, 8, NA, 5, 10, 3, 17, NA),
  c(4, 3, 3, 2, 2, 4, 3, 3, 2, 4, 4, 3, 4, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4),
  c(3, 8, 1, 2, 4, 2, 7, 6, 3, 5, 1, 3, 8, 4, 7, 5, 8, 5, 1, 2, 4, 7, 6, 6)))
colnames(test) <- c("cp1", "cp2", "role", "groupid")

What I have done so far is the following, which works:

test %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

But the problem is with NA: every time the mean encounters an NA, it creates NA for all group members. I need the software to calculate the mean ignoring NA. So when the group is made of three people, the mean of the three; if the group has two values and an NA, the mean of the two. My code works: it creates a mean at each position for three subjects, replacing each individual value with the group mean. But when an NA appears, the whole group gets NA. Perhaps there is a different way to obtain the same result.

On Mon, 16 Sept 2024 at 11:35, Rui Barradas wrote:

On 16/09/2024 at 08:28, Francesca wrote:

Dear Contributors,

I hope someone has found a similar issue. I have this data set,

   cp1 cp2 role groupid
1   10  13    4       5
2    5  10    3       1
3    7   7    4       6
4   10   4    2       7
5    5   8    3       2
6    8   7    4       4
7    8   8    4       7
8   10  15    3       3
9   15  10    2       2
10   5   5    2       4
11  20  20    2       5
12   9  11    3       6
13  10  13    4       3
14  12   6    4       2
15   7   4    4       1
16  10   0    3       7
17  20  15    3       8
18  10   7    3       4
19   8  13    3       5
20  10   9    2       6

I need to average by group, using the values of column groupid, and create a twin dataset in which each individual value is replaced by the mean of its group. So for example, for groupid 3, I calculate the mean (12+18)/2 and then, in the new data frame, I put that mean in the same positions in place of 12 and 18.

I found this solution, where db10_means is the output dataset and db10 is my initial data.

db10_means <- db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except for NA values: it assigns NA to all group members, while in some cases the group is made of some NA and some values. So, when I have a group of two values and one NA, I would like the mean to replace the value for those with a value, and NA to remain for those with NA. Here the mean function does not have the na.rm=T option set, and it appears that this solution cannot be implemented in this case. I am not even sure that this would be enough to solve my problem.

Thanks for any help provided.

Hello,

Your data is a mess, please don't post html, this is a plain text only list. Anyway, I managed to create a data frame by copying the data to a file named "rhelp.txt" and then running

db10 <- scan(file = "rhelp.txt", what = character())
header <- db10[1:4]
db10 <- db10[-(1:4)] |> as.numeric()
db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
  as.data.frame() |>
  setNames(header)
str(db10)
#> 'data.frame': 25 obs. of 4 variables:
#>  $ cp1    : num 1 5 3 7 10 5 2 4 8 10 ...
#>  $ cp2    : num 10 2 1 4 4 5 6 4 4 15 ...
#>  $ role   : num 13 5 3 6 2 8 8 7 7 3 ...
#>  $ groupid: num 4 10 7 4 7 3 7 8 8 3 ...

And here is the data in dput format.

db10 <- structure(list(
  cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2, 2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
  cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10, 4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
  role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5, 11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
  groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5, 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
  class = "data.frame", row.names = c(NA, -25L))

As for the problem, I am not sure if you want summarise instead of mutate but here is a summarise solution.

library(dplyr)

db10 %>%
  group_by(groupid) %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))

# same result, summarise's new argument .by avoids the need to group_by
db10 %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)), .by = groupid)

Can you post the expected output too?

Hope this helps,

Rui Barradas

Hello,

Something like this?

test <- structure(list( cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2
Re: [R] (no subject)
On 16/09/2024 at 08:28, Francesca wrote:

Dear Contributors,

I hope someone has found a similar issue. I have this data set,

   cp1 cp2 role groupid
1   10  13    4       5
2    5  10    3       1
3    7   7    4       6
4   10   4    2       7
5    5   8    3       2
6    8   7    4       4
7    8   8    4       7
8   10  15    3       3
9   15  10    2       2
10   5   5    2       4
11  20  20    2       5
12   9  11    3       6
13  10  13    4       3
14  12   6    4       2
15   7   4    4       1
16  10   0    3       7
17  20  15    3       8
18  10   7    3       4
19   8  13    3       5
20  10   9    2       6

I need to average by group, using the values of column groupid, and create a twin dataset in which each individual value is replaced by the mean of its group. So for example, for groupid 3, I calculate the mean (12+18)/2 and then, in the new data frame, I put that mean in the same positions in place of 12 and 18.

I found this solution, where db10_means is the output dataset and db10 is my initial data.

db10_means <- db10 %>%
  group_by(groupid) %>%
  mutate(across(starts_with("cp"), list(mean = mean)))

It works perfectly, except for NA values: it assigns NA to all group members, while in some cases the group is made of some NA and some values. So, when I have a group of two values and one NA, I would like the mean to replace the value for those with a value, and NA to remain for those with NA. Here the mean function does not have the na.rm=T option set, and it appears that this solution cannot be implemented in this case. I am not even sure that this would be enough to solve my problem.

Thanks for any help provided.

Hello,

Your data is a mess, please don't post html, this is a plain text only list. Anyway, I managed to create a data frame by copying the data to a file named "rhelp.txt" and then running

db10 <- scan(file = "rhelp.txt", what = character())
header <- db10[1:4]
db10 <- db10[-(1:4)] |> as.numeric()
db10 <- matrix(db10, ncol = 4L, byrow = TRUE) |>
  as.data.frame() |>
  setNames(header)
str(db10)
#> 'data.frame': 25 obs. of 4 variables:
#>  $ cp1    : num 1 5 3 7 10 5 2 4 8 10 ...
#>  $ cp2    : num 10 2 1 4 4 5 6 4 4 15 ...
#>  $ role   : num 13 5 3 6 2 8 8 7 7 3 ...
#>  $ groupid: num 4 10 7 4 7 3 7 8 8 3 ...

And here is the data in dput format.

db10 <- structure(list(
  cp1 = c(1, 5, 3, 7, 10, 5, 2, 4, 8, 10, 9, 2, 2, 20, 9, 13, 3, 4, 4, 10, 17, 8, 3, 13, 10),
  cp2 = c(10, 2, 1, 4, 4, 5, 6, 4, 4, 15, 15, 10, 4, 2, 11, 10, 14, 2, 4, 0, 20, 18, 4, 3, 9),
  role = c(13, 5, 3, 6, 2, 8, 8, 7, 7, 3, 10, 5, 11, 5, 3, 13, 12, 15, 1, 3, 15, 10, 19, 5, 2),
  groupid = c(4, 10, 7, 4, 7, 3, 7, 8, 8, 3, 2, 5, 20, 12, 6, 4, 6, 7, 16, 7, 3, 7, 8, 20, 6)),
  class = "data.frame", row.names = c(NA, -25L))

As for the problem, I am not sure if you want summarise instead of mutate but here is a summarise solution.

library(dplyr)

db10 %>%
  group_by(groupid) %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)))

# same result, summarise's new argument .by avoids the need to group_by
db10 %>%
  summarise(across(starts_with("cp"), ~ mean(.x, na.rm = TRUE)), .by = groupid)

Can you post the expected output too?

Hope this helps,

Rui Barradas
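The behaviour asked for in this thread (group means that ignore NA, written back into the positions of the original values, with NA kept wherever the original value was NA) can be sketched in base R with ave(). This is only a minimal sketch on made-up toy data, not the poster's data, and the helper name group_mean_keep_na is hypothetical.

```r
# Toy data (an assumption for illustration): two groups of three members.
toy <- data.frame(
  cp1     = c(8, NA, 5, 10, 2, NA),
  cp2     = c(18, 5, NA, NA, 4, 6),
  groupid = c(1, 1, 1, 2, 2, 2)
)

# Hypothetical helper: group mean ignoring NA, NA kept where the input is NA.
group_mean_keep_na <- function(x, g) {
  m <- ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))
  m[is.na(x)] <- NA   # members without a value stay NA
  m
}

# Apply to every column whose name starts with "cp".
cp_cols <- grep("^cp", names(toy), value = TRUE)
toy_means <- toy
toy_means[cp_cols] <- lapply(toy[cp_cols], group_mean_keep_na, g = toy$groupid)
toy_means
```

With dplyr, the same idea would presumably be a mutate() over across() with mean(.x, na.rm = TRUE), followed by resetting the NA positions; the base version is shown because it needs no packages.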
Re: [R] "And" condition spanning over multiple columns in data frame
On 12/09/2024 at 08:42, Francesca wrote:

Dear contributors,

I need to create a set of columns, based on conditions on a data frame, as follows. I have managed to do the trick for one column, but I cannot find any good example where the condition is extended to the whole data frame.

I have this data frame, called c10Dt:

  id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12
1  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
2  4   8  18  15  10  12  11   9  18   8   16   15   NA
3  3   8   5   5   4  NA   5  NA   6  NA   10   10   10
4  3   5   5   4   4   3   2   1   3   2    1    1    2
5  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
6  2   5   5  10  10   9  10  10  10  NA   10    9   10

Columns are id, cp1, cp2, and so on. What I need to do is the following, shown here for just one column:

c10Dt <- mutate(c10Dt, exit1 = ifelse(is.na(cp1) & id != 1, 1, 0))

So, I create a new variable, called exit1, in which the program selects cp1, checks if it is NA, and if it is NA but the value of the column "id" is not 1, it gives back a 1, otherwise 0. So, what I want is that it selects all the cases in which the id=2,3, or 4 is not NA in the corresponding values of the matrix. I managed to do it manually column by column, but I feel there should be something smarter here.

The problem is that I need to replicate this over all the columns from cp2 to cp12, while keeping the id column fixed. I have tried with

c10Dt %>% mutate(x=across(starts_with("cp"), ~ifelse(. == NA)) & id!=1,1,0 )

but the problem with across is that it will apply the condition only to the cp_ columns. How do I tell R to use the column id with all the other columns?

Thanks for any help provided.

Francesca

--

Hello,

Something like this?

1. If an ifelse instruction is meant to create a binary result, coerce the logical condition to integer instead. You can make it clearer by substituting as.integer for the plus sign below;
2. the .names argument is used to create new columns while keeping the original ones.

df1 <- read.table(text = "
  id cp1 cp2 cp3 cp4 cp5 cp6 cp7 cp8 cp9 cp10 cp11 cp12
1  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
2  4   8  18  15  10  12  11   9  18   8   16   15   NA
3  3   8   5   5   4  NA   5  NA   6  NA   10   10   10
4  3   5   5   4   4   3   2   1   3   2    1    1    2
5  1  NA  NA  NA  NA  NA  NA  NA  NA  NA   NA   NA   NA
6  2   5   5  10  10   9  10  10  10  NA   10    9   10
", header = TRUE)
df1

library(dplyr)

df1 %>%
  mutate(across(starts_with("cp"), ~ +(is.na(.) & id != 1), .names = "{col}_new"))

Hope this helps,

Rui Barradas
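The same "NA in a cp column AND id different from 1" flag can also be sketched without dplyr, relying on the fact that a vector with one value per row is recycled down each column of a logical matrix. The three-row data frame below is made up for illustration, and the "_new" suffix simply mirrors the .names pattern used in the reply.

```r
# Toy data (an assumption for illustration): one id column and two cp columns.
df1 <- data.frame(
  id  = c(1, 4, 3),
  cp1 = c(NA, 8, NA),
  cp2 = c(NA, 18, 5)
)

cp_cols <- grep("^cp", names(df1), value = TRUE)

# is.na() on a data frame returns a logical matrix; df1$id != 1 has one value
# per row and is recycled column by column, so each row's id condition is
# combined with every cp column of that row.
flags <- +(is.na(df1[cp_cols]) & df1$id != 1)
colnames(flags) <- paste0(cp_cols, "_new")

out <- cbind(df1, as.data.frame(flags))
out
```

Row 1 has id equal to 1, so its flags are 0 even though both values are NA; row 3 gets a 1 only for cp1, the column where it is NA.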
Re: [R] Prediction from Arima model
On 31/08/2024 at 18:54, Christofer Bogaso wrote:

Hi,

I have run the following code to obtain a one-step-ahead confidence interval from an arima model:

library(forecast)
set.seed(100)
forecast(Arima(rnorm(100), order = c(1,0,1), xreg = rt(100, 1)), h = 1, xreg = 10)

However, this appears to provide the prediction interval, whereas I wanted to get the confidence interval for the new value. Is there any way to get the confidence interval for the new value?

I also wanted to get the estimate of the SE for the new value, which is used to obtain the confidence interval of the new value. Is there any method available to obtain that?

Hello,

To get the se use ?predict.Arima instead.

library(forecast)
set.seed(100)
model <- Arima(rnorm(100), order = c(1,0,1), xreg = rt(100, 1))

# in predict.Arima, se.fit defaults to TRUE
pred <- predict(model, n.ahead = 1, newxreg = 10)
pred$se
c(pred$se)

# with more points ahead
predict(model, n.ahead = 2, newxreg = c(10, 12))

Hope this helps,

Rui Barradas
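As a follow-up sketch, the standard error returned by predict() can be turned into a normal-based interval around the one-step-ahead forecast. To stay free of package dependencies this uses stats::arima() rather than forecast::Arima(); the simulated series, the regressor x, the AR(1) order and the new regressor value 1 are all assumptions made for illustration. Note that an interval built this way is the usual forecast (prediction) interval; the thread does not settle whether that is what the original poster means by "confidence interval".

```r
set.seed(100)
n <- 100
x <- rnorm(n)                                     # hypothetical regressor
y <- 0.5 * x + arima.sim(list(ar = 0.5), n = n)   # regression on x with AR(1) errors

fit  <- arima(y, order = c(1, 0, 0), xreg = x)
pred <- predict(fit, n.ahead = 1, newxreg = 1)    # se.fit = TRUE by default

est <- as.numeric(pred$pred)   # point forecast
se  <- as.numeric(pred$se)     # its standard error
z   <- qnorm(0.975)            # two-sided 95% normal quantile

interval <- c(lower = est - z * se, estimate = est, upper = est + z * se)
interval
```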
Re: [R] aggregating data with quality control
m = na.rm)

status_with_D <- sample(c('C', 'D'), 45, TRUE, c(.9, .1))
mydf$status <- c(rep("C", 50), "S", status_with_D)

subset_condition <- if(any(mydf$status == "D")) mydf$status == "D" else TRUE

aggregate(hs ~ format(data_POSIX, "%Y-%m-%d") + status, mydf, my.mean, subset = subset_condition)
#>   format(data_POSIX, "%Y-%m-%d") status   hs
#> 1                     2024-01-02      D 51.2

# the formats in the OP but extracted from the date/time and used in the formula that follows.
year <- format(mydf$data_POSIX, "%Y")
month <- format(mydf$data_POSIX, "%m")
day <- format(mydf$data_POSIX, "%d")

aggregate(hs ~ year + month + day, mydf, my.mean)
#>   year month day       hs
#> 1 2024    01  01 52.37500
#> 2 2024    01  02 45.64583

aggregate(hs ~ year + month + day + status, mydf, my.mean, subset = subset_condition)
#>   year month day status   hs
#> 1 2024    01  02      D 51.2

Hope this helps,

Rui Barradas
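Because the beginning of this message is missing, here is a self-contained sketch of the same aggregate() pattern: daily means of a measurement, and the means restricted to rows carrying a given status flag. The data frame below is invented (hourly values over two days); the names mydf, data_POSIX, hs, status and my.mean follow the thread, but their definitions here are assumptions.

```r
# Invented data: 48 hourly readings over two days.
mydf <- data.frame(
  data_POSIX = seq(as.POSIXct("2024-01-01 00:00:00", tz = "UTC"),
                   by = "1 hour", length.out = 48),
  hs     = c(rep(50, 24), rep(40, 24)),
  status = "C"
)
mydf$hs[c(3, 30)] <- NA          # two missing readings
mydf$status[c(25, 26)] <- "D"    # two readings flagged "D" on the second day

# Mean that ignores NA by default (assumed definition).
my.mean <- function(x, na.rm = TRUE) mean(x, na.rm = na.rm)

# Daily means.
mydf$day <- format(mydf$data_POSIX, "%Y-%m-%d")
daily <- aggregate(hs ~ day, data = mydf, FUN = my.mean)

# Daily means for the flagged rows only, filtering before aggregating.
flagged <- aggregate(hs ~ day + status, data = mydf[mydf$status == "D", ],
                     FUN = my.mean)
daily
flagged
```

Filtering the data frame first is used here instead of the subset argument only to keep the sketch simple; both should give the same rows.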
Re: [R] Fill NA values in columns with values of another column
On 28/08/2024 at 16:24, Ebert,Timothy Aaron wrote:

Why not use na.omit() and then go from there? Unless one handles NA differently in different groups there is no point in processing the data by groups to remove NA, even if later analysis steps do require group information.

Tim

-----Original Message-----
From: R-help On Behalf Of Rui Barradas
Sent: Wednesday, August 28, 2024 4:19 AM
To: Francesca PANCOTTO ; r-help@r-project.org
Subject: Re: [R] Fill NA values in columns with values of another column

[External Email]

On 27/08/2024 at 11:23, Francesca PANCOTTO via R-help wrote:

Dear Contributors,

I have a problem with a database composed of many individuals for many periods, for which I need to perform a manipulation of data as follows. Here I report the procedure I need to do for the first 32 observations of the first period.

cbind(VB1d[,1],s1id[,1])
      [,1] [,2]
 [1,]    6    8
 [2,]    9    5
 [3,]   NA    1
 [4,]    5    6
 [5,]   NA    7
 [6,]   NA    2
 [7,]    4    4
 [8,]    2    7
 [9,]    2    7
[10,]   NA    3
[11,]   NA    2
[12,]   NA    4
[13,]    5    6
[14,]    9    5
[15,]   NA    5
[16,]   NA    6
[17,]   10    3
[18,]    7    2
[19,]    2    1
[20,]   NA    7
[21,]    7    2
[22,]   NA    8
[23,]   NA    4
[24,]   NA    5
[25,]   NA    6
[26,]    2    1
[27,]    4    4
[28,]    6    8
[29,]   10    3
[30,]   NA    3
[31,]   NA    8
[32,]   NA    1

In column s1id, I have numbers from 1 to 8, which are the ids of 8 groups, randomly mixed in the larger group of 32. For each group, I want the value that is reported for only two group members to be assigned to all four group members.

For example, value 8 in the first row, second column, is group 8. The value for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to 8, I have 6. But in row 22, the value 8 of the second variable reports a value NA. It is the same in each group: only two values have the correct number, the other two are NA. I need each group, identified by the values of the variable s1id, to correctly report the number of variable VB1d that is present for just two group members.

I hope my explanation is acceptable. The task appears complex to me right now, especially because I will need to multiply this procedure for x12x14 similar databases. Has anyone ever encountered a similar problem?

Thanks in advance for any help provided.

--
Francesca Pancotto
Associate Professor Political Economy
University of Modena, Largo Santa Eufemia, 19, Modena
Office Phone: +39 0522 523264
Web: https://sites.google.com/view/francescapancotto/home
--

[[alternative HTML version deleted]]

Hello,

Here is a solution. Split the 1st column by the 2nd, keep only the not-NA values and unlist, to have a named vector. Then put the names and the values together with cbind.

mat <- structure(
  c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L, 9L, NA, NA,
    10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L, 10L, NA, NA, NA,
    8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L, 4L, 6L, 5L, 5L, 6L,
    3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L, 8L, 3L, 3L, 8L, 1L),
  dim = c(32L, 2L))

res <- split(mat[, 1L], mat[, 2L]) |>
  lapply(\(x) x[!is.na(x)]) |>
  unlist()
nms <
Re: [R] Fill NA values in columns with values of another column
On 27/08/2024 at 11:23, Francesca PANCOTTO via R-help wrote:

Dear Contributors,

I have a problem with a database composed of many individuals for many periods, for which I need to perform a manipulation of data as follows. Here I report the procedure I need to do for the first 32 observations of the first period.

cbind(VB1d[,1],s1id[,1])
      [,1] [,2]
 [1,]    6    8
 [2,]    9    5
 [3,]   NA    1
 [4,]    5    6
 [5,]   NA    7
 [6,]   NA    2
 [7,]    4    4
 [8,]    2    7
 [9,]    2    7
[10,]   NA    3
[11,]   NA    2
[12,]   NA    4
[13,]    5    6
[14,]    9    5
[15,]   NA    5
[16,]   NA    6
[17,]   10    3
[18,]    7    2
[19,]    2    1
[20,]   NA    7
[21,]    7    2
[22,]   NA    8
[23,]   NA    4
[24,]   NA    5
[25,]   NA    6
[26,]    2    1
[27,]    4    4
[28,]    6    8
[29,]   10    3
[30,]   NA    3
[31,]   NA    8
[32,]   NA    1

In column s1id, I have numbers from 1 to 8, which are the ids of 8 groups, randomly mixed in the larger group of 32. For each group, I want the value that is reported for only two group members to be assigned to all four group members.

For example, value 8 in the first row, second column, is group 8. The value for group 8 of the variable VB1d is 6. At row 28, again for s1id equal to 8, I have 6. But in row 22, the value 8 of the second variable reports a value NA. It is the same in each group: only two values have the correct number, the other two are NA. I need each group, identified by the values of the variable s1id, to correctly report the number of variable VB1d that is present for just two group members.

I hope my explanation is acceptable. The task appears complex to me right now, especially because I will need to multiply this procedure for x12x14 similar databases. Has anyone ever encountered a similar problem?

Thanks in advance for any help provided.

--
Francesca Pancotto
Associate Professor Political Economy
University of Modena, Largo Santa Eufemia, 19, Modena
Office Phone: +39 0522 523264
Web: https://sites.google.com/view/francescapancotto/home
--

[[alternative HTML version deleted]]

Hello,

Here is a solution. Split the 1st column by the 2nd, keep only the not-NA values and unlist, to have a named vector. Then put the names and the values together with cbind.

mat <- structure(
  c(6L, 9L, NA, 5L, NA, NA, 4L, 2L, 2L, NA, NA, NA, 5L, 9L, NA, NA,
    10L, 7L, 2L, NA, 7L, NA, NA, NA, NA, 2L, 4L, 6L, 10L, NA, NA, NA,
    8L, 5L, 1L, 6L, 7L, 2L, 4L, 7L, 7L, 3L, 2L, 4L, 6L, 5L, 5L, 6L,
    3L, 2L, 1L, 7L, 2L, 8L, 4L, 5L, 6L, 1L, 4L, 8L, 3L, 3L, 8L, 1L),
  dim = c(32L, 2L))

res <- split(mat[, 1L], mat[, 2L]) |>
  lapply(\(x) x[!is.na(x)]) |>
  unlist()
nms <- names(res)
res <- cbind(
  VB1d = res,
  s1id = substr(nms, 1, nchar(nms) - 1L) |> as.integer()
)
res
#>    VB1d s1id
#> 11    2    1
#> 12    2    1
#> 21    7    2
#> 22    7    2
#> 31   10    3
#> 32   10    3
#> 41    4    4
#> 42    4    4
#> 51    9    5
#> 52    9    5
#> 61    5    6
#> 62    5    6
#> 71    2    7
#> 72    2    7
#> 81    6    8
#> 82    6    8

Hope this helps,

Rui Barradas
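If the goal is to keep all rows and fill each NA with the value its group already reports (rather than dropping the NA rows), one possible base-R sketch uses ave(). The vectors below are made up, and the helper name fill_within_group is hypothetical. The helper only fills when the group has exactly one distinct non-NA value, which is an assumption about the data described in the question.

```r
# Hypothetical helper: copy a group's single known value to its NA members.
fill_within_group <- function(value, group) {
  ave(value, group, FUN = function(v) {
    known <- unique(v[!is.na(v)])
    # Fill only when the group agrees on one value; otherwise leave as is.
    if (length(known) == 1L) rep(known, length(v)) else v
  })
}

# Toy data (an assumption): groups 8 and 5 each report a value, group 1 none.
VB1d <- c(6, NA, 9, NA, 6, 9, NA, NA)
s1id <- c(8,  8, 5,  5, 8, 5,  1,  1)

filled <- fill_within_group(VB1d, s1id)
cbind(VB1d = filled, s1id = s1id)
```

A group with no reported value at all, like group 1 here, stays NA, and a group whose members disagree is left untouched so the conflict remains visible.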
Re: [R] Very strange behavior of 'rep'
On 15/08/2024 at 19:39, Izmirlian, Grant (NIH/NCI) [E] via R-help wrote:

This is very weird. I was running a swarm job on the cluster and it bombed only for n.per.grp=108, not for the other values. Even though n.per.grp*n.tt is 540, so that the length of the call to 'rep' should be 1080, I'm getting a vector of length 1078.

n.per.grp <- 108
n.tt <- 5
n.per.grp*n.tt
length(rep(0:1, each=n.per.grp*n.tt))
length(rep(0:1, each=108*5))

--please do not edit the information below--

R Version:
 platform = x86_64-pc-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status =
 major = 4
 minor = 4.1
 year = 2024
 month = 06
 day = 14
 svn rev = 86737
 language = R
 version.string = R version 4.4.1 (2024-06-14)
 nickname = Race for Your Life

Locale:
 LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C.UTF-8;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:lme4, package:Matrix, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:showtext, package:showtextdb, package:sysfonts, package:methods, Autoloads, package:base

[[alternative HTML version deleted]]

Hello,

I cannot reproduce this behavior.

n.per.grp <- 108
n.tt <- 5
n.per.grp*n.tt
#> [1] 540
length(rep(0:1, each = n.per.grp*n.tt))
#> [1] 1080
length(rep(0:1, each = 108*5))
#> [1] 1080

But my version of R and my OS are different. (I don't see how the error in the OP can be related to R version or OS.)

R.version
#>                _
#> platform       x86_64-w64-mingw32
#> arch           x86_64
#> os             mingw32
#> crt            ucrt
#> system         x86_64, mingw32
#> status
#> major          4
#> minor          4.1
#> year           2024
#> month          06
#> day            14
#> svn rev        86737
#> language       R
#> version.string R version 4.4.1 (2024-06-14 ucrt)
#> nickname       Race for Your Life

Hope this helps,

Rui Barradas
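One possibility that would produce exactly 1078, offered only as a guess that the thread does not confirm: if n.per.grp is computed in the cluster job (rather than typed as the literal 108) and ends up a hair below 108, the product prints as 540 but is truncated to 539 when rep() coerces each to an integer. The value 108 - 1e-9 below is a hypothetical stand-in for such a computed quantity.

```r
# Hypothetical: a computed value that prints as 108 but is slightly smaller.
n.per.grp <- 108 - 1e-9
n.tt <- 5

print(n.per.grp * n.tt)                 # displays 540 at default precision
print(n.per.grp * n.tt, digits = 15)    # more digits show it is below 540

# 'each' is coerced to integer by truncation, so 539 copies of each value.
len_truncated <- length(rep(0:1, each = n.per.grp * n.tt))

# Rounding before the call restores the intended length.
len_rounded <- length(rep(0:1, each = round(n.per.grp * n.tt)))

c(truncated = len_truncated, rounded = len_rounded)
```

If this is the cause, printing the inputs with more digits inside the job, or wrapping the count in round(), would be a cheap check.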
Re: [R] Printing
On 11/08/2024 at 15:36, Steven Yen wrote:

Thanks. Will try it. Have not tried it but I think the following may work:

out$results<-NULL
out$results$ei<-ap
out$results$vi<-vap

All I need is printing by returning out (unless I turn it off). And retrieve ap and vap as needed, as shown above. Guess I need to read more about invisible.

On 8/11/2024 10:09 PM, Rui Barradas wrote:

On 11/08/2024 at 09:51, Steven Yen wrote:

Hi

In the following code, I had to choose between printing (= TRUE) or delivering something to grab (ei, vi). Is there a way to get both, that is, to print and also have ei and vi to grab? Thanks.

Steven

...
out<-round(as.data.frame(cbind(ap,se,t,p)),digits)
out<-cbind(out,sig)
out<-out[!grepl(colnames(zx)[1],rownames(out)),]
if(printing){
cat("\nAPPs of bivariate ordered probit probabilities",
    "\nWritten by Steven T. Yen (Last update: 08.11.24)",
    "\ny1.level=", y1.level, " y2.level=", y2.level,
    "\njoint12 =", joint12,
    "\nmarg1 =", marg1,
    "\nmarg2 =", marg2,
    "\ncond12 =", cond12,
    "\ncond21 =", cond21,
    "\nCovariance matrix:",vb.method,
    "\nWeighted =", weighted,
    "\nAt means =", mean,
    "\nProb x 100 =", times100,
    "\ntesting =" , testing,
    "\nuse_bb_and_vbb = ",use_bb_and_vbb,
    "\nsample size =", length(y1),"\n")
if (!resampling) cat("\nSEs by delta method","\n")
if (resampling) cat("\nSEs K-R resampling with",ndraws,"draws\n")
return(out)
} else {
invisible(list("ei"=ap,"vi"=vap))

Hello,

Maybe change the end of the code to return a bigger list.

ll <- list(out = out, ei = ap, vi = vap)
return(ll)
} else {
invisible(list("ei"=ap,"vi"=vap))

Hope this helps,

Rui Barradas

Hello,

Use descriptive names, print the data frame in the function and return a list invisibly.

Also,

1) Why create a data.frame with rounded numbers? I never do this. Rounding numbers is a matter of displaying results and, in the case of df's, should be left to the print.data.frame method. Always return the numbers as they are. See comment below.

2) And never, ever code the creation of a df as

as.data.frame(cbind(.))

If the vectors are a mix of numeric and character they will all be coerced to the least common denominator: all vectors will become of class character.

# don't do this
df <- round(as.data.frame(cbind(ap, se, t, p)), digits)
df <- cbind(df, sig)

# do this instead
df <- data.frame(ap, se, t, p, sig)

df <- df[!grepl(colnames(zx)[1],rownames(df)), ]

out <- NULL
if(printing){
  # [...rest of code...]
  #
  out$data <- df
  cat("data:\n")
  print(out$data)
}
out$results <- list(ei = ap, vi = vap)
invisible(out)

Hope this helps,

Rui Barradas
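The pattern recommended in this thread (print inside the function, return everything invisibly, and leave rounding to the print step) can be shown with a small self-contained sketch. The function name summarise_estimates, its arguments and the toy numbers are all hypothetical and are not the original poster's function.

```r
# Hypothetical function: prints a table when asked, always returns the pieces.
summarise_estimates <- function(ap, se, printing = TRUE, digits = 3) {
  # Build the table from unrounded numbers; data.frame() keeps column types.
  tab <- data.frame(estimate = ap, se = se, t = ap / se)
  if (printing) {
    cat("Estimates:\n")
    print(tab, digits = digits)   # rounding happens only for display
  }
  # invisible() suppresses auto-printing but the value can still be assigned.
  invisible(list(table = tab, ei = ap, se = se))
}

# Prints the table and keeps the results for later use.
res <- summarise_estimates(ap = c(a = 0.5, b = -1.2), se = c(0.1, 0.3))
res$ei

# Quiet call: nothing is printed, the same list is returned.
quiet <- summarise_estimates(c(a = 0.5, b = -1.2), c(0.1, 0.3), printing = FALSE)
```

Because the return value is the same in both branches, callers do not have to know whether printing was switched on.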
Re: [R] Printing
On 11/08/2024 at 09:51, Steven Yen wrote:

Hi

In the following code, I had to choose between printing (= TRUE) or delivering something to grab (ei, vi). Is there a way to get both, that is, to print and also have ei and vi to grab? Thanks.

Steven

...
out<-round(as.data.frame(cbind(ap,se,t,p)),digits)
out<-cbind(out,sig)
out<-out[!grepl(colnames(zx)[1],rownames(out)),]
if(printing){
cat("\nAPPs of bivariate ordered probit probabilities",
    "\nWritten by Steven T. Yen (Last update: 08.11.24)",
    "\ny1.level=", y1.level, " y2.level=", y2.level,
    "\njoint12 =", joint12,
    "\nmarg1 =", marg1,
    "\nmarg2 =", marg2,
    "\ncond12 =", cond12,
    "\ncond21 =", cond21,
    "\nCovariance matrix:",vb.method,
    "\nWeighted =", weighted,
    "\nAt means =", mean,
    "\nProb x 100 =", times100,
    "\ntesting =" , testing,
    "\nuse_bb_and_vbb = ",use_bb_and_vbb,
    "\nsample size =", length(y1),"\n")
if (!resampling) cat("\nSEs by delta method","\n")
if (resampling) cat("\nSEs K-R resampling with",ndraws,"draws\n")
return(out)
} else {
invisible(list("ei"=ap,"vi"=vap))

Hello,

Maybe change the end of the code to return a bigger list.

ll <- list(out = out, ei = ap, vi = vap)
return(ll)
} else {
invisible(list("ei"=ap,"vi"=vap))

Hope this helps,

Rui Barradas
Re: [R] a fast way to do my job
Hello, .lm.fit is an order of magnitude faster than lm.fit but the Description section warns on its use, see the examples in help("lm.fit"). Hope this helps, Rui Barradas Às 21:08 de 10/08/2024, Yuan Chun Ding via R-help escreveu: You are right. I also just thought about that, no intercept is not applicable to my case. Ding From: Bert Gunter Sent: Saturday, August 10, 2024 1:06 PM To: Yuan Chun Ding Cc: Ben Bolker ; r-help@r-project.org Subject: Re: [R] a fast way to do my job Ah, messages crossed. A no-intercept model **assumes** the straight line fit must pass through the origin. Unless there is a strong justification for such an assumption, you should include an intercept. -- Bert On Sat, Aug 10, 2024 at 1: 02 PM Ah, messages crossed. A no-intercept model **assumes** the straight line fit must pass through the origin. Unless there is a strong justification for such an assumption, you should include an intercept. -- Bert On Sat, Aug 10, 2024 at 1:02 PM Bert Gunter mailto:bgunter.4...@gmail.com>> wrote: Is it because I failed to to add a column of ones for an intercept to the x matrix? TRhat would be my bad. -- Bert On Sat, Aug 10, 2024 at 12:59 PM Bert Gunter mailto:bgunter.4...@gmail.com>> wrote: Probably because you inadvertently ran different models. Without your code, I haven't a clue. On Sat, Aug 10, 2024, 12:29 Yuan Chun Ding mailto:ycd...@coh.org>> wrote: HI Bert and Ben, Yes, running lm.fit using the matrix format is much faster. I read a couple of online comments why it is faster. However, the residual values for three tested variables or genes from lm function and lm.fit function are different, with Pearson correlation of 0.55, 0.89, and 0.99. I have not found the reason. 
> Thanks,
> Ding
>
> From: Bert Gunter
> Sent: Friday, August 9, 2024 7:11 PM
> To: Ben Bolker
> Cc: Yuan Chun Ding; r-help@r-project.org
> Subject: Re: [R] a fast way to do my job
>
> Better idea, Ben! It would work as you might expect to produce the same results as the above:
>
> ## first make sure your regressor is a matrix:
> pur2 <- matrix(purity2, ncol = 1)
> ## convert the data frame variables into a matrix
> dat <- as.matrix(gem751be.rpkm[ , 74:35164])
> ## then
> result <- residuals(lm.fit(x = pur2, y = dat))
>
> Cheers,
> Bert
>
> On Fri, Aug 9, 2024 at 6:38 PM Ben Bolker wrote:
> You can also fit a linear model with a matrix-valued response variable, which should be even faster (not sure off the top of my head how to get the residuals and reshape them to the dimensions you want)
>
> On Fri, Aug 9, 2024 at 9:31 PM Bert Gunter wrote:
> See ?lm.fit. I must be missing something, because:
>
> results <- sapply(74:35164, \(i) residuals(lm.fit(purity2, gem751be.rpkm[, i])))
>
> would give you a 751 x 35091 matrix of the residuals from each of the regressions. I assume it will be considerably faster than all the overhead you are carrying in your current code, but of course you'll have to try it and see.
>
> ... Assuming that I have interpreted your request correctly. Ignore if not.
> Cheers,
> Bert
>
> On Fri, Aug 9, 2024 at 4:50 PM Yuan Chun Ding via R-help wrote:
> Dear R users,
> I am running the code below. gem751be.rpkm is a data frame of 751 samples (rows) by 35164 variables: 73 phenotypic variables in columns 1 to 73 and 35091 genomic variables (genes) in columns 74 to 35164. What I need to do is calculate the residuals for each gene using the simple linear regression model genelist[i] ~ purity2. The following code runs, but it takes a long time, even though I have an expensive ThinkStation Windows computer. Can you provide a faster way to do it?
> Thank you,
> Ding
>
> gem751be.rpkm <- merge(gem751be10, as.data.frame(t(rna849.fpkm2)), by.x="id2", by.y=0)
> row.names(gem751be.rp
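
The intercept point discussed in this thread can be checked on simulated data. The sketch below is an illustration only: purity and genes are made-up stand-ins for purity2 and the gene columns. It shows that lm.fit reproduces the lm residuals once a column of ones is added to the design matrix, and that it does not without it.

```r
set.seed(1)
n <- 50
purity <- runif(n)                              # simulated regressor
genes  <- matrix(rnorm(n * 3), n, 3,
                 dimnames = list(NULL, paste0("gene", 1:3)))
genes  <- genes + 2 + 3 * purity                # nonzero intercept and slope

# lm.fit() uses the design matrix as given: the column of ones IS the intercept
X <- cbind(Intercept = 1, purity = purity)
res_fit   <- residuals(lm.fit(x = X, y = genes))              # one call, all genes
res_lm    <- residuals(lm(genes ~ purity))                    # matrix response in lm()
res_noint <- residuals(lm.fit(x = cbind(purity), y = genes))  # no intercept

isTRUE(all.equal(unname(res_fit), unname(res_lm)))    # same model, same residuals
isTRUE(all.equal(unname(res_noint), unname(res_lm)))  # different model
```

A single lm.fit call with a matrix response fits all columns at once, which is where most of the speed gain over a loop of lm() calls comes from.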
Re: [R] If loop
On 09/08/2024 at 05:33, Steven Yen wrote:
> The following (using if else) did not help. It seemed like joint12 always kicked in.
>
> me1<-me0<-NULL
> if(joint12){
>   {me1<-cbind(me1,v1$p12); me0<-cbind(me0,v0$p12)}
> } else if(marg1) {
>   {me1<-cbind(me1,v1$p1); me0<-cbind(me0,v0$p1)}
> } else if(marg2) {
>   {me1<-cbind(me1,v1$p2); me0<-cbind(me0,v0$p2)}
> } else if(cond12){
>   {me1<-cbind(me1,v1$pc12); me0<-cbind(me0,v0$pc12)}
> } else {
>   {me1<-cbind(me1,v1$pc21); me0<-cbind(me0,v0$pc21)}
> }
> ...
> labels<-NULL
> if(joint12){
>   labels<-c(labels,lab.p12)
> } else if(marg1) {
>   labels<-c(labels,lab.p1)
> } else if(marg2) {
>   labels<-c(labels,lab.p2)
> } else if(cond12){
>   labels<-c(labels,lab.pc12)
> } else {
>   labels<-c(labels,lab.pc21)
> }
>
> On 8/9/2024 11:44 AM, Steven Yen wrote:
>> Can someone help me with the if loop below? In the subroutine, I initialize all of (joint12,marg1,marg2,cond12,cond21) as FALSE, and call with only one of them being TRUE:
>>
>> ,...,joint12=FALSE,marg1=FALSE,marg2=FALSE,cond12=FALSE,cond21=FALSE
>>
>> joint12 seems to always kick in, even though I call with, e.g., marg1 being TRUE and everything else being FALSE. My attempts with if... else if were not useful. Please help. Thanks.
>>
>> v1<-cprob(z1,x1,a,b,mu1,mu2,rho,j+1,k+1)
>> v0<-cprob(z0,x0,a,b,mu1,mu2,rho,j+1,k+1)
>> ...
>> me1<-me0<-NULL
>> if(joint12) {me1<-cbind(me1,v1$p12); me0<-cbind(me0,v0$p12)}
>> if(marg1) {me1<-cbind(me1,v1$p1); me0<-cbind(me0,v0$p1)}
>> if(marg2) {me1<-cbind(me1,v1$p2); me0<-cbind(me0,v0$p2)}
>> if(cond12) {me1<-cbind(me1,v1$pc12); me0<-cbind(me0,v0$pc12)}
>> if(cond21) {me1<-cbind(me1,v1$pc21); me0<-cbind(me0,v0$pc21)}
>> ...
>> labels<-NULL
>> if(joint12) labels<-c(labels,lab.p12)
>> if(marg1) labels<-c(labels,lab.p1)
>> if(marg2) labels<-c(labels,lab.p2)
>> if(cond12) labels<-c(labels,lab.pc12)
>> if(cond21) labels<-c(labels,lab.pc21)

Hello,

What you are describing is hardly possible, if at all. If you ever call that code with joint12 set to TRUE, do you reset it to FALSE afterwards?
Can you give a small working example with code and data showing this behavior?

Hope this helps,

Rui Barradas
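
One way to build the small working example asked for above is to isolate the flag handling and check which flags are actually TRUE inside the function. The sketch below is hypothetical: pick_labels and its label names are invented for illustration and are not the poster's code.

```r
# Hypothetical helper: report which of the five flags are TRUE inside the
# function, using the same argument names and defaults as in the question.
pick_labels <- function(joint12 = FALSE, marg1 = FALSE, marg2 = FALSE,
                        cond12 = FALSE, cond21 = FALSE) {
  flags <- c(p12 = joint12, p1 = marg1, p2 = marg2,
             pc12 = cond12, pc21 = cond21)
  stopifnot(is.logical(flags), !anyNA(flags))  # fail early on bad input
  names(flags)[flags]                          # labels of the active flags
}

one_flag  <- pick_labels(marg1 = TRUE)
two_flags <- pick_labels(joint12 = TRUE, cond21 = TRUE)
one_flag
two_flags
```

If a check like this shows only the expected flag inside the real function, the cause probably lies in how the function is called, for example a TRUE value passed positionally or left over from an earlier call.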
Re: [R] grep
;). To match a literal period you must escape it. The correct regex is '\\.r'.

x <- c("age", "sleep", "primary", "middle", "high", "somewhath", "veryh", "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl", "village", "married", "social", "agricultural", "communist", "minority", "religious")

colnms <- c("depression", "sleep", "female", "village", "agricultural", "married", "communist", "minority", "religious", "social", "no", "primary", "middle", "high", "veryh", "somewhath", "notveryh", "verym", "somewhatm", "notverym", "veryc", "somewhatc", "notveryc", "veryl", "somewhatl", "notveryl", "age", "village.r", "married.r", "social.r", "agricultural.r", "communist.r", "minority.r", "religious.r", "male.r", "education.r")

grep("\\.r\\b", colnms, value = TRUE)
#> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
#> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
#> [9] "education.r"

# the same as above
# \\> matches the empty string at the end of a word,
# \\b matches the empty string at either edge of a word
grep("\\.r\\>", colnms, value = TRUE)
#> [1] "village.r"      "married.r"      "social.r"       "agricultural.r"
#> [5] "communist.r"    "minority.r"     "religious.r"    "male.r"
#> [9] "education.r"

# 4 col names have a 'm' and end in '.r' therefore 4 matches
grep("m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r"   "communist.r" "minority.r"  "male.r"

# only the strings starting with 'm'
grep("\\bm.*\\.r\\b", colnms, value = TRUE)
#> [1] "married.r"  "minority.r" "male.r"

grep("\\<m.*\\.r\\>", colnms, value = TRUE)
#> [1] "married.r"  "minority.r" "male.r"

Hope this helps,

Rui Barradas
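
To see why the escape matters, the sketch below compares an unescaped dot with the escaped pattern and with two base R alternatives that avoid regular expressions altogether. The vector nms is a made-up subset chosen for illustration.

```r
nms <- c("village", "village.r", "married.r", "minor", "color")

# Unescaped: '.' matches ANY character, so "or" in "minor"/"color" also matches
loose <- grep(".r", nms, value = TRUE)

# Escaped dot, anchored at the end of the string
strict <- grep("\\.r$", nms, value = TRUE)

# Alternatives without regex metacharacters
by_suffix <- nms[endsWith(nms, ".r")]       # literal suffix test
by_fixed  <- nms[grepl(".r", nms, fixed = TRUE)]  # literal substring anywhere

loose
strict
```

endsWith() is the most direct choice when the only requirement is a literal suffix; fixed = TRUE matches the literal text anywhere in the string, not only at the end.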
Re: [R] R facets including two kinds of charts
Hello,

I hadn't understood the problem, sorry.
The problem is the bar plots: ggplot is plotting one in the "A" facet and, since there is nothing to plot there, the bars start at 0.

A hack is to plot facet "A" separately and then combine the plots, using one of the several ways to combine ggplot plots. Below is an example with cowplot::plot_grid.

library(ggplot2)
library(dplyr)
library(cowplot)

p1 <- df %>%
  filter(nm == "A") %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = val2)) +
  facet_wrap(~ nm, scales = "free_y") +
  theme(plot.margin = unit(c(0.2, 0, 0.1, 0), "cm"))

p2 <- df %>%
  filter(nm != "A") %>%
  ggplot(aes(x = date)) +
  geom_col(aes(y = val0), na.rm = TRUE, fill = "white") +
  geom_line(aes(y = val1)) +
  ylab("") +
  facet_wrap(~ nm, scales = "free_y")

plot_grid(p1, p2, rel_widths = c(1, 2))

Hope this helps,

Rui Barradas

On 01/08/2024 at 20:10, p...@philipsmith.ca wrote:
> Thanks for the suggestion, but this does not give me what I want. Each chart needs its own unique scale on the y-axis.
> Philip
>
> On 2024-08-01 15:08, Rui Barradas wrote:
>> On 01/08/2024 at 19:01, p...@philipsmith.ca wrote:
>>> I am asking for help with a ggplot2 program that has facets. There are actually 100 facets in my program, but in the example below I have limited the number to 3. There are two kinds of charts among the facets. One kind is a simple line plot with all of the y-values greater than zero. The facet for "A" in my example below is this kind. The other kind is a line plot combined with a bar chart, with some of the y-values being positive and others negative. The facets for "B" and "C" in my example are this kind.
>>>
>>> The facets for "B" and "C" look the way I want them to. However, the facet for "A" has a scale on the y-axis that starts at zero, whereas I would like the minimum value on this scale to be non-zero, chosen by ggplot2 to be closer to the minimum value of y for that particular facet.
>>>
>>> My example may not be the most efficient way to achieve this, but it works except for one aspect. Chart A, for which I do not wish to show a zero line, does indeed not show a zero line, but it nevertheless chooses a scale for the y-axis that has a minimum value of zero. How can I adjust the code so that it chooses a minimum value on the y-axis that is non-zero and closer to the minimum actual y-value (as would be the case for a simple line chart alone, without any facets)?
>>>
>>> library(ggplot2)
>>> library(dplyr)
>>> df <- data.frame(
>>>   date=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6),
>>>   nm=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"),
>>>   val0=c(NA,-5,4,NA,-3,3,NA,2,4,NA,3,3,NA,3,1,NA,-3,-4),
>>>   val1=c(NA,-3,6,NA,-1,4,NA,5,5,NA,7,2,NA,4,3,NA,-2,-2),
>>>   val2=c(50,NA,NA,53,NA,NA,62,NA,NA,56,NA,NA,54,NA,NA,61,NA,NA),
>>>   zline=c(NA,0,0,NA,0,0,NA,0,0,NA,0,0,NA,0,0,NA,0,0)
>>> )
>>> ggplot(df)+
>>>   geom_col(aes(x=date,y=val0),na.rm=TRUE,fill="white")+
>>>   geom_line(aes(x=date,y=val1))+
>>>   geom_line(aes(x=date,y=val2))+
>>>   geom_hline(aes(yintercept=zline),na.rm=TRUE)+
>>>   facet_wrap(~nm,scales="free_y")
>>>
>>> Thank you for your assistance.
>>> Philip
>>
>> Hello,
>>
>> Try removing scales="free_y" from facet_wrap(). With scales="free_y" each facet will have its own y limits, given by the data plotted in each of them. If you want global y limits, don't use it.
>>
>> Hope this helps,
>>
>> Rui Barradas
Re: [R] help on date objects...
On 28/07/2024 at 05:23, akshay kulkarni wrote:
> Dear members,
> Why is the following code returning NA instead of the date?
>
> as.Date("2022-01-02", origin = "1900-01-01", format = "%y%d%m")
> [1] NA
>
> Thanking you,
> Yours sincerely,
> AKSHAY M KULKARNI

Hello,

There are several reasons for your result.

1. You have a 4-digit year but format %y (lower case = 2-digit year). It should be %Y.
2. Your date has '-' as separator but your format doesn't have a separator.

Also, though less important:

1. You don't need the argument origin. It is only needed for numeric to date coercion.
2. Are you sure the format is YYYY-DD-MM, year-day-month?

as.Date("2022-01-02", format = "%Y-%d-%m")
#> [1] "2022-02-01"

# note the origin is not your posted origin date,
# see the examples on Windows and Excel
# dates in help("as.Date")
as.Date(19024, origin = "1970-01-01")
#> [1] "2022-02-01"

Hope this helps,

Rui Barradas
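
The effect of each format string can be checked side by side. The sketch below uses only base R; the variable names are arbitrary.

```r
s <- "2022-01-02"

d_iso <- as.Date(s)                        # ISO yyyy-mm-dd needs no format
d_ymd <- as.Date(s, format = "%Y-%m-%d")   # year-month-day: 2 January
d_ydm <- as.Date(s, format = "%Y-%d-%m")   # year-day-month: 1 February
d_bad <- as.Date(s, format = "%y%d%m")     # 2-digit year, no separators: NA

format(d_ymd, "%d %B %Y")   # check how the string was interpreted
as.numeric(d_ydm)           # days since 1970-01-01
is.na(d_bad)
```

Printing the parsed date with an unambiguous format such as "%d %B %Y" is a quick way to confirm whether day and month were read in the intended order.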
Re: [R] please help generate a square correlation matrix
On 25/07/2024 at 20:47, Yuan Chun Ding wrote:
> Hi Rui,
> You are always very helpful!! Thank you. I just modified your R code, as below, to remove rows with zero values in both columns of a pair for my real data.
> Ding
>
> dat <- gene22mut.coded
> r <- P <- matrix(NA, nrow = 22L, ncol = 22L, dimnames = list(names(dat), names(dat)))
> for(i in 1:22) {
>   #i=1
>   x <- dat[[i]]
>   for(j in (1:22)) {
>     #j=2
>     if(i == j) {
>       # there's nothing to test, assign correlation 1
>       r[i, j] <- 1
>     } else {
>       tmp <- cbind(x, dat[[j]])
>       row0 <- rowSums(tmp)
>       tem2 <- tmp[row0 != 0, ]
>       tmp3 <- cor.test(tem2[,1], tem2[,2])
>       r[i, j] <- tmp3$estimate
>       P[i, j] <- tmp3$p.value
>     }
>   }
> }
> r <- as.data.frame(r)
> P <- as.data.frame(P)
>
> From: R-help On Behalf Of Yuan Chun Ding via R-help
> Sent: Thursday, July 25, 2024 11:26 AM
> To: Rui Barradas; r-help@r-project.org
> Subject: Re: [R] please help generate a square correlation matrix
>
> Hi Rui,
> Thank you for the help! You did not remove a row if zero values exist in both columns of a pair, right?
> Ding
>
> From: Rui Barradas
> Sent: Thursday, July 25, 2024 11:15 AM
> To: Yuan Chun Ding; r-help@r-project.org
> Subject: Re: [R] please help generate a square correlation matrix
>
> On 25/07/2024 at 17:39, Yuan Chun Ding via R-help wrote:
>> Hi R users,
>>
>> I generated a square correlation matrix for the dat data frame below:
>>
>> dat<-data.frame(g1=c(1,0,0,1,1,1,0,0,0),
>>                 g2=c(0,1,0,1,0,1,1,0,0),
>>                 g3=c(1,1,0,0,0,1,0,0,0),
>>                 g4=c(0,1,0,1,1,1,1,1,0))
>> library("Hmisc")
>> dat.rcorr = rcorr(as.matrix(dat))
>> dat.r <-round(dat.rcorr$r,2)
>>
>> However, I want to modify this correlation calculation. My dat has more than 1000 rows and 22 columns; in each column, fewer than 10% of the values are 1 and most of them are 0. So I want to remove the rows with a value of zero in both columns when calculating the correlation between two columns. I just want to check whether the values of 1 are correlated between two columns. Please look at my code below:
>>
>> cor.4gene <-matrix(0,nrow=4*4, ncol=4)
>> for (i in 1:4){
>>   #i=1
>>   for (j in 1:4) {
>>     #j=1
>>     d <-dat[,c(i,j)]%>%
>>       filter(eval(as.symbol(colnames(dat)[i]))!=0 |
>>              eval(as.symbol(colnames(dat)[j]))!=0)
>>     c <-cor.test(d[,1],d[,2])
>>     cor.4gene[i*j,]<-c(colnames(dat)[i],colnames(dat)[j],
>>                        c$estimate,c$p.value)
>>   }
>> }
>> cor.4gene<-as.data.frame(cor.4gene)%>%filter(V1 !=0)
>> colnames(cor.4gene)<-c("gene1","gene2","cor","P")
>>
>> Can you tell me what mistakes I made? First, why is cor NA when calculating the correlation of g1 with g1? I thought it should be 1.
>>
>> cor.4gene$cor[is.na(cor.4gene$cor)]<-1
>> cor.4gene$cor[is.na(cor.4gene$P)]<-0
>> cor.4gene.sq <-pivot_wider(cor.4gene, names_from = gene1, values_from = cor)
>>
>> Then the line of code above did not generate a square matrix like the one the Hmisc library produced. How can I fix my code?
>>
>> Thank you,
>>
>> Ding
Re: [R] please help generate a square correlation matrix
00 NA

P
#>           g1         g2        g3         g4
#> g1        NA 0.79797170 0.4070838 0.68452834
#> g2 0.7979717         NA 0.4070838 0.06758329
#> g3 0.4070838 0.40708382        NA 1.00000000
#> g4 0.6845283 0.06758329 1.0000000         NA

You can put these two results in a list, like Hmisc::rcorr does.

lst_rcorr <- list(r = r, P = P)

Hope this helps,

Rui Barradas
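
The variant discussed in this thread, dropping the rows where both columns of a pair are zero before testing, can be written as a small base R function. This is only a sketch under stated assumptions: pair_cor is a hypothetical name, and the guard for degenerate pairs is an addition not present in the thread. It uses the four-gene data from the original question.

```r
# Hypothetical helper: pairwise correlations after dropping, for each pair,
# the rows where BOTH columns are zero.
pair_cor <- function(dat) {
  p <- ncol(dat)
  r <- P <- matrix(NA_real_, p, p, dimnames = list(names(dat), names(dat)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      if (i == j) {            # nothing to test on the diagonal
        r[i, j] <- 1
        next
      }
      keep <- dat[[i]] != 0 | dat[[j]] != 0
      x <- dat[[i]][keep]
      y <- dat[[j]][keep]
      # cor.test() needs at least 3 pairs; a constant column has no correlation
      if (sum(keep) >= 3 && sd(x) > 0 && sd(y) > 0) {
        ct <- cor.test(x, y)
        r[i, j] <- ct$estimate
        P[i, j] <- ct$p.value
      }
    }
  }
  list(r = r, P = P)
}

dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))
res <- pair_cor(dat)
round(res$r, 2)
```

Note one side effect of the filtering: for g2 and g4, every retained row has g4 equal to 1, so that column is constant and the correlation is undefined; the function leaves NA there rather than reporting a number.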
Re: [R] Using the pipe, |>, syntax with "names<-"
On 20/07/2024 at 21:46, Bert Gunter wrote:
> With further fooling around, I realized that explicitly assigning my last "solution" 'works'; i.e.
>
> names(z)[2] <- "foo"
>
> can be piped as:
>
> z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> z
>   a foo
> 1 1   a
> 2 2   b
> 3 3   c
>
> This is even awfuller than before. So my query still stands.
>
> -- Bert
>
> On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter wrote:
>> Nope, I still got it wrong: None of my approaches work. :(
>>
>> So my query remains: how to do it via piping with |> ?
>>
>> Bert
>>
>> On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter wrote:
>>> This post is likely pretty useless; it is motivated by a recent post from "Val" that was elegantly answered using Tidyverse constructs, but I wondered how to do it using base R only. Along the way, I ran into the following question to which I think my answer (below) is pretty awful. I would be interested in more elegant base R approaches. So...
>>>
>>> z <- data.frame(a = 1:3, b = letters[1:3])
>>> z
>>>   a b
>>> 1 1 a
>>> 2 2 b
>>> 3 3 c
>>>
>>> Suppose I want to change the name of the second column of z from 'b' to 'foo'. This is very easy using nested function syntax by:
>>>
>>> names(z)[2] <- "foo"
>>> z
>>>   a foo
>>> 1 1   a
>>> 2 2   b
>>> 3 3   c
>>>
>>> Now suppose I wanted to do this using |> syntax, along the lines of:
>>>
>>> z |> names()[2] <- "foo" ## throws an error
>>>
>>> Slightly fancier is:
>>>
>>> z |> (\(x)names(x)[2] <- "b")() ## does nothing, but does not throw an error.
>>>
>>> However, the following, which resulted from a more careful read of ?names, works (after changing the name of the second column back to "b" of course):
>>>
>>> z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
>>>   a foo
>>> 1 1   a
>>> 2 2   b
>>> 3 3   c
>>>
>>> This qualifies to me as "pretty awful." I'm sure there are better ways to do this using pipe syntax, so I would appreciate any better approaches.
>>>
>>> Best,
>>> Bert

Hello,

This is not exactly the same, but in one of your attempts all you have to do is return x. The following works and does something.

z |> (\(x){names(x)[2] <- "foo";x})()
#   a foo
# 1 1   a
# 2 2   b
# 3 3   c

Hope this helps,

Rui Barradas
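
Another base R possibility, offered here only as a sketch and not taken from the thread, is to build the new names vector with replace() and attach it with setNames(), both of which return a modified copy and therefore fit inside an anonymous function in a pipe.

```r
z <- data.frame(a = 1:3, b = letters[1:3])

# replace() returns a modified copy of the names; setNames() attaches it
z2 <- z |> (\(x) setNames(x, replace(names(x), 2, "foo")))()

names(z2)  # renamed copy
names(z)   # the original is untouched unless the result is assigned back
```

As with the other pipe solutions in this thread, nothing changes in z itself until the result is assigned, for example z <- z |> ....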
Re: [R] ggplot two-factor legend
On 18/07/2024 at 17:43, Rui Barradas wrote:
> On 18/07/2024 at 16:27, SIBYLLE STÖCKLI via R-help wrote:
>> Hi
>>
>> I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 1), x = 6 years. I was able to adapt the colour of the lines (green and red) and the linetype (solid and dashed).
>> Challenge: my code now produces two legends, one with the colours for the group and one with the linetype for the group. Does somebody have a hint on how to adapt the code to produce one legend? Group 0 = red and dashed, Group 1 = green and solid?
>>
>> MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
>> dev.new(width=4, height=2.75)
>> par(mar = c(0,6,0,0))
>> p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio, linetype=Bio)) +
>>   geom_smooth(aes(fill=Bio), method = "lm", formula = y ~ x + I(x^2), linewidth=1) +
>>   theme(panel.background = element_blank())+
>>   theme(axis.line = element_line(colour = "black"))+
>>   theme(axis.text=element_text(size=18))+
>>   theme(axis.title=element_text(size=20))+
>>   ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
>>   scale_color_manual(values=c("red","dark green"), labels=c("ÖLN", "BIO"))+
>>   scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN", "BIO"))+
>>   theme(legend.title = element_blank())+
>>   theme(legend.text=element_text(size=20))+
>>   scale_linetype_manual(values=c("dashed", "solid"))
>> p1<-p1 + expand_limits(y=c(0, 30))
>>
>> kind regards
>> Sibylle
>
> Hello,
>
> To have only one legend, the labels must be the same.
> Try using
>
> labels=c("ÖLN", "BIO")
>
> in
>
> scale_linetype_manual(values=c("dashed", "solid"), labels=c("ÖLN", "BIO"))
>
> Hope this helps,
>
> Rui Barradas

Hello,

Here is a more complete answer with the built-in data set mtcars.
Note that the group aesthetic is not used. This is because linetype is categorical (after mutate) and there's no need to group again by the same variable (am).

Remove labels from scale_linetype_manual and there are two legends, but with the same labels the legends merge.

library(ggplot2)
library(dplyr)

mtcars %>%
  # linetype must be categorical
  mutate(am = factor(am)) %>%
  ggplot(aes(hp, disp, color = am, linetype = am)) +
  geom_line() +
  scale_color_manual(
    values = c("red","dark green"),
    labels = c("ÖLN", "BIO")
  ) +
  scale_linetype_manual(
    values = c("dashed", "solid"),
    labels = c("ÖLN", "BIO")
  ) +
  theme_bw()

Hope this helps,

Rui Barradas
Re: [R] Obtaining predicted probabilities for Logistic regression
On 13/07/2024 at 12:13, Christofer Bogaso wrote:
> Hi,
>
> I ran the code below
>
> Dat = read.csv('https://raw.githubusercontent.com/sam16tyagi/Machine-Learning-techniques-in-python/master/logistic%20regression%20dataset-Social_Network_Ads.csv')
> head(Dat)
> Model = glm(Purchased ~ Gender, data = Dat, family = binomial())
> head(predict(Model, type="response"))
> My_Predict = 1/(1+exp(-1 * (as.vector(coef(Model))[1] * as.vector(coef(Model))[2] * ifelse(Dat['Gender'] == "Male", 1, 0))))
> head(My_Predict)
>
> However, My_Predict and predict(Model, type="response") differ when I try to calculate the prediction manually.
>
> Could you please help me identify the mistake I made?

Hello,

Sometimes when there is an error, the best way to correct it is to rewrite the offending part of the code. In your case, after as.vector(coef(Model))[1] you should have a plus sign, not a multiplication sign.

Dat = read.csv('https://raw.githubusercontent.com/sam16tyagi/Machine-Learning-techniques-in-python/master/logistic%20regression%20dataset-Social_Network_Ads.csv')
head(Dat)
Model = glm(Purchased ~ Gender, data = Dat, family = binomial())

# use matrix algebra
x <- cbind(1, (Dat$Gender == "Male")) %*% coef(Model)
pred1 <- exp(x)/(1 + exp(x))

# use the fitted line equation
y <- coef(Model)[1L] + coef(Model)[2L] * (Dat$Gender == "Male")
pred2 <- exp(y)/(1 + exp(y))

head(predict(Model, type="response"))
head(pred1) |> c()
head(pred2)

Hope this helps,

Rui Barradas
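
The same check can be run without downloading anything. The sketch below assumes the built-in mtcars data (am ~ vs) as a stand-in for the poster's data set, and uses plogis(), the logistic function in base R, to turn the linear predictor into a probability.

```r
fit <- glm(am ~ vs, data = mtcars, family = binomial())
b <- coef(fit)

# linear predictor: intercept PLUS slope times the regressor
eta <- b[1] + b[2] * mtcars$vs
p_manual <- plogis(eta)                    # same as 1 / (1 + exp(-eta))

# the mistake from the question: multiplying instead of adding
p_wrong <- plogis(b[1] * b[2] * mtcars$vs)

p_pred <- predict(fit, type = "response")
isTRUE(all.equal(unname(p_manual), unname(p_pred)))   # agrees with predict()
isTRUE(all.equal(unname(p_wrong),  unname(p_pred)))   # does not
```

With the product form, every observation with vs equal to 0 gets probability 0.5 regardless of the intercept, which is an easy symptom to spot.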
Re: [R] grep
Hello,

Though the question is already answered, here is another answer to what 'x' is.

The output in the OP is not lm or glm output, but if your regression model was programmed according to recommended practices, there must be a 'coefficients' member in the list or object it returns, and the following should work.

# 'x' is the names of this named numeric vector
coef(fit)
#
fit |> coef() |> names() |> grep("somewhat|very", x = _)

Hope this helps,

Rui Barradas

On 12/07/2024 at 10:26, Steven Yen wrote:
> Thanks. In this case below, what is "x"? I tried rownames(out), which did not work.
>
> Sorry. Does this sound like homework to you?
>
> On 7/12/2024 5:09 PM, Uwe Ligges wrote:
>> On 12.07.2024 10:54, Steven Yen wrote:
>>> Below is part of a regression printout. How can I use "grep" to identify rows headed by variables (first column) with a certain label? In this case, I would like to find variables containing "somewhath", "veryh", "somewhatm", "verym", "somewhatc", "veryc", "somewhatl", "veryl". The result should be an index 6:13 or 6,7,8,9,10,11,12,13. Note that they all contain "somewhat" and "very". Thanks.
>>
>> Sounds like homework?
>>
>> which(grepl("very|somewhat", x))
>>
>> Best,
>> Uwe Ligges
>>
>>>                   est     se        t      p           g sig
>>> x.1.age        0.0341 0.0138   2.4766 0.0133 -3.8835e-04  **
>>> x.1.sleep     -0.1108 0.0059 -18.6277 0.0000 -4.4572e-04 ***
>>> x.1.primary   -0.0694 0.0289  -2.4002 0.0164 -9.9638e-06  **
>>> x.1.middle    -0.2909 0.0356  -8.1657 0.0000 -1.4913e-05 ***
>>> x.1.high      -0.4267 0.0463  -9.2118 0.0000 -3.6246e-05 ***
>>> x.1.somewhath -0.6188 0.0256 -24.1971 0.0000 -3.1337e-05 ***
>>> x.1.veryh     -0.7580 0.0331 -22.8695 0.0000 -2.9558e-05 ***
>>> x.1.somewhatm -0.3413 0.0426  -8.0112 0.0000 -1.8920e-05 ***
>>> x.1.verym     -0.3813 0.0446  -8.5413 0.0000 -4.4029e-05 ***
>>> x.1.somewhatc -0.3101 0.0649  -4.7783 0.0000 -1.4353e-05 ***
>>> x.1.veryc     -0.2977 0.0648  -4.5910 0.0000 -4.8986e-05 ***
>>> x.1.somewhatl -0.6310 0.0424 -14.8846 0.0000 -1.9543e-05 ***
>>> x.1.veryl     -0.9132 0.0462 -19.7525 0.0000 -4.4603e-05 ***
>>> ...
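
If the printout is a data frame or matrix whose row names carry the variable labels, the row names are the 'x' to search. The sketch below is an assumption about the shape of the poster's object: out is a made-up five-row stand-in built from the first column of the printout.

```r
# Made-up stand-in for the printed table, with variable labels as row names
out <- data.frame(
  est = c(0.0341, -0.1108, -0.6188, -0.7580, -0.3413),
  row.names = c("x.1.age", "x.1.sleep", "x.1.somewhath",
                "x.1.veryh", "x.1.somewhatm")
)

idx  <- grep("somewhat|very", rownames(out))          # integer positions
idx2 <- which(grepl("somewhat|very", rownames(out)))  # same via a logical vector

idx
out[idx, , drop = FALSE]   # the matching rows
```

grep() already returns positions, so which() is only needed together with grepl(), which returns TRUE/FALSE for every element.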
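As a small self-contained sketch of the idea in this thread: the vector `est` below is a stand-in built from the first column of the printout, not the poster's actual model object. Note that grep() already returns integer positions, so wrapping it in which() is only needed with the logical result of grepl().

```r
# Stand-in for the coefficient vector; names copied from the printout above.
est <- c(x.1.age = 0.0341, x.1.sleep = -0.1108, x.1.primary = -0.0694,
         x.1.middle = -0.2909, x.1.high = -0.4267,
         x.1.somewhath = -0.6188, x.1.veryh = -0.7580,
         x.1.somewhatm = -0.3413, x.1.verym = -0.3813,
         x.1.somewhatc = -0.3101, x.1.veryc = -0.2977,
         x.1.somewhatl = -0.6310, x.1.veryl = -0.9132)

# grep() returns the positions of the names that match the pattern
idx <- grep("somewhat|very", names(est))
idx
#> [1]  6  7  8  9 10 11 12 13

# equivalent, going through a logical vector first
idx2 <- which(grepl("somewhat|very", names(est)))
```

If the estimates live in a matrix or data frame instead, the same call should work on rownames() of that object, provided the row names were actually set.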
Re: [R] simple problem with unquoting argument
On 03/07/2024 at 09:13, Troels Ring wrote:

Hi friends - I'm having problems finding out how to unquote - I have a series of vectors named adds1 ... adds11 and need to, e.g., find the sum of each of them. So I try

SS <- c()
for (i in 1:11) {
  e <- paste("adds",i,sep="")
  SS[i] <- sum(xx(e))
}

Now e looks right - but I have been unable to find out how to get the string e converted to the proper argument for sum() - i.e. what is function xx?

All best wishes
Troels Ring, Aalborg, Denmark

Hello,

Function xx is ?get or mget (same help page). You can get the adds vectors all in one instruction with mget, or one at a time with get.

adds1 <- 1:10
adds2 <- 2:10
adds3 <- 3:10
adds4 <- 4:10
adds5 <- 5:10

# create SS with the required length beforehand
SS <- numeric(5L)
for (i in 1:5) {
  e <- paste("adds",i,sep="")
  SS[i] <- sum(get(e))
}
SS
#> [1] 55 54 52 49 45

Or all in one instruction with the assistance of ?ls.

# ls(pattern = "^adds") |> mget() |> lapply(sum)
ls(pattern = "^adds") |> mget() |> sapply(sum)
#> adds1 adds2 adds3 adds4 adds5
#>    55    54    52    49    45

Hope this helps,

Rui Barradas
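One detail worth keeping in mind with the ls() approach, sketched here under the assumption that there are eleven vectors as in the original question: ls() sorts names alphabetically, so adds10 and adds11 come before adds2. Building the names explicitly keeps the numeric order. The environment `e` and the toy vectors are made up for the illustration.

```r
# Toy data: eleven vectors adds1 ... adds11 in a separate environment,
# so the example does not clutter the workspace.
e <- new.env()
for (i in 1:11) assign(paste0("adds", i), seq_len(i), envir = e)

# ls() returns the names sorted alphabetically, not numerically
head(ls(e, pattern = "^adds"), 3)
#> [1] "adds1"  "adds10" "adds11"

# explicit names keep the order 1, 2, ..., 11
nms <- paste0("adds", 1:11)
SS <- sapply(mget(nms, envir = e), sum)
names(SS)[1:3]
#> [1] "adds1" "adds2" "adds3"
```

If the vectors can be created in a list from the start, the get/mget step is not needed at all and sapply(the_list, sum) does the job.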
Re: [R] Create matrix with variable number of columns AND CREATE NAMES FOR THE COLUMNS
On 01/07/2024 at 16:54, Sorkin, John wrote:

#I am trying to write code that will create a matrix with a variable number of columns where the
#number of columns is 1+Grps
#I can do this:
NSims <- 4
Grps <- 5
DiffMeans <- matrix(nrow=NSims,ncol=1+Grps)
DiffMeans
#I have a problem when I try to name the columns of the matrix. I want the first column to be NSims,
#and the other columns to be something like Value1, Value2, . . . Valuen where N=Grps
# I wrote a function to build a list of length Grps
createValuelist <- function(num_elements) {
  for (i in 1:num_elements) {
    cat("Item", i, "\n", sep = "")
  }
}
createValuelist(Grps)
# When I try to assign column names I receive an error:
#Error in dimnames(DiffMeans) <- list(NULL, c("NSim", createValuelist(Grps))) :
# length of 'dimnames' [2] not equal to array extent
dimnames(DiffMeans) <- list(NULL,c("NSim",createValuelist(Grps)))
DiffMeans
# Thank you for your help!

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care, 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524
Cell phone 443-418-5382

Hello,

Sorry for my first answer, I thought you only wanted to name the matrix columns. After reading the OP again, this time actually reading it, I realized you also want to create the matrix. This is even in the question title line :(.

create_matrix <- function(nsims, ngrps, First = "NSims", Prefix = "Value") {
  # could also be paste0(Prefix, seq_len(ngrps))
  grp_names <- sprintf("%s%d", Prefix, seq_len(ngrps))
  nms <- c(First, grp_names)
  matrix(nrow = nsims, ncol = 1L + ngrps, dimnames = list(NULL, nms))
}

NSims <- 4
Grps <- 5
create_matrix(NSims, Grps)
#>      NSims Value1 Value2 Value3 Value4 Value5
#> [1,]    NA     NA     NA     NA     NA     NA
#> [2,]    NA     NA     NA     NA     NA     NA
#> [3,]    NA     NA     NA     NA     NA     NA
#> [4,]    NA     NA     NA     NA     NA     NA

Hope this helps,

Rui Barradas
Re: [R] Create matrix with variable number of columns AND CREATE NAMES FOR THE COLUMNS
On 01/07/2024 at 16:54, Sorkin, John wrote:

#I am trying to write code that will create a matrix with a variable number of columns where the
#number of columns is 1+Grps
#I can do this:
NSims <- 4
Grps <- 5
DiffMeans <- matrix(nrow=NSims,ncol=1+Grps)
DiffMeans
#I have a problem when I try to name the columns of the matrix. I want the first column to be NSims,
#and the other columns to be something like Value1, Value2, . . . Valuen where N=Grps
# I wrote a function to build a list of length Grps
createValuelist <- function(num_elements) {
  for (i in 1:num_elements) {
    cat("Item", i, "\n", sep = "")
  }
}
createValuelist(Grps)
# When I try to assign column names I receive an error:
#Error in dimnames(DiffMeans) <- list(NULL, c("NSim", createValuelist(Grps))) :
# length of 'dimnames' [2] not equal to array extent
dimnames(DiffMeans) <- list(NULL,c("NSim",createValuelist(Grps)))
DiffMeans
# Thank you for your help!

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care, 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524
Cell phone 443-418-5382

Hello,

Something like this?

names_cols <- function(x, First = "NSims", Prefix = "Value") {
  nms <- c(First, sprintf("%s%d", Prefix, seq_len(ncol(x) - 1L)))
  colnames(x) <- nms
  x
}

NSims <- 4
Grps <- 5
DiffMeans <- matrix(nrow=NSims,ncol=1+Grps)
names_cols(DiffMeans)
#>      NSims Value1 Value2 Value3 Value4 Value5
#> [1,]    NA     NA     NA     NA     NA     NA
#> [2,]    NA     NA     NA     NA     NA     NA
#> [3,]    NA     NA     NA     NA     NA     NA
#> [4,]    NA     NA     NA     NA     NA     NA

Hope this helps,

Rui Barradas
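For readers wondering why the original attempt failed, here is a short sketch of what seems to be going on: cat() prints to the console but the function's value is NULL, so c("NSim", NULL) has length 1 instead of 6. The helper `value_names()` is a hypothetical name used only for this illustration.

```r
# The function from the question: it prints, but returns NULL
createValuelist <- function(num_elements) {
  for (i in 1:num_elements) {
    cat("Item", i, "\n", sep = "")
  }
}

res <- createValuelist(2)   # prints Item1 and Item2
is.null(res)                # the for loop's value is NULL
length(c("NSim", res))      # so only one name is supplied

# A function that returns the names instead of printing them
value_names <- function(n, prefix = "Value") paste0(prefix, seq_len(n))

DiffMeans <- matrix(nrow = 4, ncol = 1 + 5)
dimnames(DiffMeans) <- list(NULL, c("NSims", value_names(5)))
colnames(DiffMeans)
```

The general point is that a function used inside c() must return its result; printing it with cat() or print() is not enough.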
Re: [R] Naming output file
On 24/06/2024 at 12:41, Steven Yen wrote:

I would like a loop to (1) read data files 2010midata1, 2010midata2, 2010midata3; and (2) name OUTPUT bop1, bop2, bop3. I succeeded in line 3 of the code below, BUT not line 4. The error message says:

Error in paste0("bop", im) <- boprobit(eqs, mydata, wt = weight, method = "NR", :
  target of assignment expands to non-language object

Please help. Thanks.

m<-3
for (im in 1:m) {
  mydata<-read.csv(paste0("2010midata",im,".csv"))
  paste0("bop",im)<-boprobit(eqs,mydata,wt=weight,method="BHHH",tol=0,reltol=0,gradtol=1e-5,Fisher=TRUE)
}

Hello,

Here are two ways, with a for loop and with a lapply loop.

# for loop
m <- 3
# create the input filenames in one instruction
INPUT <- paste0("2010midata", seq.int(m), ".csv")
# create a named list with m elements to store the output
OUTPUT <- vector("list", length = m) |> setNames(paste0("bop", seq.int(m)))
for(i in seq.int(m)) {
  mydata <- read.csv(INPUT[[i]])
  OUTPUT[[i]] <- boprobit(eqs, mydata, wt=weight, method="BHHH", tol=0, reltol=0, gradtol=1e-5, Fisher=TRUE)
}

# lapply loop
m <- 3
# create the input filenames in one instruction
INPUT <- paste0("2010midata", seq.int(m), ".csv")
# no need to create the output list, it will be the
# return value of lapply
OUTPUT <- lapply(INPUT, \(f) {
  mydata <- read.csv(f)
  boprobit(eqs, mydata, wt=weight, method="BHHH", tol=0, reltol=0, gradtol=1e-5, Fisher=TRUE)
})
# assign the output list's names
names(OUTPUT) <- paste0("bop", seq.int(m))

Hope this helps,

Rui Barradas
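A runnable sketch of the named-list idea from this thread follows. Since boprobit() and the data files are not available here, `fit_stub()` is a hypothetical stand-in that just counts rows, and the CSV files are written to a temporary directory; only the naming mechanics are the point.

```r
m <- 3

# Write three small demo files so the example runs anywhere
INPUT <- file.path(tempdir(), paste0("2010midata", seq_len(m), ".csv"))
for (i in seq_len(m)) {
  write.csv(data.frame(x = seq_len(i)), INPUT[i], row.names = FALSE)
}

# Hypothetical stand-in for the model-fitting call
fit_stub <- function(d) nrow(d)

# lapply() keeps the names of its input, so name the input first
OUTPUT <- lapply(setNames(INPUT, paste0("bop", seq_len(m))),
                 \(f) fit_stub(read.csv(f)))

names(OUTPUT)
#> [1] "bop1" "bop2" "bop3"
OUTPUT$bop2
#> [1] 2
```

Each result is then reachable as OUTPUT$bop1, OUTPUT[["bop2"]] and so on, which is usually easier to loop over later than separate objects created with assign().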
Re: [R] Bug with writeClipboard in {utils}
Hello,

Inline.

On 20/06/2024 at 14:15, Barthelemy Tanguy wrote:

Hello,

Thank you for your different tests. You said that you didn't find any errors with Rscript or with R, but I have the impression that your test with R (second test) showed additional, unwanted characters (second line of the output)?

You are right, in the case I posted there were unwanted characters. In most of the tests I ran there were no additional, unwanted characters, though. This is definitely unstable, that's all I can say.

Hope this helps,

Rui Barradas

Thank you again
Tanguy BARTHELEMY

From: Rui Barradas
Sent: Wednesday, 19 June 2024 19:26
To: Barthelemy Tanguy; r-help@r-project.org
Subject: Re: [R] Bug with writeClipboard in {utils}

On 18/06/2024 at 11:12, Barthelemy Tanguy via R-help wrote:

Hello,

I'm encountering what seems to be a bug when using the `writeClipboard()` function in the R {utils} package. When I try to copy text to the clipboard, I notice that I get extra characters when I try to paste it (by hand with CTRL+V or with the `readClipboard()` function from the R package {utils}). Here's my example:

``` r
utils::writeClipboard("plot(AirPassengers)")
for (k in 1:10) {
  print(utils::readClipboard())
}
#> [1] "plot(AirPassengers)" "⤀攀"
#> [1] "plot(AirPassengers)" "\u0a00"
#> [1] "plot(AirPassengers)" "\xed\xb0\x80ư"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)"
#> [1] "plot(AirPassengers)" "⤀"
#> [1] "plot(AirPassengers)"
Warning message:
In utils::readClipboard() : unpaired surrogate Unicode point dc00
```

So I don't always get the same result.

I opened an issue in the {clipr} GitHub repository before realizing it's a {utils} problem: https://github.com/mdlincoln/clipr/issues/68

Is this a bug or something I haven't configured properly?

Thank you very much
Tanguy BARTHELEMY

Hello,

I have reproduced part of the behavior in the OP, but it depends on the GUI or command line used. With Rscript or with R I haven't found any errors. With Rgui or with RStudio, yes, the output was not the expected output.

All code run in R 4.4.0 on Windows 11.

The script rscript.R is

utils::capture.output({
  utils::writeClipboard("plot(AirPassengers)")
  for (k in 1:10) {
    print(utils::readClipboard())
  }
  sessionInfo()
}, file = "rhelp.txt")

---

Here are the results I got.
1) Command: Rscript rscript.R

Output:

[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.utf8

time zone: Europe/Lisbon
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.4.0

---

2) Command: R -q -f rscript.R

Output:

> utils::writeClipboard("plot(AirPassengers)")
> for (k in 1:10) {
+ print(utils::readClipboard())
+ }
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)" "㨀Ǐ\005"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
> sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
Re: [R] Bug with writeClipboard in {utils}
8  LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.utf8

time zone: Europe/Lisbon
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] gtable_0.3.4         tensorA_0.36.2.1     ggplot2_3.5.0
 [4] QuickJSR_1.1.3       processx_3.8.3       inline_0.3.19
 [7] lattice_0.22-5       tzdb_0.4.0           callr_3.7.5
[10] vctrs_0.6.5          tools_4.4.0          ps_1.7.6
[13] generics_0.1.3       stats4_4.4.0         curl_5.2.1
[16] parallel_4.4.0       sandwich_3.1-0       tibble_3.2.1
[19] fansi_1.0.6          chron_2.3-61         pkgconfig_2.0.3
[22] brms_2.21.0          Matrix_1.6-5         checkmate_2.3.1
[25] distributional_0.4.0 RcppParallel_5.1.7   lifecycle_1.0.4
[28] compiler_4.4.0       stringr_1.5.1        Brobdingnag_1.2-9
[31] munsell_0.5.0        codetools_0.2-19     bayesplot_1.11.1
[34] pillar_1.9.0         crayon_1.5.2         MASS_7.3-60.0.1
[37] StanHeaders_2.32.6   bridgesampling_1.1-2 abind_1.4-5
[40] multcomp_1.4-25      nlme_3.1-164         posterior_1.5.0
[43] rstan_2.32.5         tidyselect_1.2.0     mvtnorm_1.2-3
[46] stringi_1.7.12       dplyr_1.1.4          splines_4.4.0
[49] grid_4.4.0           colorspace_2.1-0     cli_3.6.2
[52] magrittr_2.0.3       loo_2.6.0            survival_3.5-8
[55] pkgbuild_1.4.2       utf8_1.2.4           TH.data_1.1-2
[58] readr_2.1.4          prettyunits_1.2.0    scales_1.3.0
[61] backports_1.4.1      estimability_1.5     httr_1.4.7
[64] matrixStats_1.0.0    emmeans_1.10.0       gridExtra_2.3
[67] hms_1.1.3            zoo_1.8-12           coda_0.19-4.1
[70] V8_4.4.2             rstantools_2.3.1.1   rlang_1.1.3
[73] Rcpp_1.0.12          xtable_1.8-4         glue_1.7.0
[76] ppcor_1.1            rstudioapi_0.15.0    jsonlite_1.8.8
[79] R6_2.5.1

---

4) GUI: Rgui

Output:

[1] "plot(AirPassengers)" "က \005ⷀǏǭ"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
[1] "plot(AirPassengers)"
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=Portuguese_Portugal.utf8  LC_CTYPE=Portuguese_Portugal.utf8
[3] LC_MONETARY=Portuguese_Portugal.utf8 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.utf8

time zone: Europe/Lisbon
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.4.0

Hope this helps,

Rui Barradas
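Because the behavior reported above is intermittent, counting the bad reads may be more informative than eyeballing ten printed lines. The sketch below is only one way to do that: `count_bad_reads()` is a hypothetical helper written for this illustration, and the clipboard part is guarded so it only runs on Windows, where writeClipboard() and readClipboard() are available.

```r
# Count how many reads differ from the single expected string
count_bad_reads <- function(reads, expected) {
  sum(!vapply(reads, identical, logical(1), y = expected))
}

# Pure example with made-up reads: the second one has an extra element
demo_reads <- list("a", c("a", "x"), "a")
count_bad_reads(demo_reads, "a")
#> [1] 1

# Clipboard round trip, Windows only
if (.Platform$OS.type == "windows") {
  expected <- "plot(AirPassengers)"
  utils::writeClipboard(expected)
  reads <- replicate(100, utils::readClipboard(), simplify = FALSE)
  cat("reads with unexpected content:",
      count_bad_reads(reads, expected), "of", length(reads), "\n")
}
```

Running the same script under Rscript, R, Rgui and RStudio would give one number per front end, which is easier to compare and to quote in a bug report.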
Re: [R] code for year month day hr format
-07-11 9 6.2 523 110 -34 167.1 4619 2012-07-11 10 5.5 527 110 -25 167.1 4620 2012-07-11 11 6.0 527 110 -22 167.1 4621 2012-07-11 12 5.8 518 110 -22 167.1 4622 2012-07-11 13 5.4 515 110 -19 167.1 4623 2012-07-11 14 5.3 513 110 -21 167.1 4624 2012-07-11 15 5.5 512 110 -21 167.1 4625 2012-07-11 16 5.2 505 110 -21 167.1 4626 2012-07-11 17 4.9 512 110 -18 167.1 4627 2012-07-11 18 5.1 514 110 -17 167.1 4628 2012-07-11 19 6.2 520 110 -13 167.1 4629 2012-07-11 20 6.6 510 110 -17 167.1 4630 2012-07-11 21 6.2 516 110 -18 167.1 4631 2012-07-11 22 5.8 512 110 -24 167.1 4632 2012-07-11 23 5.9 509 110 -31 167.1 4633 2012-07-12 0 6.1 502 125 -34 170.9 4634 2012-07-12 1 6.6 506 125 -34 170.9 4635 2012-07-12 2 6.1 502 125 -22 170.9 4636 2012-07-12 3 5.8 480 125 -18 170.9 4637 2012-07-12 4 5.7 474 125 -15 170.9 4638 2012-07-12 5 5.4 474 125 -23 170.9 4639 2012-07-12 6 6.1 466 125 -28 170.9 4640 2012-07-12 7 5.4 460 125 -32 170.9 4641 2012-07-12 8 4.8 453 125 -32 170.9 4642 2012-07-12 9 4.7 445 125 -28 170.9 4643 2012-07-12 10 4.9 436 125 -29 170.9 4644 2012-07-12 11 4.9 441 125 -23 170.9 4645 2012-07-12 12 4.9 440 125 -18 170.9 4646 2012-07-12 13 4.2 417 125 -15 170.9 4647 2012-07-12 14 3.5 414 125 -16 170.9 4648 2012-07-12 15 3.9 418 125 -14 170.9 4649 2012-07-12 16 4.2 419 125 -11 170.9 4650 2012-07-12 17 3.9 416 125 -11 170.9 4651 2012-07-12 18 4.0 416 125 -12 170.9 4652 2012-07-12 19 3.8 415 125 -13 170.9 4653 2012-07-12 20 3.9 410 125 -16 170.9 4654 2012-07-12 21 3.8 402 125 -20 170.9 4655 2012-07-12 22 3.8 395 125 -19 170.9 4656 2012-07-12 23 3.9 394 125 -19 170.9 4657 2012-07-13 0 3.9 395 129 -20 152.1 4658 2012-07-13 1 3.8 395 129 -19 152.1 4659 2012-07-13 2 3.8 391 129 -17 152.1 4660 2012-07-13 3 3.8 385 129 -16 152.1 4661 2012-07-13 4 3.7 376 129 -15 152.1 4662 2012-07-13 5 3.8 371 129 -15 152.1 4663 2012-07-13 6 3.8 365 129 -14 152.1 4664 2012-07-13 7 3.9 357 129 -15 152.1 4665 2012-07-13 8 4.0 354 129 -18 152.1 4666 2012-07-13 9 3.9 355 129 -20 152.1 4667 2012-07-13 10 
3.9 353 129 -19 152.1 4668 2012-07-13 11 3.7 357 129 -18 152.1 4669 2012-07-13 12 3.8 357 129 -18 152.1 4670 2012-07-13 13 3.8 355 129 -18 152.1 4671 2012-07-13 14 3.7 347 129 -17 152.1 4672 2012-07-13 15 3.7 350 129 -15 152.1 4673 2012-07-13 16 3.7 346 129 -13 152.1 4674 2012-07-13 17 3.7 341 129 -10 152.1 4675 2012-07-13 18 3.3 340 129 -8 152.1 4676 2012-07-13 19 3.2 338 129 -9 152.1 4677 2012-07-13 20 3.3 333 129 -10 152.1 4678 2012-07-13 21 3.4 329 129 -9 152.1 4679 2012-07-13 22 3.9 326 129 -7 152.1 4680 2012-07-13 23 4.0 324 129 -8 152.1 4681 2012-07-14 0 4.0 324 125 -9 152.8 4682 2012-07-14 1 4.0 325 125 -9 152.8 4683 2012-07-14 2 3.9 329 125 -7 152.8 4684 2012-07-14 3 4.1 326 125 -5 152.8 4685 2012-07-14 4 4.4 325 125 -6 152.8 4686 2012-07-14 5 4.5 323 125 -5 152.8 4687 2012-07-14 6 5.0 319 125 -5 152.8 4688 2012-07-14 7 5.2 317 125 -8 152.8 4689 2012-07-14 8 5.4 323 125 -7 152.8 4690 2012-07-14 9 5.4 318 125 -6 152.8 4691 2012-07-14 10 5.2 316 125 -8 152.8 4692 2012-07-14 11 5.2 326 125 -5 152.8 4693 2012-07-14 12 4.6 335 125 -5 152.8 4694 2012-07-14 13 4.2 340 125 -5 152.8 4695 2012-07-14 14 5.0 350 125 -5 152.8 4696 2012-07-14 15 4.9 366 125 -1 152.8 4697 2012-07-14 16 3.9 355 125 -5 152.8 4698 2012-07-14 17 5.1 369 125 -5 152.8 4699 2012-07-14 18 11.0 419 125 15 152.8 4700 2012-07-14 19 14.6 574 1254 152.8 4701 2012-07-14 20 11.2 569 125 -7 152.8 4702 2012-07-14 21 13.9 568 125 -5 152.8 4703 2012-07-14 22 15.3 574 1251 152.8 4704 2012-07-14 23 19.2 644 125 -2 152.8 4705 2012-07-15 0 11.4 665 1179 145.1 4706 2012-07-15 1 9.7 657 1170 145.1 *Jibrin Adejoh Alhassan (Ph.D)* Department of Physics and Astronomy, University of Nigeria, Nsukka On Mon, Jun 17, 2024 at 9:23 AM Rui Barradas wrote: Às 09:12 de 17/06/2024, Jibrin Alhassan escreveu: Hello Rui, Here is the head(df1) output Date HR IMF SWS SSN Dst f10.7 1 2012-01-01 0 4.0 379 71 -8 999.9 2 2012-01-01 1 4.4 386 71 -3 999.9 3 2012-01-01 2 4.8 380 71 -4 999.9 4 2012-01-01 3 5.4 374 71 -5 999.9 5 
2012-01-01 4 4.5 369 71 -9 999.9 6 2012-01-01 5 4.2 368 71 -7 999.9 Many thanks. *Jibrin Adejoh Alhassan (Ph.D)* Department of Physics and Astronomy, University of Nigeria, Nsukka On Mon, Jun 17, 2024 at 8:14 AM Rui Barradas wrote: Às 07:53 de 17/06/2024, Jibrin Alhassan escreveu: Part of it is pasted below YEAR DOY HRIMF SWS SSN Dst f10.7 2012 1 0 4.0 379. 71-8 999.9 2012 1 1 4.4 386. 71-3 999.9 2012 1 2 4.8 380. 71-4 999.9 2012 1 3 5.4 374. 71-5 999.9 2012 1 4 4.5 369. 71-9 999.9 2012 1 5
Re: [R] code for year month day hr format
Às 09:12 de 17/06/2024, Jibrin Alhassan escreveu: Hello Rui, Here is the head(df1) output Date HR IMF SWS SSN Dst f10.7 1 2012-01-01 0 4.0 379 71 -8 999.9 2 2012-01-01 1 4.4 386 71 -3 999.9 3 2012-01-01 2 4.8 380 71 -4 999.9 4 2012-01-01 3 5.4 374 71 -5 999.9 5 2012-01-01 4 4.5 369 71 -9 999.9 6 2012-01-01 5 4.2 368 71 -7 999.9 Many thanks. *Jibrin Adejoh Alhassan (Ph.D)* Department of Physics and Astronomy, University of Nigeria, Nsukka On Mon, Jun 17, 2024 at 8:14 AM Rui Barradas wrote: Às 07:53 de 17/06/2024, Jibrin Alhassan escreveu: Part of it is pasted below YEAR DOY HRIMF SWS SSN Dst f10.7 2012 1 0 4.0 379. 71-8 999.9 2012 1 1 4.4 386. 71-3 999.9 2012 1 2 4.8 380. 71-4 999.9 2012 1 3 5.4 374. 71-5 999.9 2012 1 4 4.5 369. 71-9 999.9 2012 1 5 4.2 368. 71-7 999.9 2012 1 6 4.7 367. 71-6 999.9 2012 1 7 4.1 361. 71 -10 999.9 2012 1 8 3.2 362. 71-7 999.9 2012 1 9 4.3 367. 71-3 999.9 2012 1 10 4.5 365. 71-6 999.9 2012 1 11 5.6 369. 71-8 999.9 2012 1 12 5.2 366. 71-8 999.9 2012 1 13 4.4 370. 71-7 999.9 2012 1 14 4.8 357. 71-5 999.9 2012 1 15 4.6 354. 71-8 999.9 2012 1 16 3.7 382. 71-7 999.9 2012 1 17 3.2 376. 71-2 999.9 2012 1 18 2.8 368. 71 2 999.9 2012 1 19 3.2 361. 71 2 999.9 2012 1 20 3.2 361. 71-3 999.9 2012 1 21 3.5 365. 71-5 999.9 2012 1 22 3.6 364. 71-3 999.9 2012 1 23 3.0 362. 71-3 999.9 2012 2 0 3.2 359. 92-5 130.3 2012 2 1 3.0 361. 92-4 130.3 2012 2 2 4.5 374. 92 3 130.3 2012 2 3 4.5 364. 92 5 130.3 2012 2 4 5.1 352. 92 3 130.3 2012 2 5 4.9 358. 92 3 130.3 2012 2 6 4.4 346. 92 4 130.3 2012 2 7 4.2 349. 92 7 130.3 2012 2 8 4.5 346. 92 8 130.3 2012 2 9 5.2 345. 92 7 130.3 2012 2 10 5.0 349. 92 5 130.3 2012 2 11 4.8 345. 92 0 130.3 2012 2 12 5.3 347. 92 0 130.3 2012 2 13 5.5 342. 92 0 130.3 2012 2 14 6.1 359. 92 1 130.3 2012 2 15 6.2 393. 92 8 130.3 2012 2 16 6.7 390. 9210 130.3 2012 2 17 7.7 369. 9210 130.3 2012 2 18 9.4 380. 9214 130.3 2012 2 19 10.6 386. 9212 130.3 2012 2 20 10.2 378. 9211 130.3 2012 2 21 11.6 369. 92 7 130.3 2012 2 22 12.0 369. 
92 8 130.3 2012 2 23 10.5 361. 92 1 130.3 2012 3 0 11.3 403. 120-7 130.2 2012 3 1 10.3 412. 120 -14 130.2 2012 3 2 8.8 419. 120 -18 130.2 2012 3 3 8.3 412. 120 -23 130.2 2012 3 4 8.0 408. 120 -25 130.2 2012 3 5 7.0 380. 120 -28 130.2 2012 3 6 6.9 374. 120 -29 130.2 2012 3 7 6.9 372. 120 -30 130.2 2012 3 8 7.1 365. 120 -32 130.2 2012 3 9 6.8 376. 120 -35 130.2 2012 3 10 6.7 380. 120 -35 130.2 2012 3 11 6.4 381. 120 -30 130.2 2012 3 12 5.9 401. 120 -26 130.2 2012 3 13 5.9 405. 120 -23 130.2 2012 3 14 5.9 413. 120 -20 130.2 2012 3 15 5.9 406. 120 -20 130.2 2012 3 16 6.3 427. 120 -20 130.2 2012 3 17 5.9 424. 120 -19 130.2 2012 3 18 4.8 390. 120 -16 130.2 2012 3 19 4.8 374. 120 -15 130.2 2012 3 20 4.8 374. 120 -15 130.2 2012 3 21 5.1 378. 120 -18 130.2 2012 3 22 4.9 375. 120 -19 130.2 2012 3 23 4.7 364. 120 -17 130.2 2012 4 0 4.3 359. 126 -17 131.6 2012 4 1 4.3 359. 126 -15 131.6 2012 4 2 4.2 358. 126 -13 131.6 2012 4 3 3.8 359. 126 -13 131.6 2012 4 4 3.8 358. 126 -13 131.6 2012 4 5 3.7 359. 126 -14 131.6 2012 4 6 3.9 361. 126 -13 131.6 2012 4 7 3.7 364. 126 -13 131.6 2012 4 8 3.7 366. 126 -12 131.6 2012 4 9 3.8 363. 126 -10 131.6 2012 4 10 3.5 363. 126-8 131.6 2012 4 11 3.0 352. 126 -10 131.6 2012 4 12 3.1 348. 126 -12 131.6 2012 4 13 3.3 340. 126-9 131.6 2012 4 14 4.0 343. 126-8 131.6 2012 4 15 4.2 343. 126-7 131.6 2012 4 16 3.8 336. 126-5 131.6 2012 4 17 3.9 334. 126-6 131.6 2012 4 18 3.8 329. 126-5 131.6 2012 4 19 3.8 326. 126-4 131.6 2012 4 20 4.3 337. 126-3 131.6 2012 4 21 3.9 331. 126 0 131.6 2012 4 22 3.8 322. 126-1 131.6 2012 4 23 3.5 331. 126-1 131.6 2012 5 0 3.9 312. 109-3 136.6 2012 5 1 3.6 311. 109-1 136.6 2012 5 2 3.7 312. 109 0 136.6 2012 5 3 3.8 308. 109 0 136.6 2012 5 4 4.0 305. 109 2 136.6 2012 5 5 4.5 309. 109 2 136.6 2012 5 6 3.5 314. 109 3 136.6 2012 5 7 3.6 305. 109 2 136.6 2012 5 8 4.3 307. 109 2 136.6 2012 5 9 4.6 316. 109 1 136.6 2012 5 10 5.0 321. 109-4 136.6 2012 5 11 5.1 321. 109-6 136.6 2012 5 12 4.6 326. 109-4 136.6
Re: [R] code for year month day hr format
135.1 2012 16 12 13.2 424. 154 -10 135.1 2012 16 13 12.9 433. 154-8 135.1 2012 16 14 9.3 461. 154-7 135.1 2012 16 15 6.6 466. 154 -14 135.1 2012 16 16 6.6 493. 154 -11 135.1 2012 16 17 7.4 496. 154-7 135.1 2012 16 18 6.2 493. 154-7 135.1 2012 16 19 6.9 492. 154 -13 135.1 2012 16 20 6.8 486. 154 -19 135.1 2012 16 21 5.6 488. 154 -14 135.1 2012 16 22 6.4 464. 154 -11 135.1 2012 16 23 6.0 459. 154 -10 135.1 2012 17 0 4.9 476. 141 -14 134.5 2012 17 1 4.6 460. 141 -20 134.5 2012 17 2 4.1 467. 141 -17 134.5 2012 17 3 3.7 469. 141 -13 134.5 2012 17 4 3.3 472. 141 -12 134.5 2012 17 5 2.7 472. 141-8 134.5 2012 17 6 3.5 459. 141-6 134.5 2012 17 7 3.9 459. 141-6 134.5 2012 17 8 4.1 463. 141-7 134.5 2012 17 9 4.1 443. 141 -10 134.5 2012 17 10 4.1 446. 141 -14 134.5 2012 17 11 4.1 442. 141 -13 134.5 2012 17 12 3.6 436. 141 -10 134.5 2012 17 13 3.6 433. 141-6 134.5 2012 17 14 4.2 421. 141-1 134.5 2012 17 15 3.7 416. 141-2 134.5 2012 17 16 4.2 410. 141-1 134.5 2012 17 17 4.6 396. 141-1 134.5 2012 17 18 4.5 398. 141-2 134.5 2012 17 19 4.4 397. 141-6 134.5 2012 17 20 4.5 396. 141-8 134.5 2012 17 21 3.5 411. 141-5 134.5 2012 17 22 3.9 425. 141-5 134.5 2012 17 23 4.7 418. 141-6 134.5 2012 18 0 4.6 400. 126-7 143.4 2012 18 1 4.5 413. 126-3 143.4 2012 18 2 4.4 418. 126 2 143.4 2012 18 3 4.2 420. 126 2 143.4 2012 18 4 4.0 401. 126-2 143.4 2012 18 5 3.8 399. 126-1 143.4 2012 18 6 3.5 388. 126-1 143.4 2012 18 7 4.4 393. 126-2 143.4 2012 18 8 4.7 405. 126-3 143.4 2012 18 9 4.8 409. 126-4 143.4 2012 18 10 4.9 409. 126-3 143.4 2012 18 11 5.0 411. 126-5 143.4 2012 18 12 5.1 405. 126-5 143.4 2012 18 13 5.2 403. 126-6 143.4 2012 18 14 5.1 394. 126-4 143.4 2012 18 15 5.0 391. 126-5 143.4 2012 18 16 4.6 387. 126-4 143.4 2012 18 17 4.7 376. 126-2 143.4 2012 18 18 4.7 381. 126-1 143.4 2012 18 19 4.5 382. 126-2 143.4 2012 18 20 4.9 386. 126-5 143.4 2012 18 21 4.8 375. 126-5 143.4 2012 18 22 4.7 385. 126-6 143.4 2012 18 23 4.7 381. 126-5 143.4 2012 19 0 4.3 372. 105-3 152.0 2012 19 1 4.2 361. 
105 -4 152.0
2012 19 2 4.0 360. 105 -5 152.0
2012 19 3 3.9 362. 105 -4 152.0

Jibrin Adejoh Alhassan (Ph.D)
Department of Physics and Astronomy, University of Nigeria, Nsukka

On Mon, Jun 17, 2024 at 7:50 AM Jibrin Alhassan wrote:

Hello Rui,

Your patience is indeed amazing. Your script, tested as shown below, worked perfectly well.

df1 <- read.table(text = "YEAR DOY HR IMF SW SSN Dst f10.7
2012 215 4 5.1 371. 143 -4 138.6
", header = TRUE)
with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
df1$Date <- with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
df1 <- df1[-(1:2)]
df1 <- df1[c(ncol(df1), 1:(ncol(df1) - 1L))]
head(df1)

But I have 43,849 data points. Your script only generated one. Help me with a script that can handle all the data points. I have tried following your tested solution but was unsuccessful.

My regards.

Jibrin Adejoh Alhassan (Ph.D)
Department of Physics and Astronomy, University of Nigeria, Nsukka

On Sun, Jun 16, 2024 at 8:33 AM Rui Barradas wrote:

On 15/06/2024 at 21:42, Jibrin Alhassan wrote:

Thank you Rui. I ran the following script

df1 <- read.table("solar_hour", header = TRUE)
df1$date <- as.Date(paste(df1$year, df1$hour), format = "%Y %j", origin = "2012-08-01-0")
df2 <- df1[c("date", "IMF", "SWS", "SSN", "Dst", "f10")]
head(df1)
#To display all the rows
print(df2)

It gave me this error message

source ("script.R")
Error in `$<-.data.frame`(`*tmp*`, date, value = numeric(0)) :
  replacement has 0 rows, data has 38735
print(df2)
Error: object 'df2' not found

My data is hourly data, but I would like to have the date as year month day hour:

2012 08 01 01
2012 08 01 02
2012 08 01 03
etc

Thanks.

Jibrin Adejoh Alhassan (Ph.D)
Department of Physics and Astronomy, University of Nigeria, Nsukka

On Sat, Jun 15, 2024 at 8:34 PM Rui Barradas wrote:

On 15/06/2024 at 20:00, Jibrin Alhassan wrote:

I have solar-geophysical data, e.g. as below:

YEAR DOY HR IMF SW SSN Dst f10.7
2012 214 0 3.4 403. 132 -9 154.6
2012 214 1 3.7 388. 132 -10 154.6
2012 214 2 3.7 383. 132 -10 154.6
2012 214 3 3.7 391. 132 -9 154.6
2012 214 4 4.2 399. 132 -7 154.6
2012 214 5 4.1 411. 132 -6 154.6
2012 214 6 4.0 407. 132 -6 154.6
2012 214 7 4.2 404.
Re: [R] code for year month day hr format
At 21:42 on 15/06/2024, Jibrin Alhassan wrote:

Thank you Rui. I ran the following script:

df1 <- read.table("solar_hour", header = TRUE)
df1$date <- as.Date(paste(df1$year, df1$hour), format = "%Y %j", origin = "2012-08-01-0")
df2 <- df1[c("date", "IMF", "SWS", "SSN", "Dst", "f10")]
head(df1)
# To display all the rows
print(df2)

It gave me this error message:

source("script.R")
Error in `$<-.data.frame`(`*tmp*`, date, value = numeric(0)) :
  replacement has 0 rows, data has 38735
print(df2)
Error: object 'df2' not found

My data are hourly, and I would like the date as year month day hour:

2012 08 01 01
2012 08 01 02
2012 08 01 03
etc.

Thanks.

Jibrin Adejoh Alhassan (Ph.D)
Department of Physics and Astronomy,
University of Nigeria, Nsukka
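The error reported in this follow-up most likely comes from the column names: R is case-sensitive, so `df1$year` and `df1$hour` are `NULL` when the header reads `YEAR DOY HR`, and the pasted result then has length zero. Below is a minimal sketch of one way to build a year-month-day-hour stamp. It assumes the header shown in the original post and assumes UTC time stamps (the thread does not state a time zone); the three-row data frame is a made-up excerpt.

```r
df1 <- read.table(text = "YEAR DOY HR IMF
2012 214 0 3.4
2012 214 1 3.7
2012 215 0 5.0", header = TRUE)

# Column names are case-sensitive: there is no 'year' column
is.null(df1$year)

# Parse year + day of year as midnight (UTC is an assumption),
# then add the hour of day as an offset in seconds
day_start <- as.POSIXct(paste(df1$YEAR, df1$DOY), format = "%Y %j", tz = "UTC")
df1$datetime <- day_start + 3600 * df1$HR

# "year month day hour", as requested in the follow-up
df1$stamp <- format(df1$datetime, "%Y %m %d %H")
df1[c("stamp", "IMF")]
```

Keeping `datetime` as a POSIXct column and formatting only for display leaves the values usable for plotting and time arithmetic.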
Re: [R] code for year month day hr format
At 20:00 on 15/06/2024, Jibrin Alhassan wrote:

> I have solar-geophysical data, e.g. as below:
>
> YEAR DOY HR IMF SW   SSN Dst f10.7
> 2012 214  0 3.4 403. 132  -9 154.6
> 2012 214  1 3.7 388. 132 -10 154.6
> 2012 214  2 3.7 383. 132 -10 154.6
> 2012 214  3 3.7 391. 132  -9 154.6
> 2012 214  4 4.2 399. 132  -7 154.6
> 2012 214  5 4.1 411. 132  -6 154.6
> 2012 214  6 4.0 407. 132  -6 154.6
> 2012 214  7 4.2 404. 132  -4 154.6
> 2012 214  8 4.3 405. 132  -6 154.6
> 2012 214  9 4.4 409. 132  -6 154.6
> 2012 214 10 4.4 401. 132  -6 154.6
> 2012 214 11 4.5 385. 132  -7 154.6
> 2012 214 12 4.7 377. 132  -8 154.6
> 2012 214 13 4.7 382. 132  -6 154.6
> 2012 214 14 4.3 396. 132  -4 154.6
> 2012 214 15 4.1 384. 132  -2 154.6
> 2012 214 16 4.0 382. 132  -1 154.6
> 2012 214 17 3.9 397. 132   0 154.6
> 2012 214 18 3.8 390. 132   1 154.6
> 2012 214 19 4.2 400. 132   2 154.6
> 2012 214 20 4.6 408. 132   1 154.6
> 2012 214 21 4.8 401. 132  -3 154.6
> 2012 214 22 4.9 395. 132  -5 154.6
> 2012 214 23 5.0 386. 132  -1 154.6
> 2012 215  0 5.0 377. 143  -1 138.6
> 2012 215  1 4.9 384. 143  -2 138.6
> 2012 215  2 4.9 390. 143  -4 138.6
> 2012 215  3 4.9 372. 143  -6 138.6
> 2012 215  4 5.1 371. 143  -4 138.6
>
> I want to process it into the format shown below:
>
> y m d      hr imf sws  ssn Dst f10.7
> 2012-08-01 10 3.4 403. 132  -9 154.6
> 2012-08-01 12 3.7 388. 132 -10 154.6
> 2012-08-01 15 3.7 383. 132 -10 154.6
> 2012-08-01 17 3.7 391. 132  -9 154.6
>
> I would like to request R code to accomplish this task.
> Thanks for your time.
>
> Jibrin Adejoh Alhassan (Ph.D)
> Department of Physics and Astronomy,
> University of Nigeria, Nsukka

Hello,

To create a date column, paste the first two columns and coerce the result to class "Date" with the conversion specifications %Y for the 4-digit year and %j for the day of year. See help("strptime").

df1 <- read.table(text = "YEAR DOY HR IMF SW SSN Dst f10.7
2012 214  0 3.4 403. 132  -9 154.6
2012 214  1 3.7 388. 132 -10 154.6
2012 214  2 3.7 383. 132 -10 154.6
2012 214  3 3.7 391. 132  -9 154.6
2012 214  4 4.2 399. 132  -7 154.6
2012 214  5 4.1 411. 132  -6 154.6
2012 214  6 4.0 407. 132  -6 154.6
2012 214  7 4.2 404. 132  -4 154.6
2012 214  8 4.3 405. 132  -6 154.6
2012 214  9 4.4 409. 132  -6 154.6
2012 214 10 4.4 401. 132  -6 154.6
2012 214 11 4.5 385. 132  -7 154.6
2012 214 12 4.7 377. 132  -8 154.6
2012 214 13 4.7 382. 132  -6 154.6
2012 214 14 4.3 396. 132  -4 154.6
2012 214 15 4.1 384. 132  -2 154.6
2012 214 16 4.0 382. 132  -1 154.6
2012 214 17 3.9 397. 132   0 154.6
2012 214 18 3.8 390. 132   1 154.6
2012 214 19 4.2 400. 132   2 154.6
2012 214 20 4.6 408. 132   1 154.6
2012 214 21 4.8 401. 132  -3 154.6
2012 214 22 4.9 395. 132  -5 154.6
2012 214 23 5.0 386. 132  -1 154.6
2012 215  0 5.0 377. 143  -1 138.6
2012 215  1 4.9 384. 143  -2 138.6
2012 215  2 4.9 390. 143  -4 138.6
2012 215  3 4.9 372. 143  -6 138.6
2012 215  4 5.1 371. 143  -4 138.6", header = TRUE)

with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
#>  [1] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#>  [6] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [11] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [16] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01"
#> [21] "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-01" "2012-08-02"
#> [26] "2012-08-02" "2012-08-02" "2012-08-02" "2012-08-02"

# now create the column
df1$Date <- with(df1, paste(YEAR, DOY)) |> as.Date(format = "%Y %j")
# remove the columns no longer needed
df1 <- df1[-(1:2)]
# relocate the new date column
df1 <- df1[c(ncol(df1), 1:(ncol(df1) - 1L))]
head(df1)
#>         Date HR IMF  SW SSN Dst f10.7
#> 1 2012-08-01  0 3.4 403 132  -9 154.6
#> 2 2012-08-01  1 3.7 388 132 -10 154.6
#> 3 2012-08-01  2 3.7 383 132 -10 154.6
#> 4 2012-08-01  3 3.7 391 132  -9 154.6
#> 5 2012-08-01  4 4.2 399 132  -7 154.6
#> 6 2012-08-01  5 4.1 411 132  -6 154.6

Hope this helps,

Rui Barradas
Re: [R] my R code worked well when running the first 1000 lines of R code
At 20:44 on 12/06/2024, Yuan Chun Ding wrote:

Hi Rui,

Thank you very much! Yes, I verified it with the real data: it worked as expected after I added tidyr:: to the pivot_longer function and dplyr:: to the group_by and summarize functions. I did not know how to assign tidyr and dplyr to the three functions because I do not understand them well; I just got the code from a Google search.

I also tried your simplified code, but got the following error:

Error in `dplyr::summarize()`:
! Can't supply both `.by` and `.groups`.
Run `rlang::last_trace()` to see where the error occurred.

Ding
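The error reported for the simplified code says that `.by` and `.groups` cannot be combined. With `.by` the result is always returned ungrouped, so `.groups` can simply be left out. The sketch below illustrates this under two assumptions: dplyr 1.1.0 or later (where `.by` is available) and tidyr are installed, and `toy` is a made-up stand-in for the research data, which was not shared.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the `.by` argument
library(tidyr)

# Hypothetical stand-in for anno1148ft
toy <- data.frame(
  dat = rep(c("A", "B"), each = 3),
  t_depth = c(10, 20, 30, 40, 50, 60),
  t_alt_count = c(1, 2, 3, 4, 5, 6),
  t_alt_ratio = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

summary_toy <- toy %>%
  tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio),
                      names_to = "measure") %>%
  dplyr::summarize(
    minimum = min(value, na.rm = TRUE),
    med = median(value, na.rm = TRUE),
    maximum = max(value, na.rm = TRUE),
    average = mean(value, na.rm = TRUE),
    .by = c(dat, measure)   # per-call grouping; no `.groups` needed
  )

summary_toy
```

The explicit `tidyr::` and `dplyr::` prefixes keep the calls unambiguous even if another attached package defines functions with the same names.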
Re: [R] my R code worked well when running the first 1000 lines of R code
Hello,

Inline.

At 19:03 on 12/06/2024, Yuan Chun Ding via R-help wrote:

> I am sorry; I know I should provide a dataset that allows my problem to be replicated.
>
> It is a research dataset and quite large, so I cannot share it.
>
> Both Bert and Tim guessed my problem correctly. I also thought about conflicts between different packages and function masking. I just hoped that someone had had a similar experience and could offer a suggestion.
>
> For the conflict issue, what I tried was to add dplyr::pivot_longer or tidyr::pivot_longer,

Do that to all functions coming from contributed packages. At least to those.

summary_anno1148ft <- anno1148ft %>%
  tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to = "measure") %>%
  dplyr::group_by(dat, measure) %>%
  dplyr::summarize(minimum = min(value, na.rm=T),
                   q25 = quantile(value, probs = 0.25, na.rm=T),
                   med = median(value, na.rm=T),
                   q75 = quantile(value, probs = 0.75, na.rm=T),
                   maximum = max(value, na.rm=T),
                   average = mean(value, na.rm=T),
                   #standard_deviation = sd(value),
                   .groups = "drop"
  )

Or, simpler, there is no need for group_by anymore. It can be done in summarise.

summary_anno1148ft <- anno1148ft %>%
  tidyr::pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to = "measure") %>%
  dplyr::summarize(minimum = min(value, na.rm=T),
                   q25 = quantile(value, probs = 0.25, na.rm=T),
                   med = median(value, na.rm=T),
                   q75 = quantile(value, probs = 0.75, na.rm=T),
                   maximum = max(value, na.rm=T),
                   average = mean(value, na.rm=T),
                   #standard_deviation = sd(value),
                   .by = c(dat, measure),
                   .groups = "drop"
  )

This is only a guess; the question cannot really be answered.

Hope this helps,

Rui Barradas

> but that still did not resolve the problem.
>
> I will restart my code from the first line; it will work again, and then I will track the problem down.
>
> Thank you,
>
> Ding
>
> From: CALUM POLWART
> Sent: Wednesday, June 12, 2024 10:52 AM
> To: Yuan Chun Ding
> Cc: r-help@r-project.org
> Subject: Re: [R] my R code worked well when running the first 1000 lines of R code
>
>> I sometimes think people on this list are quite rude to posters. I'm afraid I'm likely to join in with some rudeness?
>>
>> 1. "Here is some code that works but also doesn't" is probably not going to get you an answer
>> 2. I provide no information about the data it works on or doesn't
>> 3. I tell you I'm using a load of dependencies, but don't tell you what
>> 4. I refer to 2000 lines of code but probably mean 2000 lines of data?
>>
>> So. Please post a question someone can actually answer.
>>
>> If the question is "why might code fail on a 2000 line dataset when it works on a 1000 line dataset" then here are some thoughts:
>>
>> * Is the 1000 lines being run as dataset[1:1000,] or is it dataset1 and dataset2?
>> * Is there a structural difference in the datasets, i.e. numbers, characters or factors as columns? Often import functions guess a column type by reading the first 500/1000 lines. If the data has numbers in column 1 for lines 1-1000 but on line 1999 has a letter... The data type may vary.
>>
>> On Wed, 12 Jun 2024, 17:28 Yuan Chun Ding via R-help wrote:
>>
>>> Hi R users,
>>>
>>> The following code worked well to summarize four data groups in a dataframe for three variables (t_depth, t_alt_count, t_alt_ratio), 12 columns of summary, see attached.
>>>
>>> However, after running another 2000 lines of R code using functions from more than 10 other R libraries, it only generated one column of summary. Do you know why?
>>>
>>> Thank you,
>>>
>>> Yuan Chun Ding
>>>
>>> summary_anno1148ft <- anno1148ft %>%
>>>   pivot_longer(c(t_depth, t_alt_count, t_alt_ratio), names_to = "measure") %>%
>>>   group_by(dat, measure) %>%
>>>   summarize(minimum = min(value, na.rm=T),
>>>             q25 = quantile(value, probs = 0.25, na.rm=T),
>>>             med = median(value, na.rm=T),
>>>             q75 = quantile(value, probs = 0.75, na.rm=T),
>>>             maximum = max(value, na.rm=T),
>>>             average = mean(value, na.rm=T),
>>>             #standard_deviation = sd(value),
>>>             .groups = "drop"
>>>   )
>>>
>>> summary_anno1148ft <- t(summary_anno1148ft)
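Function masking, which several posters in this thread suspect, can be checked directly before adding `pkg::` prefixes. The sketch below uses only functions from base R and utils; `filter` is just an example name that exists in stats and is also defined by some contributed packages, and nothing here is specific to the poster's session.

```r
# Which attached packages define a function called 'filter'?
# The first entry is the one a bare call resolves to.
find("filter")

# The namespace that the currently visible 'filter' belongs to
environmentName(environment(filter))

# Names that are defined more than once on the search path
head(conflicts(detail = FALSE))
```

Running the same checks for `summarize` or `group_by` after all libraries are loaded would show whether another package has taken over those names.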
Re: [R] Format
At 21:39 on 09/06/2024, Val wrote:

> Hi all,
>
> I am trying to convert a character date (mm/dd/yy) to yyyy-mm-dd date format in one of the columns of my data file. The first few lines of the data file look as follows:
>
> head(Atest,10);dim(Atest)
>       ddate
> 1  19/08/21
> 2  30/04/18
> 3  28/08/21
> 4  11/10/21
> 5  07/09/21
> 6  15/08/21
> 7  03/09/21
> 8  23/07/18
> 9  17/08/20
> 10 23/09/20
> [1] 1270076 1
>
> I have tried the following variants, but none of them gave the desired result.
>
> library(data.table)
> library(stringr)
> library(lubridate)
> Atest$ddate1 <- as.Date((Atest$ddate), format = "%m/%d/%y")
> Atest$ddate2 <- mdy((Atest$ddate))
> Atest$ddate3 <= as.Date(as.character(Atest$ddate),format="%m/%d/%y")
> Atest$ddate4 <- as.Date(as.character(Atest$ddate),"%m/%d/%y")
> Atest$ddate5 <- lubridate::mdy(Atest$ddate)
>
> head(Atest,3)
>      ddate ddate1 ddate2 ddate4 ddate5
> 1 19/08/21
> 2 30/04/18
> 3 28/08/21
>
> Can anyone help me see why I am not getting the desired result?
>
> Thank you,

Hello,

The day clearly comes first in these data; the format "%m/%d/%y" assumes a month 19 in 19/08/21. Try

as.Date(Atest$ddate, format = "%d/%m/%y")

Hope this helps,

Rui Barradas
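The difference between the two format strings can be checked on a few of the values from the question. This is a small base-R sketch; the vector `x` simply copies three of the posted dates.

```r
x <- c("19/08/21", "30/04/18", "28/08/21")

# Month-first format: 19, 30 and 28 are not valid months, so every value is NA
as.Date(x, format = "%m/%d/%y")

# Day-first format parses all three values
d <- as.Date(x, format = "%d/%m/%y")

# Dates print as yyyy-mm-dd by default; format() makes that explicit
format(d, "%Y-%m-%d")
```

Checking `anyNA()` on the parsed column is a quick way to confirm that every row of a large file matched the chosen format.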
Re: [R] R code for overlapping variables -- count
Hello,

This is the same solution as in my previous message, but the code to keep only the columns of the original data set is better. And it's an MRE.

n <- 100; # population size
x <- data.frame(
  Sex = sample(c("M","F"), n, T),
  Country = sample(c("AA", "BB", "US"), n, T),
  Income = as.factor(sample(1:3, n, T))
)

r2 <- xtabs(~ ., x) |> as.data.frame()
# no need for constants, find the columns
# to keep from the data
r2[names(r2) %in% names(x)]

Hope this helps,

Rui Barradas
Re: [R] R code for overlapping variables -- count
At 18:34 on 02/06/2024, Leo Mada via R-help wrote:

> Dear Shadee,
>
> If you have a data.frame with the following columns:
>
> n = 100; # population size
> x = data.frame(
>   Sex = sample(c("M","F"), n, T),
>   Country = sample(c("AA", "BB", "US"), n, T),
>   Income = as.factor(sample(1:3, n, T))
> )
>
> # Dummy variable
> ONE = rep(1, nrow(x))
> r = aggregate(ONE ~ Sex + Income + Country, length, data = x)
> r = r[, c("Country", "Income", "Sex")]
> print(r)
>
> It is possible to write simpler code if you need only the particular combination of variables (which you specified in your mail). But this is the more general approach.
>
> Note: you may want to use "sum" instead of "length", e.g. if you have a column specifying the number of individuals in that category.
>
> Hope this helps,
>
> Leonard

Hello,

The following is simpler.

r2 <- xtabs(~ ., x) |> as.data.frame()
r2[-4L] # or r2[names(r2) != "Freq"]

Hope this helps,

Rui Barradas
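To make the counting step concrete, here is a small deterministic sketch of the `xtabs()` approach. The four-row data frame is made up for illustration (the thread uses random samples), and keeping the `Freq` column is this sketch's choice, so that the count for each combination stays visible.

```r
# Made-up data: four individuals, three categorical variables
x <- data.frame(
  Sex = c("M", "F", "M", "M"),
  Country = c("AA", "AA", "US", "AA"),
  Income = factor(c(1, 1, 2, 1))
)

# Cross-tabulate all columns; one row per combination of levels
r2 <- as.data.frame(xtabs(~ ., x))

# 'Freq' holds the number of individuals in each combination
r2

# Only the combinations that actually occur in the data
r2[r2$Freq > 0, ]
```

`as.data.frame()` on the table lists every combination of levels, including those with a count of zero, which is why the last line filters on `Freq`.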
Re: [R] add only the 1st of May with POSIXct
At 07:01 on 29/05/2024, Stefano Sofia wrote:

Thank you Rui for your code. I basically understood all your suggestions.

I am using an old version of R (version 3.6.3, installed on a server I am not allowed to control), and the new pipe operator does not work. I tried to run your code without the "|>" operator, but I get an error when I use apply. Could you please expand your code without the pipe operator?

Thank you again for your help

Stefano

Stefano Sofia PhD
Civil Protection - Marche Region - Italy
Meteo Section
Snow Section
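R 3.6.3 has neither the native pipe `|>` nor the `\(x)` function shorthand; both arrived in R 4.1.0. The sketch below is not a translation of the piped code from the earlier reply but a different, simpler technique that runs without either feature: it compares month-day strings and drops every day from 2 May to 31 October, as the original question asks. The 09:00 time of day follows the question; everything else is base R.

```r
a <- as.POSIXct("2002-11-01 09:00:00", tz = "Etc/GMT-1")
b <- as.POSIXct("2004-06-01 09:00:00", tz = "Etc/GMT-1")
mydf <- data.frame(data_POSIX = seq(a, b, by = "1 day"))

# Month and day as a 4-character string, e.g. "0501" for 1 May
md <- format(mydf$data_POSIX, "%m%d")

# Days from 2 May to 31 October, in any year
drop_day <- md >= "0502" & md <= "1031"

# Keep everything else: 1 November up to and including 1 May
res <- mydf[!drop_day, , drop = FALSE]
range(res$data_POSIX)
```

Because the comparison ignores the year, the same two lines work for any number of seasons in the interval.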
Re: [R] add only the 1st of May with POSIXct
At 16:23 on 28/05/2024, Stefano Sofia wrote:

> Dear R-list users,
>
> From an initial and a final date I create a sequence of days using POSIXct.
>
> If this interval covers all or only part of the months from May to October, I need to get rid of the days from the 2nd of May to the 31st of October:
>
> a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
> b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
>
> mydf <- data.frame(data_POSIX=seq(
>   as.POSIXct(paste(format(a, "%Y-%m-%d"), "09:00:00", sep=""), format="%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"),
>   as.POSIXct(paste(format(b, "%Y-%m-%d"), "09:00:00", sep=""), format="%Y-%m-%d %H:%M:%S", tz="Etc/GMT-1"),
>   by="1 day"))
>
> If I execute
>
> as.data.frame(mydf[format(mydf$data_POSIX,"%m") %in% c("11", "12", "01", "02", "03", "04"), ])
>
> the interval will be from 2002-11-01 09:00:00 to 2003-04-30 09:00:00 and from 2003-11-01 09:00:00 to 2004-04-30 09:00:00,
>
> but I also need 2003-05-01 09:00:00 and 2004-05-01 09:00:00.
>
> How can I solve this problem?
>
> Thank you for your attention and your help
>
> Stefano
>
> Stefano Sofia PhD
> Civil Protection - Marche Region - Italy
> Meteo Section
> Snow Section

Hello,

First of all, 'a' and 'b' are already objects of class "POSIXct"; you don't need to repeat the code creating them when creating mydf.

As for the question, see the code below.

a <- as.POSIXct("2002-11-01", format = "%Y-%m-%d", tz="Etc/GMT-1")
b <- as.POSIXct("2004-06-01", format = "%Y-%m-%d", tz="Etc/GMT-1")

mydf <- data.frame(data_POSIX = seq(a, b, by = "1 day"))

# get the years from the data
years <- format(c(a, b), "%Y") |> as.integer()
# this creates a sequence with all the years
years <- Reduce(`:`, years)
# window boundaries, one pair per year (ISOdate returns "POSIXct")
from <- ISOdate(years, 5L, 2L, tz = "Etc/GMT-1")
to <- ISOdate(years, 10L, 30L, tz = "Etc/GMT-1")

# logical index: TRUE for the dates that fall inside
# one of the from/to windows
keep <- data.frame(from, to) |>
  apply(1L, \(x) x[1L] <= mydf$data_POSIX & mydf$data_POSIX <= x[2L]) |>
  rowSums() > 0L

mydf[keep, , drop = FALSE]

Hope this helps,

Rui Barradas
Re: [R] Print date on y axis with month, day, and year
At 00:58 on 10/05/2024, Sorkin, John wrote:

> I am trying to use ggplot to plot the data, and R code, below. The dates (jdate) are printing as Mar 01, Mar 15, etc. I want to have the date printed as MMM DD (or any other way that will show month, date, and year, e.g. mm/dd/yy). How can I accomplish this?
>
> yyy <- structure(list(
>   jdate = structure(c(19052, 19053, 19054, 19055, 19058, 19059, 19060, 19061,
>                       19062, 19063, 19065, 19066, 19067, 19068, 19069, 19072,
>                       19073, 19074, 19075, 19076, 19077, 19083, 19086, 19087,
>                       19088, 19089, 19090, 19093, 19094, 19095), class = "Date"),
>   Sum = c(1, 3, 9, 11, 13, 16, 18, 22, 26, 27, 30, 32, 35, 39, 41,
>           43, 48, 51, 56, 58, 59, 63, 73, 79, 81, 88, 91, 93, 96, 103)),
>   row.names = c(NA, 30L), class = "data.frame")
> yyy
> class(yyy$jdate)
> ggplot(data=yyy[1:30,],aes(as.Date(jdate,format="%m-%d-%Y"),Sum)) +geom_point()
>
> Thank you
> John
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine

Hello,

Since class(yyy$jdate) returns "Date", you have a real date and scale_x_date can handle the printed format; there is no need for an extra as.Date in aes(). And get rid of the format = "%m-%d-%Y" argument. Let scale_x_date take care of formatting the date as you want it displayed.

Either of the two formats below is a valid date format.

ggplot(data = yyy[1:30,], aes(jdate, Sum)) +
  geom_point() +
  # scale_x_date(date_labels = "%b %d, %Y")
  scale_x_date(date_labels = "%m/%d/%Y")

Hope this helps,

Rui Barradas
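The `date_labels` argument takes the same conversion codes as `strptime()`, so a label format can be previewed with `format()` before it goes into the plot. A small base-R sketch, using three of the `jdate` values from the question:

```r
# Three of the day counts from the question, as Date objects
jdate <- as.Date(c(19052, 19053, 19095), origin = "1970-01-01")

# Numeric month/day/year
format(jdate, "%m/%d/%Y")

# Abbreviated month name, day, year (month names depend on the locale)
format(jdate, "%b %d, %Y")
```

Whatever string looks right here can be passed unchanged as `date_labels` in `scale_x_date()`.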
Re: [R] x[0]: Can '0' be made an allowed index in R?
On 21/04/2024 at 09:08, Rui Barradas wrote:
> On 21/04/2024 at 08:55, Hans W wrote:
>> As we all know, in R indices for vectors start with 1, i.e., x[0] is not a correct expression. Some algorithms, e.g. in graph theory or combinatorics, are much easier to formulate and code if 0 is an allowed index pointing to the first element of the vector.
>>
>> Some programming languages, for instance Julia (where the index for normal vectors also starts with 1), provide libraries/packages that allow the user to define an index range for its vectors, say 0:9 or 10:20 or even negative indices.
>>
>> Of course, this notation would only be feasible for certain specially defined vectors. Is there a library that provides this functionality? Or is there a simple trick to do this in R? The expression 'x[0]' must be possible; does this mean the syntax of R has to be twisted somehow?
>>
>> Thanks, Hans W.
>
> Hello,
>
> I find what you are asking awkward but it can be done with S3 classes. Write an extraction method for the new class and in the use case below it works. The method increments the index before calling NextMethod, the usual extraction function.
>
>     `[.zerobased` <- function(x, i, ...) {
>       i <- i + 1L
>       NextMethod()
>     }
>     as_zerobased <- function(x) {
>       class(x) <- c("zerobased", class(x))
>       x
>     }
>
>     x <- 1:10
>     y <- as_zerobased(x)
>     y[0]
>     #> [1] 1
>     y[1]
>     #> [1] 2
>     y[9]
>     #> [1] 10
>     y[10]
>     #> [1] NA
>
> Hope this helps,
>
> Rui Barradas

Sorry, forgot to also define a `[[.zerobased` method. It's probably safer.

    `[[.zerobased` <- function(x, i, ...) {
      i <- i + 1L
      NextMethod()
    }

Hope this helps,

Rui Barradas
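The methods in the reply above only cover reading elements. A minimal sketch of how assignment could be handled the same way follows. The replacement method `[<-.zerobased` is not part of the original reply and is an assumption. Like the extraction method, it only makes sense for non-negative numeric indices; logical, character and negative indices are not handled.

```r
# Constructor and extraction method as in the reply above
as_zerobased <- function(x) {
  class(x) <- c("zerobased", class(x))
  x
}
`[.zerobased` <- function(x, i, ...) {
  i <- i + 1L   # shift the zero-based index to R's one-based index
  NextMethod()
}

# Assumed addition: the matching replacement method
`[<-.zerobased` <- function(x, i, ..., value) {
  i <- i + 1L
  NextMethod()
}

y <- as_zerobased(1:10)
y[0] <- 100L   # writes to the first element

# loop over the vector using zero-based positions
total <- 0L
for (k in 0:(length(y) - 1L)) {
  total <- total + unclass(y[k])
}
total
```

Functions that are not aware of the class, such as `seq_along()` or `which()`, still report one-based positions, so this kind of wrapper is best kept local to the algorithm that needs it.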
Re: [R] x[0]: Can '0' be made an allowed index in R?
On 21/04/2024 at 08:55, Hans W wrote:
> As we all know, in R indices for vectors start with 1, i.e., x[0] is not a correct expression. Some algorithms, e.g. in graph theory or combinatorics, are much easier to formulate and code if 0 is an allowed index pointing to the first element of the vector.
>
> Some programming languages, for instance Julia (where the index for normal vectors also starts with 1), provide libraries/packages that allow the user to define an index range for its vectors, say 0:9 or 10:20 or even negative indices.
>
> Of course, this notation would only be feasible for certain specially defined vectors. Is there a library that provides this functionality? Or is there a simple trick to do this in R? The expression 'x[0]' must be possible; does this mean the syntax of R has to be twisted somehow?
>
> Thanks, Hans W.

Hello,

I find what you are asking awkward but it can be done with S3 classes. Write an extraction method for the new class and in the use case below it works. The method increments the index before calling NextMethod, the usual extraction function.

    `[.zerobased` <- function(x, i, ...) {
      i <- i + 1L
      NextMethod()
    }
    as_zerobased <- function(x) {
      class(x) <- c("zerobased", class(x))
      x
    }

    x <- 1:10
    y <- as_zerobased(x)
    y[0]
    #> [1] 1
    y[1]
    #> [1] 2
    y[9]
    #> [1] 10
    y[10]
    #> [1] NA

Hope this helps,

Rui Barradas
Re: [R] Exceptional slowness with read.csv
On 08/04/2024 at 06:47, Dave Dixon wrote:
> Greetings,
>
> I have a csv file of 76 fields and about 4 million records. I know that some of the records have errors - unmatched quotes, specifically. Reading the file with readLines and parsing the lines with read.csv(text = ...) is really slow. I know that the first 2459465 records are good. So I try this:
>
> > startTime <- Sys.time()
> > first_records <- read.csv(file_name, nrows = 2459465)
> > endTime <- Sys.time()
> > cat("elapsed time = ", endTime - startTime, "\n")
> elapsed time = 24.12598
>
> > startTime <- Sys.time()
> > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> > endTime <- Sys.time()
> > cat("elapsed time = ", endTime - startTime, "\n")
>
> This appears to never finish. I have been waiting over 20 minutes.
>
> So why would (skip = 2459465, nrows = 5) take orders of magnitude longer than (nrows = 2459465)?
>
> Thanks!
> -dave
>
> PS: readLines(n=2459470) takes 10.42731 seconds.

Hello,

Can the following function be of help? After reading the data with argument quote = "", call a function that applies gregexpr to its character columns and transforms the output into a two-column data.frame with columns

  Col - the column processed;
  Unbalanced - the rows with unbalanced double quotes.

I am assuming the quotes are double quotes. It shouldn't be difficult to adapt it to other cases: single quotes, or both.

    unbalanced_dquotes <- function(x) {
      char_cols <- which(sapply(x, is.character))
      res <- lapply(char_cols, \(i) {
        # number of double quotes per string;
        # gregexpr() returns -1 when there is no match, so count only real matches
        n_quotes <- vapply(gregexpr('"', x[[i]], fixed = TRUE),
                           \(m) sum(m > 0L), integer(1))
        Unbalanced <- which(n_quotes %% 2L == 1L)
        data.frame(Col = rep(i, length(Unbalanced)), Unbalanced = Unbalanced)
      })
      do.call(rbind, res)
    }

    # read the data disregarding quoted strings
    df1 <- read.csv(fl, quote = "")

    # determine which strings have unbalanced quotes and
    # where
    unbalanced_dquotes(df1)

Hope this helps,

Rui Barradas
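Since the original post mentions that readLines() is fast on this file, another option is to look for suspicious records before any parsing takes place. The sketch below counts the double quotes on each raw line and reports the lines with an odd count. The small temporary file is a hypothetical stand-in for the real data, and the approach assumes that no quoted field legitimately spans several lines.

```r
# Hypothetical sample file standing in for the real 4-million-record CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c('id,name',
             '1,"Ann"',
             '2,"Bob',        # unmatched quote
             '3,Carl',
             '4,"Dee","x"'), tmp)

lines <- readLines(tmp)

# number of double quotes on each line (0 when there is none)
n_quotes <- lengths(regmatches(lines, gregexpr('"', lines, fixed = TRUE)))

# lines with an odd number of quotes are candidates for unmatched quotes
bad_lines <- which(n_quotes %% 2L == 1L)
bad_lines
```

The line numbers found this way refer to the raw file, header included, so they can be passed on to skip and nrows when re-reading only the clean parts.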
Re: [R] Exceptional slowness with read.csv
On 08/04/2024 at 19:42, Ivan Krylov via R-help wrote:
> On Sun, 7 Apr 2024 23:47:52 -0600, Dave Dixon wrote:
>
>> second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
>
> It may or may not be important that read.csv defaults to header = TRUE. Having skipped 2459465 lines, it may attempt to parse the next one as a header, so the second call to read.csv() should probably include header = FALSE.

This will throw an error; call read.table with sep = "," instead.

> Bert's advice to try scan() is on point, though. It's likely that the default-enabled header is not the most serious problem here.

Hope this helps,

Rui Barradas
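A small sketch of the read.table() call suggested above, run on a temporary file so that it is self-contained. The file, its two columns and the skip/nrows values are made up for illustration. With a header line in the file, skip has to count that line as well.

```r
# Hypothetical 10-record file with a header line
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, x = letters[1:10]), tmp, row.names = FALSE)

# skip the header line plus the first 6 records, then read 3 records;
# header = FALSE stops the first line read from being used as column names
chunk <- read.table(tmp, sep = ",", header = FALSE,
                    skip = 7, nrows = 3,
                    col.names = c("id", "x"))
chunk
```

Supplying col.names by hand is needed here because the real header has been skipped.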
Re: [R] Question regarding reservoir volume and water level
On 07/04/2024 at 13:27, javad bayat wrote:
> Dear all;
>
> I have a question about the water level of a reservoir, when the volume changed or doubled. There is a DEM file with the highest elevation 1267 m. The lowest elevation is 1230 m. The current volume of the reservoir is 7,000,000 m3 at 1240 m. Now I want to know what would be the water level if the volume rises to 1250 m? or what would be the water level if the volume doubled (14,000,000 m3)?
>
> Is there any way to write codes to do this in R? I would be more than happy if anyone could help me.
>
> Sincerely

Hello,

This is a simple rule of three. If you know the level l, the argument doesn't need to be named, but if you know the volume v then it must be named.

    water_level <- function(l, v, level = 1240, volume = 7e6) {
      if (missing(v)) {
        volume * l / level
      } else level * v / volume
    }

    lev <- 1250
    vol <- 14e6

    water_level(l = lev)
    #> [1] 7056452
    water_level(v = vol)
    #> [1] 2480

Hope this helps,

Rui Barradas
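The rule of three treats volume as proportional to elevation, which is only a rough approximation. When the DEM itself is available, the volume stored below a given water level can instead be estimated by summing the water depth over the grid cells, and the level for a target volume found numerically. The sketch below uses a synthetic random grid and an assumed cell size in place of the real DEM, so its numbers do not correspond to the reservoir in the question.

```r
# Hypothetical DEM: elevations (m) on a regular grid, between the
# lowest (1230 m) and highest (1267 m) elevations mentioned in the question
set.seed(1)
dem <- matrix(runif(400, min = 1230, max = 1267), nrow = 20)
cell_area <- 100 * 100   # assumed cell size of 100 m x 100 m, in m^2

# stored volume (m^3) below a given water level:
# water depth is (level - elevation) where positive, zero elsewhere
volume_at <- function(level, z = dem, area = cell_area) {
  sum(pmax(level - z, 0)) * area
}

# water level (m) at which the stored volume equals `target`
level_for <- function(target, z = dem, area = cell_area) {
  uniroot(function(h) volume_at(h, z, area) - target,
          interval = range(z), tol = 1e-6)$root
}

v1240 <- volume_at(1240)          # volume at the current level
lev2  <- level_for(2 * v1240)     # level at which that volume doubles
c(volume_1240 = v1240, level_doubled = lev2)
```

With a real DEM the matrix would come from the raster file and the cell area from its resolution; the two helper functions stay the same.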
Re: [R] Output of tapply function as data frame: Problem Fixed
Às 01:43 de 29/03/2024, Ogbos Okike escreveu: Dear Rui, Thanks again for resolving this. I have already started using the version that works for me. But to clarify the second part, please let me paste the what I did and the error message: set.seed(2024) data <- data.frame( +Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, + TRUE), +count = sample(10L, 100L, TRUE) + ) # coerce tapply's result to class "data.frame" res <- with(data, tapply(count, Date, mean)) |> as.data.frame() Error: unexpected '>' in "res <- with(data, tapply(count, Date, mean)) |>" # assign a dates column from the row names res$Date <- row.names(res) Error in row.names(res) : object 'res' not found # cosmetics names(res)[2:1] <- names(data) Error in names(res)[2:1] <- names(data) : object 'res' not found # note that the row names are still tapply's names vector # and that the columns order is not Date/count. Both are fixed # after the calculations. res You can see that the error message is on the pipe. Please, let me know where I am missing it. Thanks. On Wed, Mar 27, 2024 at 10:45 PM Rui Barradas wrote: Às 08:58 de 27/03/2024, Ogbos Okike escreveu: Dear Rui, Nice to hear from you! I am sorry for the omission and I have taken note. Many thanks for responding. The second solution looks elegant as it quickly resolved the problem. Please, take a second look at the first solution. It refused to run. Looks as if the pipe is not properly positioned. Efforts to correct it and get it run failed. If you can look further, it would be great. If time does not permit, I am fine too. But having the too solutions will certainly make the subject more interesting. Thank you so much. With warmest regards from Ogbos On Wed, Mar 27, 2024 at 8:44 AM Rui Barradas wrote: Às 04:30 de 27/03/2024, Ogbos Okike escreveu: Warm greetings to you all. 
Using the tapply function below: data<-read.table("FD1month",col.names = c("Dates","count")) x=data$count f<-factor(data$Dates) AB<- tapply(x,f,mean) I made a simple calculation. The result, stored in AB, is of the form below. But an effort to write AB to a file as a data frame fails. When I use the write table, it only produces the count column and strip of the first column (date). 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01 -4.106887 -4.259154 -5.836090 -4.756757 -4.118011 -4.487942 -4.430705 2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01 -3.856727 -6.067103 -6.418767 -4.383031 -3.985805 -4.768196 -10.072579 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01 -5.342338 -4.653128 -4.325094 -4.525373 -4.574783 -3.915600 -4.127980 2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01 2008-02-01 -3.952150 -4.033518 -4.532878 -4.522941 -4.485693 -3.922155 -4.183578 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01 2008-09-01 -4.336969 -3.813306 -4.296579 -4.575095 -4.036036 -4.727994 -4.347428 2008-10-01 2008-11-01 2008-12-01 -4.029918 -4.260326 -4.454224 But the normal format I wish to display only appears on the terminal, leading me to copy it and paste into a text file. That is, when I enter AB on the terminal, it returns a format in the form: 008-02-01 -4.183578 2008-03-01 -4.336969 2008-04-01 -3.813306 2008-05-01 -4.296579 2008-06-01 -4.575095 2008-07-01 -4.036036 2008-08-01 -4.727994 2008-09-01 -4.347428 2008-10-01 -4.029918 2008-11-01 -4.260326 2008-12-01 -4.454224 Now, my question: How do I write out two columns displayed by AB on the terminal to a file? I have tried using AB<-data.frame(AB) but it doesn't work either. Many thanks for your time. 
Ogbos [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, The main trick is to pipe to as.data.frame. But the result will have one column only, you must assign the dates from the df's row names. I also include an aggregate solution. # create a test data set set.seed(2024) data <- data.frame( Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE), count = sample(10L, 100L, TRUE) ) # coerce tapply's result to class "data.frame" res <- with(data, tapply(count, Date, mean)) |> as.data.frame() # assign a dates column from the row names res$Date <- row.names(res) # cosmetics names(res)[2:1] <- names(data) # note that the row names are still tapply's names vector # and that the columns
Re: [R] Output of tapply function as data frame: Problem Fixed
Às 08:58 de 27/03/2024, Ogbos Okike escreveu: Dear Rui, Nice to hear from you! I am sorry for the omission and I have taken note. Many thanks for responding. The second solution looks elegant as it quickly resolved the problem. Please, take a second look at the first solution. It refused to run. Looks as if the pipe is not properly positioned. Efforts to correct it and get it run failed. If you can look further, it would be great. If time does not permit, I am fine too. But having the too solutions will certainly make the subject more interesting. Thank you so much. With warmest regards from Ogbos On Wed, Mar 27, 2024 at 8:44 AM Rui Barradas wrote: Às 04:30 de 27/03/2024, Ogbos Okike escreveu: Warm greetings to you all. Using the tapply function below: data<-read.table("FD1month",col.names = c("Dates","count")) x=data$count f<-factor(data$Dates) AB<- tapply(x,f,mean) I made a simple calculation. The result, stored in AB, is of the form below. But an effort to write AB to a file as a data frame fails. When I use the write table, it only produces the count column and strip of the first column (date). 
2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01 -4.106887 -4.259154 -5.836090 -4.756757 -4.118011 -4.487942 -4.430705 2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01 -3.856727 -6.067103 -6.418767 -4.383031 -3.985805 -4.768196 -10.072579 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01 -5.342338 -4.653128 -4.325094 -4.525373 -4.574783 -3.915600 -4.127980 2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01 2008-02-01 -3.952150 -4.033518 -4.532878 -4.522941 -4.485693 -3.922155 -4.183578 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01 2008-09-01 -4.336969 -3.813306 -4.296579 -4.575095 -4.036036 -4.727994 -4.347428 2008-10-01 2008-11-01 2008-12-01 -4.029918 -4.260326 -4.454224 But the normal format I wish to display only appears on the terminal, leading me to copy it and paste into a text file. That is, when I enter AB on the terminal, it returns a format in the form: 008-02-01 -4.183578 2008-03-01 -4.336969 2008-04-01 -3.813306 2008-05-01 -4.296579 2008-06-01 -4.575095 2008-07-01 -4.036036 2008-08-01 -4.727994 2008-09-01 -4.347428 2008-10-01 -4.029918 2008-11-01 -4.260326 2008-12-01 -4.454224 Now, my question: How do I write out two columns displayed by AB on the terminal to a file? I have tried using AB<-data.frame(AB) but it doesn't work either. Many thanks for your time. Ogbos [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, The main trick is to pipe to as.data.frame. But the result will have one column only, you must assign the dates from the df's row names. I also include an aggregate solution. 
# create a test data set set.seed(2024) data <- data.frame( Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE), count = sample(10L, 100L, TRUE) ) # coerce tapply's result to class "data.frame" res <- with(data, tapply(count, Date, mean)) |> as.data.frame() # assign a dates column from the row names res$Date <- row.names(res) # cosmetics names(res)[2:1] <- names(data) # note that the row names are still tapply's names vector # and that the columns order is not Date/count. Both are fixed # after the calculations. res #> count Date #> 2024-03-22 5.416667 2024-03-22 #> 2024-03-23 5.50 2024-03-23 #> 2024-03-24 6.00 2024-03-24 #> 2024-03-25 4.476190 2024-03-25 #> 2024-03-26 6.538462 2024-03-26 #> 2024-03-27 5.20 2024-03-27 # fix the columns' order res <- res[2:1] # better all in one instruction aggregate(count ~ Date, data, mean) #> Datecount #> 1 2024-03-22 5.416667 #> 2 2024-03-23 5.50 #> 3 2024-03-24 6.00 #> 4 2024-03-25 4.476190 #> 5 2024-03-26 6.538462 #> 6 2024-03-27 5.20 Also, I'm glad to help as always but Ogbos, you have been an R-Help contributor for quite a while, please post data in dput format. Given the problem the output of the following is more than enough. dput(head(data, 20L)) Hope this helps, Rui Barradas -- Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus. www.avg.com Hello, This pipe? with(data, tapply(count, Date, mean)) |> as.data.frame() I am not seeing anything wrong with it. I have tried it again just now and it runs with no problems, like it had before. A solution is not to pipe
Re: [R] Output of tapply function as data frame
Às 04:30 de 27/03/2024, Ogbos Okike escreveu: Warm greetings to you all. Using the tapply function below: data<-read.table("FD1month",col.names = c("Dates","count")) x=data$count f<-factor(data$Dates) AB<- tapply(x,f,mean) I made a simple calculation. The result, stored in AB, is of the form below. But an effort to write AB to a file as a data frame fails. When I use the write table, it only produces the count column and strip of the first column (date). 2005-11-01 2005-12-01 2006-01-01 2006-02-01 2006-03-01 2006-04-01 2006-05-01 -4.106887 -4.259154 -5.836090 -4.756757 -4.118011 -4.487942 -4.430705 2006-06-01 2006-07-01 2006-08-01 2006-09-01 2006-10-01 2006-11-01 2006-12-01 -3.856727 -6.067103 -6.418767 -4.383031 -3.985805 -4.768196 -10.072579 2007-01-01 2007-02-01 2007-03-01 2007-04-01 2007-05-01 2007-06-01 2007-07-01 -5.342338 -4.653128 -4.325094 -4.525373 -4.574783 -3.915600 -4.127980 2007-08-01 2007-09-01 2007-10-01 2007-11-01 2007-12-01 2008-01-01 2008-02-01 -3.952150 -4.033518 -4.532878 -4.522941 -4.485693 -3.922155 -4.183578 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01 2008-09-01 -4.336969 -3.813306 -4.296579 -4.575095 -4.036036 -4.727994 -4.347428 2008-10-01 2008-11-01 2008-12-01 -4.029918 -4.260326 -4.454224 But the normal format I wish to display only appears on the terminal, leading me to copy it and paste into a text file. That is, when I enter AB on the terminal, it returns a format in the form: 008-02-01 -4.183578 2008-03-01 -4.336969 2008-04-01 -3.813306 2008-05-01 -4.296579 2008-06-01 -4.575095 2008-07-01 -4.036036 2008-08-01 -4.727994 2008-09-01 -4.347428 2008-10-01 -4.029918 2008-11-01 -4.260326 2008-12-01 -4.454224 Now, my question: How do I write out two columns displayed by AB on the terminal to a file? I have tried using AB<-data.frame(AB) but it doesn't work either. Many thanks for your time. 
Ogbos

Hello,

The main trick is to pipe to as.data.frame. But the result will have only one column; you must assign the dates from the data frame's row names. I also include an aggregate solution.

    # create a test data set
    set.seed(2024)
    data <- data.frame(
      Date = sample(seq(Sys.Date() - 5, Sys.Date(), by = "1 days"), 100L, TRUE),
      count = sample(10L, 100L, TRUE)
    )

    # coerce tapply's result to class "data.frame"
    res <- with(data, tapply(count, Date, mean)) |> as.data.frame()
    # assign a dates column from the row names
    res$Date <- row.names(res)
    # cosmetics
    names(res)[2:1] <- names(data)
    # note that the row names are still tapply's names vector
    # and that the columns order is not Date/count. Both are fixed
    # after the calculations.
    res
    #>               count       Date
    #> 2024-03-22 5.416667 2024-03-22
    #> 2024-03-23 5.500000 2024-03-23
    #> 2024-03-24 6.000000 2024-03-24
    #> 2024-03-25 4.476190 2024-03-25
    #> 2024-03-26 6.538462 2024-03-26
    #> 2024-03-27 5.200000 2024-03-27

    # fix the columns' order
    res <- res[2:1]

    # better all in one instruction
    aggregate(count ~ Date, data, mean)
    #>         Date    count
    #> 1 2024-03-22 5.416667
    #> 2 2024-03-23 5.500000
    #> 3 2024-03-24 6.000000
    #> 4 2024-03-25 4.476190
    #> 5 2024-03-26 6.538462
    #> 6 2024-03-27 5.200000

Also, I'm glad to help as always, but Ogbos, you have been an R-Help contributor for quite a while; please post data in dput format. Given the problem, the output of the following is more than enough.

    dput(head(data, 20L))

Hope this helps,

Rui Barradas
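The native pipe `|>` only exists in R 4.1.0 and later; on older versions the parser stops at it with an "unexpected '>'" error. A sketch of the same idea without the pipe, followed by the step the question actually asks for (writing both columns to a file), could look like this. The test data use fixed dates and the output goes to a temporary file; both are choices made for this sketch only.

```r
# test data, with fixed dates so the example is reproducible
set.seed(2024)
data <- data.frame(
  Date  = sample(seq(as.Date("2024-03-22"), by = "1 day", length.out = 6L),
                 100L, TRUE),
  count = sample(10L, 100L, TRUE)
)

AB <- with(data, tapply(count, Date, mean))

# names(AB) holds the dates, the values hold the means
res <- data.frame(Date = names(AB), count = as.vector(AB))

# write both columns to a text file (a temporary file in this sketch)
out_file <- tempfile(fileext = ".txt")
write.table(res, out_file, row.names = FALSE, quote = FALSE)

# read it back to check what was written
back <- read.table(out_file, header = TRUE)
```

Building the data frame from `names(AB)` directly avoids the detour through row names and gives the Date/count column order from the start.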
Re: [R] Problem with R coding
On 12/03/2024 at 07:43, Maria Del Mar García Zamora wrote:
> Hello,
>
> This is the error that appears when I try to load library(Rcmdr). I am using R version 4.3.3. I have tried to upload the packages, uninstalling them and installing them again, and nothing.
>
> Loading required package: splines
> Loading required package: RcmdrMisc
> Loading required package: car
> Loading required package: carData
> Loading required package: sandwich
> Loading required package: effects
> lattice theme set by effectsTheme()
> See ?effectsTheme for details.
> Error: package or namespace load failed for ‘Rcmdr’:
>  .onLoad failed in loadNamespace() for 'tcltk2', details:
>   call: file.exists("~/.Rtk2theme")
>   error: file name conversion problem -- name too long?
>
> Once this appears I use path.expand('~') and this is R's answer:
>
> [1] "C:\\Users\\marga\\OneDrive - Fundaci\xf3n Universitaria San Pablo CEU\\Documentos"
>
> The thing is that in Spanish we use accents, so this word (Fundaci\xf3n) really is Fundación, but I can't change it.
>
> I have tried to start R from CMD using:
>
> C:\Users\marga>set R_USER=C:\Users\marga\R_USER
> C:\Users\marga>"C:\Users\marga\Desktop\R-4.3.3\bin\R.exe" CMD Rgui
>
> At the beginning this worked, but right now a message appears saying that this app cannot be used and that I have to ask the software company (photo attached).
>
> What should I do?
>
> Thanks,
> Mar
>
> Maria Del Mar García Zamora
> Student, UCHCEU - Universidad CEU Cardenal Herrera

Hello,

First of all, try running Rgui only, no R.exe CMD. Just

    Rgui.exe

or

    C:\Users\marga\Desktop\R-4.3.3\bin\Rgui.exe

Then, in Rgui, try loading Rcmdr

    library(Rcmdr)

Also, do you have R in your Windows PATH variable? The directory to put in PATH should be

    C:\Users\marga\Desktop\R-4.3.3\bin

so that Windows can find R.exe and Rgui.exe without the full path name.

Hope this helps,

Rui Barradas
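To see from inside R which home directory is in use and whether it contains accented characters, a short diagnostic along these lines may help. It is only a sketch: the helper name `has_non_ascii` is made up here, and the check works on raw bytes so that it does not depend on how the path is encoded.

```r
# TRUE if any byte of the string lies outside the 7-bit ASCII range
has_non_ascii <- function(p) {
  any(as.integer(charToRaw(p)) > 127L)
}

home   <- path.expand("~")
r_user <- Sys.getenv("R_USER", unset = NA)   # NA when the variable is not set

print(home)
print(r_user)
print(has_non_ascii(home))

# the two kinds of path from the thread, for comparison
has_non_ascii("C:/Users/marga/R_USER")   # plain ASCII path
has_non_ascii("Fundaci\u00f3n")          # contains an accented letter
```

If the home directory reports non-ASCII bytes while the R_USER path does not, that is consistent with the workaround of pointing R_USER at an ASCII-only directory before starting R.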
Re: [R] help - Package: stats - function ar.ols
On 22/02/2024 at 16:34, Pedro Gavronski wrote:
> Hello,
>
> My name is Pedro and it is nice to meet you all.
>
> I am having trouble understanding a message that I receive when I use the function ar.ols from package stats. It says:
>
> Warning message:
> In ar.ols(x = dtb[2:6966, ], demean = FALSE, intercept = TRUE, prewhite = TRUE) :
>   model order: 2 singularities in the computation of the projection matrix results are only valid up to model order 1
>
> I do not know what it means; if someone could clarify it, I would really appreciate it.
>
> Attached to this email you will find my code and the data I used to run this formula.
>
> Thanks in advance.
>
> Best regards,
> Pedro.

Hello,

Thanks for the data, but the code is missing from the attachment. Can you please post your code, either in an attachment or directly in the e-mail body?

Rui Barradas
Re: [R] Looping
On 19/02/2024 at 03:27, Steven Yen wrote:
> I need to read csv files repeatedly, named data1.csv, data2.csv, … data24.csv, 24 altogether. That is,
>
> data <- read.csv("data1.csv")
> …
> data <- read.csv("data24.csv")
> …
>
> Is there a way to do this in a loop? Thank you.
>
> Steven from iPhone

Hello,

Here is a way of reading the files in a *apply loop. The file names are created either by listing the files in the directory (list.files) or by a string formatting function (sprintf).

    # file_names_vec <- list.files(pattern = "data\\d+\\.csv")
    file_names_vec <- sprintf("data%d.csv", 1:24)
    data_list <- sapply(file_names_vec, read.csv, simplify = FALSE)

    # access the 1st data.frame
    data_list[[1L]]
    # same as above
    data_list[["data1.csv"]]
    # same as above
    data_list$data1.csv

Hope this helps,

Rui Barradas
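For completeness, the same task written as an explicit for loop, which is what the question literally asks for. The sketch creates three small files in a temporary directory so that it can run anywhere; the file contents and the final rbind step are assumptions, useful only when all files share the same columns.

```r
# create three hypothetical files data1.csv ... data3.csv in a temp directory
tmp_dir <- tempfile("csvdemo")
dir.create(tmp_dir)
for (k in 1:3) {
  write.csv(data.frame(id = k, value = k * 10),
            file.path(tmp_dir, sprintf("data%d.csv", k)),
            row.names = FALSE)
}

file_names_vec <- file.path(tmp_dir, sprintf("data%d.csv", 1:3))

# pre-allocate the list, then fill it one file at a time
data_list <- vector("list", length(file_names_vec))
for (k in seq_along(file_names_vec)) {
  data_list[[k]] <- read.csv(file_names_vec[k])
}
names(data_list) <- basename(file_names_vec)

# optional: stack everything into a single data.frame
all_data <- do.call(rbind, data_list)
all_data
```

Keeping the data frames in a list, rather than overwriting a single `data` object on each pass, means every file stays available after the loop ends.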
Re: [R] Packages sometimes don't update, but no error or warning is thrown
On 14/02/2024 at 10:50, Martin Maechler wrote:

Berwin A Turlach on Wed, 14 Feb 2024 11:47:41 +0800 writes:

> G'day Philipp,
>
> On Tue, 13 Feb 2024 09:59:17 +0100 gernophil--- via R-help wrote:
>
>> this question is related to this
>> (https://community.rstudio.com/t/packages-are-not-updating/166214/3),
>> [...]
>> To sum it up: If I am updating packages (be it via Bioconductor or CRAN) some packages simply don't update,
>> [...]
>> I would expect any kind of message that the package will not be updated, since no newer binary is available or a prompt, if I want to compile from source.
>
> RStudio is doing its own thing for some tasks, including 'install.packages()' (and for some reason, at least on the platforms on which I use RStudio, RStudio calls 'install.packages()' and not 'update.packages()' when an update is requested via the GUI). See:
>
> RStudio> install.packages
> function (...) .rs.callAs(name, hook, original, ...)
>
> compared to:
>
> R> install.packages
> function (pkgs, lib, repos = getOption("repos"), contriburl = contrib.url(repos, type), method, available = NULL, destdir = NULL, dependencies = NA, type = getOption("pkgType"), configure.args = getOption("configure.args"), configure.vars = getOption("configure.vars"), clean = FALSE, Ncpus = getOption("Ncpus", 1L), verbose = getOption("verbose"), libs_only = FALSE, INSTALL_opts, quiet = FALSE, keep_outputs = FALSE, ...) { [...]
>
> So if you use Install/Update in the Packages tab of RStudio and do not experience the behaviour you are expecting, it is something that you need to discuss with Posit, not with R. :)
>
>> However, the only message I get is: ``` trying URL
>> ''
>
> The package name has the version number encoded in it, so theoretically you should be able to tell at this point whether the package that is downloaded is the version that is already installed, hence no update will happen.
>
> Best wishes,
> Berwin

Yes, thanks a lot, Berwin.

Indeed I've raised the fact that RStudio hides R's own install.packages() from the user and uses its own, undocumented one ... this has been the case for quite a few years. I found out during teaching --- one of the few times I use RStudio to use R ... in another case where RStudio's install.packages() behaved differently than R's.

I'm pretty sure this is the reason for quite a bit of confusion...

Martin

Hello,

From within RStudio you can always run the qualified names

    utils::install.packages()
    utils::update.packages()

or run from the command line.

Hope this helps,

Rui Barradas
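Because a package that silently fails to update produces no message, one way to check is to compare the installed versions before and after the update. The sketch below is one possible way to do that with base R; the helper names are made up for illustration, and the update step itself is left commented out because it needs network access.

```r
# Named character vector: package name -> installed version
installed_versions <- function() {
  utils::installed.packages()[, "Version"]
}

# Packages present in both snapshots whose version string differs
changed_packages <- function(before, after) {
  common <- intersect(names(before), names(after))
  common[before[common] != after[common]]
}

before <- installed_versions()
# utils::update.packages(ask = FALSE)   # run this step yourself
after <- installed_versions()

# character(0) here, since nothing was updated in between
changed_packages(before, after)

# illustration with made-up version vectors
changed_packages(c(a = "1.0", b = "2.0"),
                 c(a = "1.0", b = "2.1", c = "0.1"))
```

Any package that was expected to update but is missing from the result did not change, whatever the GUI reported.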
Re: [R] Packages sometimes don't update, but no error or warning is thrown
Hello,

Not exactly an answer, just a thought: whenever I have problems updating or installing packages from within RStudio, I close RStudio, write a script with the install.packages() call and run it from a command window.

    R -q -f "instscript.R"

This often works better, and it also works with Bioconductor's BiocManager::install or with remotes'/devtools' install_github.

Hope this helps,

Rui Barradas
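What such a script might contain is sketched below. The package list and the repository URL are placeholders chosen for illustration. The list uses packages that ship with R, so the script runs without downloading anything; replace it with the packages actually needed.

```r
# instscript.R -- a minimal sketch; the package list is a placeholder
pkgs <- c("stats", "utils")

# install only what is not there yet
installed  <- rownames(utils::installed.packages())
to_install <- setdiff(pkgs, installed)

if (length(to_install) > 0L) {
  utils::install.packages(to_install, repos = "https://cloud.r-project.org")
} else {
  message("nothing to install")
}
```

Setting repos explicitly avoids the interactive mirror prompt, which would otherwise stall a script run with `R -q -f`.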
Re: [R] gathering denominator under frac
On 02/02/2024 at 10:01, Troels Ring wrote:

Hi friends - I'm plotting a ratio of bicarbonates in ggplot2 and

    ylab(expression(paste(frac("additive BIC", "true BIC"))))

worked OK - but now I have been asked to put the chemistry instead - so I wrote

    ylab(expression(paste(frac("additive", HCO[3]^"-", "true", HCO[3]^"-"))))

and frac saw that as additive = numerator and HCO3- = denominator, and the rest was ignored. So how do I make frac ignore the first "," and print the fraction as I want?

All best wishes
Troels

Hello,

This seems to work. Instead of separating the two numerator terms with a comma, separate them with a tilde. The same goes for the denominator. And there is no need for double quotes around "additive" and "true".

    library(ggplot2)

    g <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point()

    g + ylab(expression(paste(frac(additive~HCO[3]^"-", true~HCO[3]^"-"))))

Hope this helps,

Rui Barradas
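For readers without ggplot2 at hand, here is a base-graphics sketch of the same plotmath idea: frac() takes exactly two arguments, and the tilde joins terms inside each one. The temporary PDF file is only there so the code also runs non-interactively; the paste() wrapper is left out because frac() is the only element of the label.

```r
# Two arguments only: numerator and denominator.
# The tilde joins terms (with a space) inside each argument.
lab <- expression(frac(additive ~ HCO[3]^"-", true ~ HCO[3]^"-"))

# Draw to a temporary file so this also works in a batch session.
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(1:10, ylab = lab)
dev.off()

# The label is a call to frac() with two arguments.
length(lab[[1]]) - 1
```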
Re: [R] Need help testing a problem
grDevices utils datasets methods base

other attached packages:
[1] rerddap_1.1.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.3       cli_3.6.1         rlang_1.1.1       ncdf4_1.22
 [5] crul_1.4.0        generics_0.1.3    jsonlite_1.8.7    data.table_1.14.8
 [9] glue_1.6.2        httpcode_0.3.0    triebeard_0.4.1   fansi_1.0.5
[13] rappdirs_0.3.3    tibble_3.2.1      hoardr_0.5.4      lifecycle_1.0.4
[17] compiler_4.3.2    dplyr_1.1.3       Rcpp_1.0.12       pkgconfig_2.0.3
[21] digest_0.6.33     R6_2.5.1          tidyselect_1.2.0  utf8_1.2.4
[25] pillar_1.9.0      curl_5.2.0        magrittr_2.0.3    urltools_1.7.3
[29] xml2_1.3.5
>

So there was an unspecified error, an error without a condition message and no call expression. I find this strange; something like the following is expected.

    tryCatch(stop("error"), error = function(e) e) |> str()
    List of 2
     $ message: chr "error"
     $ call   : language doTryCatch(return(expr), name, parentenv, handler)
     - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

Function tabledap doesn't seem to be handling errors properly.

Hope this helps,

Rui Barradas
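To make the contrast concrete, a short base R sketch comparing an ordinary error with a hand-built one that has neither message nor call. How tabledap actually produces its error is not known from this thread, so the second object is only a guess at what such an "empty" error might look like; the function f() is made up for the illustration.

```r
# An ordinary error carries a message and the call that raised it.
f <- function() stop("request failed")   # hypothetical failing function
e1 <- tryCatch(f(), error = function(e) e)
conditionMessage(e1)
deparse(conditionCall(e1))

# A condition built by hand can lack both (assumed shape, for illustration).
e2 <- simpleError("")
nzchar(conditionMessage(e2))
is.null(conditionCall(e2))
```

A handler that only prints conditionMessage(e) would show nothing useful for the second kind of object, which matches the symptom described above.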
Re: [R] ggplot 3-dimensions
On 17/12/2023 at 09:13, SIBYLLE STÖCKLI via R-help wrote:

Dear R community

In the meantime I made some progress:

    ggplot(data = Fig2b, aes(x = BFF, y = Wert, fill = Effekt)) + theme_bw() +
      geom_bar(stat = "identity", width = 0.95) +
      scale_y_continuous(limits=c(0,13), expand=c(0,0)) +
      facet_wrap(~Aspekt, strip.position = "bottom", scales = "free_x") +
      theme(panel.spacing = unit(0, "lines"),
            strip.background = element_blank(),
            strip.placement = "outside") +
      theme(axis.title.x=element_blank()) +
      scale_fill_manual("Effekt",
                        values = c("Neg" = "red", "Neu" = "darkgrey", "Pos" = "blue"),
                        labels=c("Negativ", "Nicht sign.", "Positiv"))

Question
- Is it possible to present all the subplots in one graph (not in two "lines")?
- I tried to change the angle of the x-axis. However, I was able to change the first x-axis (BB...), but not the second one (Voegel). Maybe this would solve the problem.
- If not, is there another possibility to fix the number of subplots per line?

Kind regards
Sibylle

-Original Message-
From: R-help On Behalf Of SIBYLLE STÖCKLI via R-help
Sent: Saturday, December 16, 2023 12:16 PM
To: R-help@r-project.org
Subject: [R] ggplot 3-dimensions

Dear R-user

Does anybody know if ggplot allows two x-axes covering two dimensions (similar to the Excel plot, picture 1 in the pdf attachment)? If yes, how should I adapt my code? The parameters are presented in the input file (attachment: Input).

    Fig2b = read.delim("BFF_Fig-2b.txt", na.strings="NA")
    names(Fig2b)
    head(Fig2b)
    summary(Fig2b)
    str(Fig2b)
    Fig2b$Aspekt<-factor(Fig2b$Aspekt, levels=(c("Voegel", "Kleinsaeuger", "Schnecken", "Regenwuermer_Asseln", "Pilze")))

    ### Figure 2b
    ggplot(Fig2b,aes(Aspekt,Wert,fill=Effekt))+
      geom_bar(stat="identity",position='fill')+
      scale_y_continuous(limits=c(0,14), expand=c(0,0))+
      labs(x="", y="Anzahl Studien pro Effekt")

Kind regards
Sibylle

Hello,

You are posting the data as an image once again, please don't do this. Paste the output of

    dput(Fig2b)            # if small data
    dput(head(Fig2b, 20))  # if too big to fit in an e-mail

in your mails. Here it is.

    Aspekt <- c("Flora", "Flora", "Flora", "Tagfalter", "Tagfalter",
                "Tagfalter", "Heuschre", "Heuschre", "Heuschre",
                "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp",
                "Schwebfli", "Schwebfli", "Schwebfli",
                "Bienen_F", "Bienen_F", "Bienen_F")
    Aspekt <- c(Aspekt, Aspekt)
    BFF <- rep(c("BB", "SA", "NE"), times = 12)
    Effekt <- c(rep("Neg", times = 18), rep("Pos", times = 18))
    Wert <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 1, 0, 0, 1, 1, 0, 0,
              1, 0, 0, 2, 1, 0, 0, 1, 0, 9, 4, 6, 0, 0, 3, 0, 0, 4)
    Fig2b <- data.frame(Aspekt, BFF, Effekt, Wert)

As for the question, you can use the facet_wrap argument nrow to have all plots in one row only, see the comment before facet_wrap. I don't know if this solves the problem. Also, I define a custom theme to make the code clearer later.

    library(ggplot2)

    theme_sibylle <- function() {
      theme_bw(base_size = 10) %+replace%
        theme(
          panel.spacing = unit(0, "lines"),
          strip.background = element_blank(),
          strip.placement = "outside",
          # this line was added by me, remove if not wanted
          strip.text.x.bottom = element_text(face = "bold", size = 10),
          axis.title.x = element_blank()
        )
    }

    ggplot(data = Fig2b, aes(x = BFF, y = Wert, fill = Effekt)) +
      geom_bar(stat = "identity", width = 0.95) +
      scale_y_continuous(limits=c(0,13), expand=c(0,0)) +
      # here I use nrow = 1L to put everything in one row only
      facet_wrap(~ Aspekt, nrow = 1L, strip.position = "bottom", scales = "free_x") +
      scale_fill_manual(
        name = "Effekt",
        values = c("Neg" = "red", "Neu" = "darkgrey", "Pos" = "blue"),
        labels = c("Negativ", "Nicht sign.", "Positiv")) +
      theme_sibylle()

Hope this helps,

Rui Barradas
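On the request to post data with dput() rather than as an image, a minimal base R sketch of the round trip. The three-row data frame is a made-up stand-in for the poster's file; only the column names are borrowed from the thread.

```r
# Hypothetical small data set standing in for the poster's file
Fig2b_small <- data.frame(
  Aspekt = c("Flora", "Flora", "Tagfalter"),
  BFF    = c("BB", "SA", "BB"),
  Wert   = c(0, 1, 2)
)

# This is the text one would paste into the e-mail
txt <- capture.output(dput(Fig2b_small))
cat(txt, sep = "\n")

# A reader recreates the data by evaluating that text
copy <- eval(parse(text = txt))
identical(copy, Fig2b_small)
```

Because the pasted text is plain R code, it survives the list's plain-text filter, which attachments and images often do not.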
Re: [R] ggplot2: Get the regression line with 95% confidence bands
On 13/12/2023 at 00:36, Robert Baer wrote:

coord_cartesian also seems to work for y, as does including the breaks = argument. How about:

    df=data.frame(year= c(2012,2015,2018,2022), score=c(495,493, 495, 474))

    ggplot(df, aes(x = year, y = score)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(title = "Standard linear regression for France",
           x = "Year", y = "PISA score in mathematics") +
      coord_cartesian(ylim=c(470,500)) +
      scale_x_continuous(breaks = 2012:2022)

On 12/12/2023 3:19 PM, varin sacha via R-help wrote:

Dear Ben, Dear Daniel, Dear Rui, Dear Bert,

Here below is my R code. I really appreciate all your comments. My R code works perfectly, but there is still something I would like to improve. The X-axis shows 2012.5 ; 2015.0 ; 2017.5 ; 2020.0. I would like to see only the years on the X-axis (2012 ; 2015 ; 2017 ; 2020). How can I do that?

    library(ggplot2)

    df=data.frame(year= c(2012,2015,2018,2022), score=c(495,493, 495, 474))

    ggplot(df, aes(x = year, y = score)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(title = "Standard linear regression for France",
           x = "Year", y = "PISA score in mathematics") +
      scale_y_continuous(limits=c(470,500),oob=scales::squish)

On Monday 11 December 2023 at 23:38:06 UTC+1, Ben Bolker wrote:

On 2023-12-11 5:27 p.m., Daniel Nordlund wrote:

On 12/10/2023 2:50 PM, Rui Barradas wrote:

On 10/12/2023 at 22:35, varin sacha via R-help wrote:

Dear R-experts,

Here below is my R code. As my X-axis is "year", I must be missing one or more steps! I am trying to get the regression line with the 95% confidence bands around the regression line. Any help would be appreciated.

Best,
S.
    library(ggplot2)

    df=data.frame(year=factor(c("2012","2015","2018","2022")), score=c(495,493, 495, 474))

    ggplot(df, aes(x=year, y=score)) +
      geom_point( ) +
      geom_smooth(method="lm", formula = score ~ factor(year), data = df) +
      labs(title="Standard linear regression for France", y="PISA score in mathematics") +
      ylim(470, 500)

Hello,

I don't see a reason why year should be a factor, and the formula in geom_smooth is wrong: it should be y ~ x, the aesthetics involved. It still doesn't plot the CIs though. There's a warning and I don't understand where it comes from. But the regression line is plotted.

    ggplot(df, aes(x = as.numeric(year), y = score)) +
      geom_point() +
      geom_smooth(method = "lm", formula = y ~ x) +
      labs(
        title = "Standard linear regression for France",
        x = "Year",
        y = "PISA score in mathematics"
      ) +
      ylim(470, 500)
    #> Warning message:
    #> In max(ids, na.rm = TRUE) : no non-missing arguments to max; returning -Inf

Hope this helps,

Rui Barradas

After playing with this for a little while, I realized that the problem with plotting the confidence limits is the addition of ylim(470, 500). The confidence values are outside the ylim values. Remove the limits, or increase the range, and the confidence curves will plot.

Hope this is helpful,

Dan

Or use

    + scale_y_continuous(limits = c(470, 500), oob = scales::squish)
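Dan's diagnosis can be checked without ggplot2. The sketch below fits the same straight line with lm() on the numeric-year data from the thread and asks predict() for the 95% confidence band; comparing its range with the limits 470 and 500 shows whether the band fits inside them. It is only a check of the numbers, not a replacement for the plotting code.

```r
df <- data.frame(year  = c(2012, 2015, 2018, 2022),
                 score = c(495, 493, 495, 474))

fit  <- lm(score ~ year, data = df)
band <- predict(fit, newdata = df, interval = "confidence", level = 0.95)

# Columns are fit, lwr, upr; compare the extremes with ylim(470, 500)
range(band[, c("lwr", "upr")])
max(band[, "upr"]) > 500
min(band[, "lwr"]) < 470
```

With only four points the band is wide, so it extends beyond both limits; ylim() then treats those values as out of bounds, which is why the ribbon disappears.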
Hello,

In the code below I don't use coord_cartesian because setting ylim would cut off part of the confidence intervals. To have labels only at the years present in the data set, get them from the data.

    library(ggplot2)

    df <- data.frame(year= c(2012,2015,2018,2022),
Re: [R] Convert character date time to R date-time variable.
On 07/12/2023 at 16:30, Rui Barradas wrote:

On 07/12/2023 at 16:21, Sorkin, John wrote:

Colleagues,

I have a matrix of character data that represents date and time. The format of each element of the matrix is "2020-09-17_00:00:00". How can I convert the elements into a valid R date-time constant?

Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care, 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524
Cell phone 443-418-5382

Hello,

Coerce with ?as.POSIXct. Don't forget the underscore in the format.

    as.POSIXct("2020-09-17_00:00:00", format = "%Y-%m-%d_%H:%M:%S")

Hope this helps,

Rui Barradas

Sorry, I forgot:

    lubridate::ymd_hms("2020-09-17_00:00:00")

Hope this helps,

Rui Barradas
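Since the question mentions a matrix, here is a sketch of the same as.POSIXct() call applied to all elements at once. The 2 x 2 matrix and the choice of tz = "UTC" are assumptions made for the example; the result is a plain date-time vector in column-major order, not a matrix.

```r
# Hypothetical character matrix in the format from the question
m <- matrix(c("2020-09-17_00:00:00", "2020-09-17_06:30:00",
              "2020-09-18_12:00:00", "2020-09-18_23:59:59"), nrow = 2)

# Flatten explicitly, then parse; note the underscore in the format.
# tz = "UTC" is an assumption, use the time zone the data were recorded in.
dt <- as.POSIXct(as.vector(m), format = "%Y-%m-%d_%H:%M:%S", tz = "UTC")
dt

# Date-time arithmetic now works
difftime(dt[2], dt[1], units = "hours")
```

Elements that do not match the format come back as NA, so anyNA(dt) is a cheap check after the conversion.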
Re: [R] Mann Kendall mutation package?
On 01/12/2023 at 11:58, Nick Wray wrote:

Hello - does anyone know whether there are any packages for Mann-Kendall mutation tests available in R? The only one I could find online is this

MK_mut_test: Mann-Kendall mutation test in Sibada/sibadaR: Sibada's accumulated R scripts for next probably use to avoid reinventing the wheel. (rdrr.io) <https://rdrr.io/github/Sibada/sibadaR/man/MK_mut_test.html>

but there doesn't seem to be a package corresponding to this. I've tried installing various permutations of the apparent name Sibada/sibadaR but nothing comes up, so I'm not sure whether it even exists...

Thanks
Nick Wray

Hello,

Your link points to a GitHub repository; the package can be installed with

    devtools::install_github(repo = "Sibada/sibadaR")

Hope this helps,

Rui Barradas
Re: [R] back tick names with predict function
On 30/11/2023 at 17:57, Rui Barradas wrote:

On 30/11/2023 at 17:38, Robert Baer wrote:

I am having trouble using back ticks with the R extractor function 'predict' and an lm() model. I'm trying to construct some nice vectors that can be used for plotting the two types of regression intervals. I think it works with normal column heading names but it fails when I have "special" back-tick names. Can anyone help with how I would reference these? Short of renaming my columns, is there a way to accomplish this?

Reprex

    # dataframe with dashes in column headings
    cob = structure(list(
      `cob-wt` = c(212, 241, 215, 225, 250, 241, 237, 282, 206, 246,
                   194, 241, 196, 193, 224, 257, 200, 190, 208, 224),
      `plant-density` = c(137, 107, 132, 135, 115, 103, 102, 65, 149, 85,
                          173, 124, 157, 184, 112, 80, 165, 160, 157, 119)),
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))

    # regression model works
    mod2 = lm(`cob-wt` ~ `plant-density`, data = cob)

    # x sequence for plotting CI's
    # Set up x points
    x = seq(min(cob$`plant-density`), max(cob$`plant-density`), length = 1000)

    # Use predict to get CIs for a plot
    # Add CI for regression line (y-hat uses 'c')
    # usual trick is to assign x to actual x-var name in middle dataframe argument
    CI.c = predict(mod2, data.frame( `plant-density` = x), interval = 'c') # fail

    # Add CI for prediction value (y-tilde uses 'p')
    # usual trick is to assign x to actual x-var name in middle dataframe argument
    CI.p = predict(mod2, data.frame(`plant-density` = x), interval = 'p') # fail

Hello,

When creating the new data frame, the default check.names = TRUE changes the column name: it is repaired and the hyphen is replaced by a legal dot.
    # check.names defaults to TRUE
    newd <- data.frame(`plant-density` = x)
    # `plant-density` is not a column name
    head(newd)

    # check.names set to FALSE
    newd <- data.frame(`plant-density` = x, check.names = FALSE)
    # `plant-density` is now a column name
    head(newd)

    # Use predict to get CIs for a plot
    # Add CI for regression line (y-hat uses 'c')
    # usual trick is to assign x to actual x-var name in middle dataframe argument
    CI.c = predict(mod2, newdata = newd, interval = 'confidence') # fail

    # Add CI for prediction value (y-tilde uses 'p')
    # usual trick is to assign x to actual x-var name in middle dataframe argument
    CI.p = predict(mod2, newdata = newd, interval = 'prediction') # fail

Hope this helps,

Rui Barradas

Hello,

Sorry for the comments '# fail' in the last two instructions, I should have changed them.

    CI.c <- predict(mod2, newdata = newd, interval = 'confidence') # works
    CI.p <- predict(mod2, newdata = newd, interval = 'prediction') # works

Hope this helps,

Rui Barradas
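A compact, self-contained version of the same point, for anyone who wants to run it without the objects from the thread. The six rows are the first six values of each column in the original post; the three new density values are made up for the example.

```r
# First six rows of the data from the thread, names kept as they are
cob <- data.frame(
  `cob-wt`        = c(212, 241, 215, 225, 250, 241),
  `plant-density` = c(137, 107, 132, 135, 115, 103),
  check.names = FALSE
)
mod <- lm(`cob-wt` ~ `plant-density`, data = cob)

# Default: the name is repaired, so predict() cannot find the variable
repaired <- names(data.frame(`plant-density` = 1:3))
# With check.names = FALSE the name is kept as typed
kept <- names(data.frame(`plant-density` = 1:3, check.names = FALSE))
repaired
kept

# Hypothetical new x values for the intervals
newd <- data.frame(`plant-density` = c(110, 120, 130), check.names = FALSE)
ci <- predict(mod, newdata = newd, interval = "confidence")
ci
```

The same check.names = FALSE argument is needed every time the new data frame is rebuilt, since the repair happens inside data.frame() and not in predict().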
Re: [R] ggplot with two x-axis and two dimensions
On 24/11/2023 at 10:29, sibylle.stoec...@gmx.ch wrote:

Dear R-user

Does anybody know if ggplot allows two x-axes covering two dimensions (similar to the Excel plot, picture 1 in the pdf attachment)? If yes, how should I adapt my code? The parameters are presented in the input file (attachment: Input).

    Fig2b = read.delim("BFF_Fig-2b.txt", na.strings="NA")
    names(Fig2b)
    head(Fig2b)
    summary(Fig2b)
    str(Fig2b)
    Fig2b$Aspekt<-factor(Fig2b$Aspekt, levels=(c("Voegel", "Kleinsaeuger", "Schnecken", "Regenwuermer_Asseln", "Pilze")))

    ### Figure 2b
    ggplot(Fig2b,aes(Aspekt,Wert,fill=Effekt))+
      geom_bar(stat="identity",position='fill')+
      scale_y_continuous(limits=c(0,14), expand=c(0,0))+
      labs(x="", y="Anzahl Studien pro Effekt")

Kind regards
Sibylle

Hello,

The first attached file does not match the data in the second file, but here is an answer to both this question and your other question [1].

The trick to have a secondary axis is to compute a ratio of axis lengths. The lengths of the main and secondary axes can be computed with the functions range() and diff(), as in the code below. Then use that ratio to scale the secondary axis.
    Fig2b <- structure(list(
      Aspekt = c("Flora", "Flora", "Flora", "Tagfalter", "Tagfalter", "Tagfalter",
                 "Heuschre", "Heuschre", "Heuschre", "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp",
                 "Schwebfli", "Schwebfli", "Schwebfli", "Bienen_F", "Bienen_F", "Bienen_F",
                 "Flora", "Flora", "Flora", "Tagfalter", "Tagfalter", "Tagfalter",
                 "Heuschre", "Heuschre", "Heuschre", "Kaefer_Sp", "Kaefer_Sp", "Kaefer_Sp",
                 "Schwebfli", "Schwebfli", "Schwebfli", "Bienen_F", "Bienen_F", "Bienen_F"),
      BFF = c("BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
              "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
              "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE",
              "BB", "SA", "NE", "BB", "SA", "NE", "BB", "SA", "NE"),
      Effekt = c("Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu",
                 "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu", "Neu",
                 "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos",
                 "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos"),
      Wert = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 3L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
               1L, 0L, 0L, 2L, 1L, 0L, 0L, 1L, 0L, 9L, 4L, 6L, 0L, 0L, 3L, 0L, 0L, 4L)),
      row.names = c(NA, -36L), class = "data.frame")

    library(ggplot2)

    # First y axis (0-9)
    # Second y axis (0-2500)
    # fac <- diff(range( sec axis ))/diff(range( 1st axis ))
    fac <- diff(range(0, 2500))/diff(range(0, 9))

    ggplot(Fig2b, aes(Aspekt, Wert, fill = Effekt)) +
      geom_col(position = position_dodge()) +
      scale_y_continuous(
        breaks = seq(0, 12, 2L),
        sec.axis = sec_axis(~ . * fac)
      ) +
      labs(x = "", y = "Anzahl Studien pro Effekt")

[1] https://stat.ethz.ch/pipermail/r-help/2023-November/478605.html

Hope this helps,

Rui Barradas
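The scaling factor can be checked on its own in base R. This sketch uses the two ranges named in the code comments above (0 to 9 and 0 to 2500); the helper names to_secondary() and to_primary() are made up for the illustration, and the simple multiplication only holds because both axes start at zero.

```r
primary   <- c(0, 9)      # range of the main y axis
secondary <- c(0, 2500)   # range wanted on the secondary axis

# Ratio of axis lengths, as in the reply above
fac <- diff(range(secondary)) / diff(range(primary))

# Hypothetical helpers: labels on the secondary axis are primary * fac,
# and data meant for the secondary axis must be divided by fac before plotting
to_secondary <- function(y) y * fac
to_primary   <- function(z) z / fac

to_secondary(9)
to_primary(1250)
```

In the ggplot2 call, sec_axis(~ . * fac) plays the role of to_secondary(), while any series drawn against the secondary axis would be passed through to_primary() first.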
Re: [R] Fast way to draw mean values and 95% confidence intervals of groups with ggplot2
On 16/11/2023 at 11:59, Luigi Marongiu wrote:

Hello,

I have triplicate (column A) readings (column D) of samples exposed to different concentrations (column C) over time (column B). Is it possible to draw a line plot of the mean values for each concentration (C)? At the moment, I get a single line. Also, is there a simple way to draw the 95% CI around these data? I know I need to use ribbon with the lower and upper limit, but is there a simple way for ggplot2 to calculate these values directly?

Here is a working example:

```
A = c(rep(1, 28), rep(2, 28), rep(3, 28))
B = rep(c(0, 15, 30, 45, 60, 75, 90), 12)
C = rep(c(rep(0, 7), rep(0.6, 7), rep(1.2, 7), rep(2.5,7)),3)
D = c(731.33,761.67,730,761.67,741.67,788.67,784.33,
      686.67,685.33,680,693.67,684,704,709.67,739,
      731,719,767,760.67,776.67,768.67,675,671.67,
      668.67,677.33,673.67,687,696.67,727,750.67,
      752.67,786.67,794.67,843.33,946,732.67,737.33,
      775.33,828,918,1063,1270,752.67,742.33,
      735.67, 747.67,777.33,803.67,865.67,700,700.67,705.67,
      722.67,744,779,837,748,742,754,747.67,
      775.67,808.67,869,705.67,714.33,702.33,730,
      710.67,731,744,686.33,687.33,670,702.33,
      669.33,707.33,708.33,724,747,761.33,715,
      697.67,728,728)
df = data.frame(A, B, C, D)
library(ggplot2)
ggplot(data=df, aes(x=B, y=D, z=C, color =C)) +
  geom_line(stat = "summary", fun = "mean") +
  geom_ribbon()
```

Thank you

Hello,

I am not sure that the code below is what you want. The first 3 instructions create a named vector of colors. The pipe is what tries to solve the problem. It computes means and se's by groups of time and concentration, then plots the ribbon below the lines.
It is important not to set color = C in the initial call to ggplot, since it would take effect in all the subsequent layers (try it). To have one line per concentration I use group = C instead.

    suppressPackageStartupMessages({
      library(ggplot2)
      library(dplyr)
    })

    n_colors <- df$C |> unique() |> length()
    names_colors <- df$C |> unique() |> as.character()
    clrs <- setNames(palette.colors(n_colors), names_colors)

    df %>%
      mutate(C = factor(C)) %>%
      group_by(B, C) %>%
      # note: se_D holds sd(D), the standard deviation of each group
      mutate(mean_D = mean(D), se_D = sd(D)) %>%
      ungroup() %>%
      ggplot(aes(x = B, group = C)) +
      geom_ribbon(aes(ymin = mean_D - se_D, ymax = mean_D + se_D),
                  fill = "grey", alpha = 0.5) +
      geom_line(aes(y = mean_D, color = C)) +
      geom_point(aes(y = D, color = C)) +
      scale_color_manual(name = "Concentration", values = clrs)

Hope this helps,

Rui Barradas
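The ribbon in the reply is the mean plus or minus sd(D). If a 95% confidence interval of the mean is wanted, as asked in the question, one possible base R sketch is below. It assumes a t-based interval is appropriate for the triplicates; the helper mean_ci() and the small data frame are made up for the illustration.

```r
# Mean and t-based confidence limits for one group of replicates
mean_ci <- function(x, level = 0.95) {
  n    <- length(x)
  se   <- sd(x) / sqrt(n)                           # standard error of the mean
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se  # half-width of the interval
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# Hypothetical triplicates at two time points
d <- data.frame(B = rep(c(0, 15), each = 3),
                D = c(731, 727, 748, 762, 751, 742))

# One row per time point; D becomes a matrix with mean, lower, upper
ci <- aggregate(D ~ B, data = d, FUN = mean_ci)
ci
```

The lower and upper columns could then be mapped to ymin and ymax in geom_ribbon(), in place of mean_D - se_D and mean_D + se_D.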
Re: [R] anyone having trouble accesing CRAN?
On 15/11/2023 at 19:13, Christopher W. Ryan via R-help wrote:

at https://cran.r-project.org/ I get this error message:

=
Secure Connection Failed

An error occurred during a connection to cran.r-project.org. PR_END_OF_FILE_ERROR

Error code: PR_END_OF_FILE_ERROR

The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.
===

Three different browsers, two different devices, two different networks. (The text of the error messages varies.)

Anyone seeing similar?

Thanks.

--Chris Ryan

Hello,

Yes, CRAN is down. I know there was an announcement last week about scheduled maintenance, but I cannot find that e-mail right now and don't remember the exact date, so I cannot say for sure that this is what is happening. But it is probably scheduled maintenance.

Rui Barradas
Re: [R] Cryptic error for mscmt function
At 13:35 on 05/11/2023, Leu Thierry wrote:
> Hi everyone,
>
> I am trying to conduct a synthetic control analysis using the MSCMT package. However, when trying to run it I get a very cryptic error message saying "Error in lst[[nam]][intersect(tim, rownames(lst[[nam]])), cols, drop = FALSE]: subscript out of bounds". Does anyone know what this means and why I receive this error? I attached the code & dataset used in the attachment.
>
> Thanks a lot!
>
> Best regards
> Thierry

Hello,

No attachment came through the filters. Can you resend it in plain text or, if it was a .R file, rename it to .txt? See [1], section General Instructions, for more on this.

[1] https://www.r-project.org/mail.html#instructions

Hope this helps,

Rui Barradas
Re: [R] Sum data according to date in sequence
Hello,

Here are two solutions.

1. Base R

Though I don't coerce the date column to class "Date", it seems to work.

aggregate(EnergykWh ~ date, dt1, sum)
#>        date EnergykWh
#> 1 1/14/2016  11.98569
#> 2 1/15/2016  32.56938
#> 3 1/16/2016  21.29181
#> 4 1/17/2016  22.88083
#> 5 1/18/2016   9.05750

2. Package dplyr

First, column date is coerced from class "character" to class "Date". Then the grouped sums are computed.

suppressPackageStartupMessages(
  library(dplyr)
)

dt1 %>%
  mutate(date = as.Date(date, "%m/%d/%Y")) %>%
  summarise(EnergykWh = sum(EnergykWh), .by = date)
#>         date EnergykWh
#> 1 2016-01-14  11.98569
#> 2 2016-01-15  32.56938
#> 3 2016-01-16  21.29181
#> 4 2016-01-17  22.88083
#> 5 2016-01-18   9.05750

As you can see, the results are the same.

Also, this exact problem is one of the most asked on StackOverflow. Maybe you could try searching there for a solution. My code above is also exactly the code in [1], though I had already written this answer. I only checked afterwards :(.

[1] https://stackoverflow.com/questions/61548758/r-how-sum-values-by-group-by-date

Hope this helps,

Rui Barradas
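The base R solution groups on the date strings as they are. A small sketch of the same sum with the column coerced to class "Date" first, so that the groups sort chronologically rather than alphabetically, might look like this. The data frame below is hypothetical and only mimics the column names in the post.

```r
# Hypothetical data in the same shape as the post: several readings per day.
dt1 <- data.frame(
  date = c("1/14/2016", "1/14/2016", "1/15/2016", "1/15/2016", "1/16/2016"),
  EnergykWh = c(4, 6, 1.5, 2.5, 3)
)

# Coerce to class "Date" so that groups are ordered as dates,
# not as character strings.
dt1$date <- as.Date(dt1$date, format = "%m/%d/%Y")

# One row per date with the summed energy.
daily <- aggregate(EnergykWh ~ date, data = dt1, FUN = sum)
daily
```

With character dates spanning several months, "10/1/2016" would sort before "2/1/2016", which is the case the coercion guards against.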
Re: [R] Missing shapes in legend with scale_shape_manual
At 20:55 on 30/10/2023, Kevin Zembower via R-help wrote:
> Hello,
>
> I'm trying to plot a graph of blood glucose versus date. I also record conditions, such as missing the previous night's medications, and missing exercise on the previous day. My data looks like:
>
> b2[68:74,]
> # A tibble: 7 × 5
>   Date       Time     bg missed_meds no_exercise
> 1 2023-10-17 08:50   128 TRUE        FALSE
> 2 2023-10-16 06:58   144 FALSE       FALSE
> 3 2023-10-15 09:17   137 FALSE       TRUE
> 4 2023-10-14 09:04   115 FALSE       FALSE
> 5 2023-10-13 08:44   136 FALSE       TRUE
> 6 2023-10-12 08:55   122 FALSE       TRUE
> 7 2023-10-11 07:55   150 TRUE        TRUE
>
> This gets me most of the way to what I want:
>
> ggplot(data = b2, aes(x = Date, y = bg)) +
>   geom_line() +
>   geom_point(data = filter(b2, missed_meds), shape = 20, size = 3) +
>   geom_point(data = filter(b2, no_exercise), shape = 4, size = 3) +
>   geom_point(aes(x = Date, y = bg, shape = missed_meds), alpha = 0) + #Invisible point layer for shape mapping
>   scale_y_continuous(name = "Blood glucose (mg/dL)",
>                      breaks = seq(100, 230, by = 20)
>                      ) +
>   geom_hline(yintercept = 130) +
>   scale_shape_manual(name = "Conditions",
>                      labels = c("Missed meds", "Missed exercise"),
>                      values = c(20, 4),
>                      ## size = 3
>                      )
>
> However, the legend just prints an empty square in front of the labels. What I want is a filled circle (shape 20) in front of "Missed meds" and a filled circle (shape 4) in front of "Missed exercise."
>
> My questions are:
> 1. How can I fix my plot to show the shapes in the legend?
> 2. Can my overall plotting method be improved? Would you do it this way?
>
> Thanks so much for your advice and guidance.
>
> -Kevin

Hello,

In ggplot2 graphics, when you have more than one call to the same layer function, you can probably simplify the code. In this case you make several calls to geom_point.
This can probably be avoided. Create a new column named Conditions. Assign to it the column names wherever the values of those columns are TRUE. The simplest way of doing this is to use columns missed_meds and no_exercise as logical index columns, see the code below. This way the values are mapped to shapes in just one call to geom_point. That's what function aes() is meant for, to tell which variables define what in the plot.

b2$Date <- as.Date(b2$Date)
# this new column will be mapped to the shape aesthetic
b2$Conditions <- NA_character_
b2$Conditions[b2$missed_meds] <- names(b2)[4]
b2$Conditions[b2$no_exercise] <- names(b2)[5]

ggplot(data = b2, aes(x = Date, y = bg)) +
  geom_line() +
  geom_point(aes(shape = Conditions), size = 3) +
  geom_hline(yintercept = 130) +
  scale_y_continuous(
    name = "Blood glucose (mg/dL)",
    breaks = seq(100, 230, by = 20)
  ) +
  scale_shape_manual(
    #name = "Conditions",
    labels = c("Missed meds", "Missed exercise"),
    values = c(20, 4),
    na.translate = FALSE
  )

Hope this helps,

Rui Barradas
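The construction of the Conditions column can be checked on its own, without ggplot2. The sketch below uses a few hypothetical rows shaped like the tibble in the question and writes the column names out as strings instead of using names(b2)[4] and names(b2)[5].

```r
# Hypothetical rows shaped like the tibble in the question.
b2 <- data.frame(
  Date = as.Date(c("2023-10-17", "2023-10-16", "2023-10-15", "2023-10-11")),
  bg = c(128, 144, 137, 150),
  missed_meds = c(TRUE, FALSE, FALSE, TRUE),
  no_exercise = c(FALSE, FALSE, TRUE, TRUE)
)

# Start with NA and fill in the column name wherever the flag is TRUE.
b2$Conditions <- NA_character_
b2$Conditions[b2$missed_meds] <- "missed_meds"
b2$Conditions[b2$no_exercise] <- "no_exercise"

# The later assignment wins when both flags are TRUE (the last row here).
b2$Conditions
```

Rows where both conditions hold end up labelled no_exercise only. If such days need their own symbol, a third value (for instance "both", a name made up here) would have to be assigned last and given its own shape.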
Re: [R] How to Reformat a dataframe
> t to do is, instead of having 12 observations by row, I want to have one observation by row. I want to have a single column with 1509 observations instead of 126 rows with 12 columns per row.
>
> I tried the following:
>
> df = data.frame(matrix(nrow = Length, ncol = 1))
> colnames(df) = c("aportes_alajuela")
>
> for (row in 1:nrow(alajuela_df)){
>   for (col in 1:ncol(alajuela_df)){
>     df[i,1]=alajuela_df[i,j]
>   }
> }
>
> But I am not getting the data in the structure I want.
>
> Any help will be greatly appreciated.
>
> Best regards,
> Paul

Hello,

Here are two base R ways, with ?stack and with ?reshape.

# 1. With stack()
df_long <- stack(alajuela_df)[1]
df_long <- df_long[complete.cases(df_long), , drop = FALSE]
head(df_long)

# 2. With reshape
df_long <- reshape(
  alajuela_df,
  direction = "long",
  varying = names(alajuela_df),
  v.names = "x"
)[2]

# 1512 rows, only one column
dim(df_long)
# [1] 1512    1

# there are NA's in the data
df_long[complete.cases(df_long), , drop = FALSE] |> dim()
# [1] 1509    1

# keep the rows with values not NA
df_long <- df_long[complete.cases(df_long), , drop = FALSE]

# check the dimensions again
dim(df_long)
# [1] 1509    1

Hope this helps,

Rui Barradas
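The stack() approach can be tried on a small example. This is a sketch with a hypothetical 2 x 3 data frame (named wide here) standing in for alajuela_df, with one missing value.

```r
# Hypothetical 2 x 3 data frame standing in for alajuela_df, with one NA.
wide <- data.frame(Jan = c(1.5, 2.5), Feb = c(3.5, NA), Mar = c(5.5, 6.5))

# stack() returns columns "values" and "ind"; [1] keeps only the values.
long <- stack(wide)[1]

# Drop the missing observations but keep the data frame structure.
long <- long[complete.cases(long), , drop = FALSE]
long
```

Note that stack() goes through the data column by column. If the observations run across each row in time order, c(t(as.matrix(wide))) reads them row by row instead.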
Re: [R] Plot for 10 years extrapolation
At 19:23 on 26/10/2023, varin sacha via R-help wrote:
> Dear R-Experts,
>
> Here below is my R code. It works, but I don't know how to finish it to get the final plot with the extrapolation for the 10 more years.
>
> Indeed, I try to extrapolate my data with a linear fit over the next 10 years. So I create a date sequence for the next 10 years and store it as a dataframe to make the prediction possible.
> Now, I am trying to get the plot with the actual data (from year 2004 to 2018) and with the 10 more years extrapolation.
>
> Thanks for your help.
>
> date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31", "2015-12-31", "2014-12-31", "2013-12-31", "2012-12-31", "2011-12-31", "2010-12-31", "2009-12-31", "2008-12-31", "2007-12-31", "2006-12-31", "2005-12-31", "2004-12-31"))
>
> value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209, 11099, 10087, 14987, 11098, 13421, 9023, 12098)
>
> model <- lm(value~date)
>
> plot(value~date ,col="grey",pch=20,cex=1.5,main="Plot")
> abline(model,col="darkorange",lwd=2)
>
> dfuture <- data.frame(date=seq(as.Date("2019-12-31"), by="1 year", length.out=10))
>
> predict(model,dfuture,interval="prediction")

Hello,

Here is a way with base R graphics. Explained in the code comments.

date <-as.Date(c("2018-12-31", "2017-12-31", "2016-12-31",
                 "2015-12-31", "2014-12-31", "2013-12-31",
                 "2012-12-31", "2011-12-31", "2010-12-31",
                 "2009-12-31", "2008-12-31", "2007-12-31",
                 "2006-12-31", "2005-12-31", "2004-12-31"))

value <-c(15348, 13136, 11733, 10737, 15674, 11098, 13721, 13209,
          11099, 10087, 14987, 11098, 13421, 9023, 12098)

model <- lm(value ~ date)

dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by="1 year", length.out=10))
predfuture <- predict(model, dfuture, interval="prediction")
dfuture <- cbind(dfuture, predfuture)

# start the plot with the required x and y limits
xlim <- range(c(date, dfuture$date))
ylim <- range(c(value, dfuture$fit))

plot(value ~ date, col="grey", pch=20, cex=1.5, main="Plot"
     , xlim = xlim, ylim = ylim)

# abline extends the fitted line past the x value (date)
# limit making the next ten years line ugly and not even
# completely overplotting the abline drawn line
abline(model, col="darkorange", lwd=2)
lines(fit ~ date, dfuture
      # , lty = "dashed"
      , lwd=2
      , col = "black")

# if lines() is used for both the interpolated and extrapolated
# values you will have a gap between both fitted and predicted lines
# but it is closer to what you want

# get the fitted values first (interpolated values)
ypred <- predict(model)

plot(value ~ date, col="grey", pch=20, cex=1.5, main="Plot"
     , xlim = xlim, ylim = ylim)

# plot the interpolated values
lines(ypred ~ date, col="darkorange", lwd = 2)
# and now the extrapolated values
# I use normal orange to make the difference more obvious
lines(fit ~ date, dfuture, lty = "dashed", lwd=2, col = "orange")

Hope this helps,

Rui Barradas
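The call to predict with interval = "prediction" also returns the lower and upper bounds, in columns lwr and upr, which the plots above do not draw. A possible extension, sketched here with a shortened hypothetical series and with the y limits widened to include the bounds, is:

```r
# Shortened, hypothetical series of the same kind as in the post.
date <- as.Date(paste0(2010:2018, "-12-31"))
value <- c(11099, 13209, 13721, 11098, 15674, 10737, 11733, 13136, 15348)

model <- lm(value ~ date)
dfuture <- data.frame(date = seq(as.Date("2019-12-31"), by = "1 year",
                                 length.out = 10))

# The result is a matrix with columns "fit", "lwr" and "upr".
pred <- predict(model, dfuture, interval = "prediction")

# Make room for the interval bounds as well as the fitted values.
xlim <- range(c(date, dfuture$date))
ylim <- range(c(value, pred))

plot(value ~ date, col = "grey", pch = 20, cex = 1.5,
     xlim = xlim, ylim = ylim)
# fitted line over the observed period
lines(date, predict(model), col = "darkorange", lwd = 2)
# extrapolated line over the next ten years
lines(dfuture$date, pred[, "fit"], lty = "dashed", lwd = 2, col = "orange")
# lower and upper prediction bounds
lines(dfuture$date, pred[, "lwr"], lty = "dotted")
lines(dfuture$date, pred[, "upr"], lty = "dotted")
```

The bounds widen with the distance from the observed period, which gives a visual cue of how uncertain a ten-year linear extrapolation is.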
Re: [R] Bug in print for data frames?
Hello,

Inline.

At 13:32 on 26/10/2023, Ebert,Timothy Aaron wrote:
> The "problem" goes away if you use
>
> x$C <- y[1,]

Actually, if I understand correctly, the OP wants the column:

x$C <- y[,1]

In this case it will produce the same output because y is a df with only one row. But that is a very special case; the general case would be to extract the column.

Hope this helps,

Rui Barradas

> If you have another row in your x, say:
> x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))
>
> then your code
> x$C <- y[1]
> returns an error.
>
> If y has the same number of rows as x$C then R has the same outcome as in your example.
>
> It looks like your code tells R to replace all of column C (including the name) with all of vector y.
>
> Maybe unexpected, but not a bug. It is consistent.
>
> -----Original Message-----
> From: R-help On Behalf Of Rui Barradas
> Sent: Thursday, October 26, 2023 6:43 AM
> To: Christian Asseburg ; r-help@r-project.org
> Subject: Re: [R] Bug in print for data frames?
>
> [External Email]
>
> At 07:18 on 25/10/2023, Christian Asseburg wrote:
>> Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?
>>
>> Using R 4.3.1:
>>
>> x <- data.frame(A = 1, B = 2, C = 3)
>> y <- data.frame(A = 1)
>> x
>>   A B C
>> 1 1 2 3
>> x$B <- y$A # works as expected
>> x
>>   A B C
>> 1 1 1 3
>> x$C <- y[1] # makes C disappear
>> x
>>   A B A
>> 1 1 1 1
>> str(x)
>> 'data.frame': 1 obs. of 3 variables:
>>  $ A: num 1
>>  $ B: num 1
>>  $ C:'data.frame': 1 obs. of 1 variable:
>>   ..$ A: num 1
>>
>> Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.
>>
>> Thanks. With best wishes - . . .
>> Christian
Re: [R] Bug in print for data frames?
At 07:18 on 25/10/2023, Christian Asseburg wrote:
> Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?
>
> Using R 4.3.1:
>
> x <- data.frame(A = 1, B = 2, C = 3)
> y <- data.frame(A = 1)
> x
>   A B C
> 1 1 2 3
> x$B <- y$A # works as expected
> x
>   A B C
> 1 1 1 3
> x$C <- y[1] # makes C disappear
> x
>   A B A
> 1 1 1 1
> str(x)
> 'data.frame': 1 obs. of 3 variables:
>  $ A: num 1
>  $ B: num 1
>  $ C:'data.frame': 1 obs. of 1 variable:
>   ..$ A: num 1
>
> Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.
>
> Thanks. With best wishes - . . .
>
> Christian

Hello,

To expand on the good answers already given, I will present two other example data sets.

Example 1. Imagine that instead of assigning just one column from y to x$C you assign two columns. The result is a data.frame column. See what is displayed as the column names.
And unlike what happens with `[`, when assigning columns 1:2, the operator `[[` doesn't work. You will have to extract the columns y$A and y$B one by one.

x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame':    1 obs. of  2 variables:
#>  $ A: num 1
#>  $ B: num 4

x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4

str(x)
#> 'data.frame':    1 obs. of  3 variables:
#>  $ A: num 1
#>  $ B: num 2
#>  $ C:'data.frame':   1 obs. of  2 variables:
#>   ..$ A: num 1
#>   ..$ B: num 4

x[[1:2]]  # doesn't work
#> Error in .subset2(x, i, exact = exact): subscript out of bounds

Example 2. Sometimes it is useful to get a result like this first and then correct the resulting df. For instance, when computing more than one summary statistic.
str(agg) below shows that the summary statistics are returned as a matrix, so you have a matrix column. And once again the displayed names reflect that.

The trick to make the result a df is to extract all but the last column as a sub-df, extract the last column's values as a matrix (which it is) and then cbind the two together.
cbind is a generic function. Since the first argument to cbind is a sub-df, the method called is cbind.data.frame and the result is a df.

df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A X.Mean      X.S
#> 1 a  14.50 9.082951
#> 2 b  15.50 9.082951
#> 3 c  16.50 9.082951

# similar effect as in the OP. The difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame':    3 obs. of  2 variables:
#>  $ A: chr  "a" "b" "c"
#>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:2] "Mean" "S"

# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951

# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame':    3 obs. of  3 variables:
#>  $ A   : chr  "a" "b" "c"
#>  $ Mean: num  14.5 15.5 16.5
#>  $ S   : num  9.08 9.08 9.08

If the anonymous function hadn't returned a named vector, the new column names would have been "1", "2", try it.

Hope this helps,

Rui Barradas
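Another way to flatten the matrix column, not taken from the thread but a common base R idiom, is to pass the aggregate result to data.frame() through do.call(). A minimal sketch with the same kind of data:

```r
df1 <- data.frame(A = rep(c("a", "b", "c"), 10L), X = 1:30)

# the anonymous function returns a named vector of two statistics
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))

# agg has 2 columns, the second one being a 3 x 2 matrix
ncol(agg)

# each element of the list 'agg' is passed to data.frame(), which
# splits the matrix into one column per statistic
flat <- do.call(data.frame, agg)
str(flat)
```

Unlike the cbind() version, the new columns keep the X. prefix (X.Mean and X.S), which helps when several variables are summarised at once.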
Re: [R] by function does not separate output from function with multiple parts
----
#> mydata$StepType: Second
#> lm model parameter contrast
#>
#>   Contrast     S.E.     Lower    Upper     t df Pr(>|t|)
#> 1   -2.435 1.819421 -6.198759 1.328759 -1.34 23   0.1939

Hope this helps,

Rui Barradas
Re: [R] running crossvalidation many times MSE for Lasso regression
>> >> MSE
>> >> lst[i]<-MSE
>> >> }
>> >> mean(unlist(lst))
>> >> ##
>
> --
> Jin
> --
> Jin Li, PhD
> Founder, Data2action, Australia
> https://www.researchgate.net/profile/Jin_Li32
> https://scholar.google.com/citations?user=Jeot53EJ&hl=en
Hello,

In your OP, the following two code lines are where that error comes from.

predictLasso=predict(cv_model, newx=test1)
ypred=predict(predictLasso,newdata=test1)

predictLasso already holds the predictions; it is the output of predict. So when you run the 2nd line above you are passing it a matrix, not a fitted model, and the error is thrown.

After the several suggestions in this thread, don't you want something like this instead of your for loop?

# make the results reproducible
set.seed(2023)

# this is better than what you had
z <- TT[c("x1", "x2")] |> as.matrix()
y <- TT[["y"]]
cv_model <- cv.glmnet(z, y, alpha = 1, type.measure = "mse")
best_lambda <- cv_model$lambda.min
best_lambda

# these two values should be the same, and they are
# index to minimum mse
(i <- cv_model$index[1])
which(cv_model$lambda == cv_model$lambda.min)

# these two values should be the same, and they are
# value of minimum mse
cv_model$cvm[i]
min(cv_model$cvm)

plot(cv_model)

Hope this helps,

Rui Barradas
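The point about calling predict twice can be reproduced without glmnet. In the sketch below lm() is only a stand-in for the cv.glmnet model of the thread, and the data are simulated; the names train, test and fit are made up for the illustration.

```r
# lm() is only a stand-in here for the cv.glmnet model in the thread.
set.seed(2023)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(50)
test <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
test$y <- 1 + 2 * test$x1 - test$x2 + rnorm(20)

fit <- lm(y ~ x1 + x2, data = train)

# one call to predict() on the fitted model gives the predictions
ypred <- predict(fit, newdata = test)

# mean squared error on the held-out data
mse <- mean((test$y - ypred)^2)
mse

# calling predict() again on the predictions fails, because a numeric
# vector is not a fitted model
second <- tryCatch(predict(ypred, newdata = test), error = function(e) e)
inherits(second, "error")
```

With glmnet the same pattern applies: the MSE is computed from the result of the first predict call, and no second call is needed.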
Re: [R] Best way to test for numeric digits?
Às 19:35 de 18/10/2023, Leonard Mada escreveu: Dear Rui, On 10/18/2023 8:45 PM, Rui Barradas wrote: split_chem_elements <- function(x, rm.digits = TRUE) { regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])" if(rm.digits) { stringr::str_replace_all(mol, regex, "#") |> strsplit("#|[[:digit:]]") |> lapply(\(x) x[nchar(x) > 0L]) } else { strsplit(x, regex, perl = TRUE) } } split.symbol.character = function(x, rm.digits = TRUE) { # Perl is partly broken in R 4.3, but this works: regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])" s <- strsplit(x, regex, perl = TRUE) if(rm.digits) { s <- lapply(s, \(x) x[grep("[[:digit:]]+", x, invert = TRUE)]) } s } You have a glitch (mol is hardcoded) in the code of the first function. The times are similar, after correcting for that glitch. Note: - grep("[[:digit:]]", ...) behaves almost twice as slow as grep("[0-9]", ...)! - corrected results below; Sincerely, Leonard ### split_chem_elements <- function(x, rm.digits = TRUE) { regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])" if(rm.digits) { stringr::str_replace_all(x, regex, "#") |> strsplit("#|[[:digit:]]") |> lapply(\(x) x[nchar(x) > 0L]) } else { strsplit(x, regex, perl = TRUE) } } split.symbol.character = function(x, rm.digits = TRUE) { # Perl is partly broken in R 4.3, but this works: regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])" s <- strsplit(x, regex, perl = TRUE) if(rm.digits) { s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)]) } s } mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl") mol1 <- rep(mol, 1) system.time( split_chem_elements(mol1) ) # user system elapsed # 0.58 0.00 0.58 system.time( split.symbol.character(mol1) ) # user system elapsed # 0.67 0.00 0.67 Hello, You are right, sorry for the blunder :(. In the code below I have replaced stringr::str_replace_all by the package stringi function stri_replace_all_regex and the improvement is significant. 
split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    # argument order is (str, pattern, replacement)
    stringi::stri_replace_all_regex(x, regex, "#") |>
      strsplit("#|[0-9]") |>
      lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

# system.time(
#   split_chem_elements(mol1)
# )
#    user  system elapsed
#    0.06    0.00    0.09
# system.time(
#   split.symbol.character(mol1)
# )
#    user  system elapsed
#    0.25    0.00    0.28

Hope this helps,

Rui Barradas
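On the original question of how to test for digits without suppressWarnings, one possible base R answer is a vectorised regular expression test with grepl. This is a sketch for strings made only of the digits 0 to 9; signs, decimal points and exponents would need a wider pattern.

```r
x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3", "16")

# TRUE where the whole string consists of digits; no warning is raised
is_digits <- grepl("^[0-9]+$", x)
is_digits

# keep only the element symbols
x[!is_digits]
```

This returns one logical value per element of the vector, which is what an is.numeric that works element by element would do.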
Re: [R] Best way to test for numeric digits?
Às 17:24 de 18/10/2023, Leonard Mada escreveu: Dear Rui, Thank you for your reply. I do have actually access to the chemical symbols: I have started to refactor and enhance the Rpdb package, see Rpdb::elements: https://github.com/discoleo/Rpdb However, the regex that you have constructed is quite heavy, as it needs to iterate through all chemical symbols (in decreasing nchar). Elements like C, and especially O, P or S, appear late in the regex expression - but are quite common in chemistry. The alternative regex is (in this respect) simpler. It actually works (once you know about the workaround). Q: My question focused if there is anything like is.numeric, but to parse each element of a vector. Sincerely, Leonard On 10/18/2023 6:53 PM, Rui Barradas wrote: Às 15:59 de 18/10/2023, Leonard Mada via R-help escreveu: Dear List members, What is the best way to test for numeric digits? suppressWarnings(as.double(c("Li", "Na", "K", "2", "Rb", "Ca", "3"))) # [1] NA NA NA 2 NA NA 3 The above requires the use of the suppressWarnings function. Are there any better ways? 
I was working to extract chemical elements from a formula, something like this:

split.symbol.character = function(x, rm.digits = TRUE) {
    # Perl is partly broken in R 4.3, but this works:
    regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
    # stringi::stri_split(x, regex = regex);
    s = strsplit(x, regex, perl = TRUE);
    if(rm.digits) {
        s = lapply(s, function(s) {
            isNotD = is.na(suppressWarnings(as.numeric(s)));
            s = s[isNotD];
        });
    }
    return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))

Sincerely,

Leonard

Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

Hello,

If you want to extract chemical element symbols, the following might work. It uses the periodic table in GitHub package chemr and a function from package stringr.
devtools::install_github("paleolimbot/chemr") split_chem_elements <- function(x) { data(pt, package = "chemr", envir = environment()) el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)] pat <- paste(el, collapse = "|") stringr::str_extract_all(x, pat) } mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl") split_chem_elements(mol) #> [[1]] #> [1] "C" "Cl" "F" #> #> [[2]] #> [1] "Li" "Al" "H" #> #> [[3]] #> [1] "C" "Cl" "C" "O" "Al" "P" "O" "Si" "O" "Cl" It is also possible to rewrite the function without calls to non base packages but that will take some more work. Hope this helps, Rui Barradas Hello, You and Avi are right, my function's performance is terrible. The following is much faster. As for how to not have digits throw warnings, the lapply in the version of your function below solves it by setting grep argument invert = TRUE. This will get all strings where digits do not occur. split_chem_elements <- function(x, rm.digits = TRUE) { regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])" if(rm.digits) { stringr::str_replace_all(mol, regex, "#") |> strsplit("#|[[:digit:]]") |> lapply(\(x) x[nchar(x) > 0L]) } else { strsplit(x, regex, perl = TRUE) } } split.symbol.character = function(x, rm.digits = TRUE) { # Perl is partly broken in R 4.3, but this works: regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])" s <- strsplit(x, regex, perl = TRUE) if(rm.digits) { s &l
Re: [R] Best way to test for numeric digits?
Às 15:59 de 18/10/2023, Leonard Mada via R-help escreveu: Dear List members, What is the best way to test for numeric digits? suppressWarnings(as.double(c("Li", "Na", "K", "2", "Rb", "Ca", "3"))) # [1] NA NA NA 2 NA NA 3 The above requires the use of the suppressWarnings function. Are there any better ways? I was working to extract chemical elements from a formula, something like this: split.symbol.character = function(x, rm.digits = TRUE) { # Perl is partly broken in R 4.3, but this works: regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"; # stringi::stri_split(x, regex = regex); s = strsplit(x, regex, perl = TRUE); if(rm.digits) { s = lapply(s, function(s) { isNotD = is.na(suppressWarnings(as.numeric(s))); s = s[isNotD]; }); } return(s); } split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")) Sincerely, Leonard Note: # works: regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"; strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T) # broken in R 4.3.1 # only slightly "erroneous" with stringi::stri_split regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])"; strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, If you want to extract chemical elements symbols, the following might work. It uses the periodic table in GitHub package chemr and a package stringr function. 
devtools::install_github("paleolimbot/chemr") split_chem_elements <- function(x) { data(pt, package = "chemr", envir = environment()) el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)] pat <- paste(el, collapse = "|") stringr::str_extract_all(x, pat) } mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl") split_chem_elements(mol) #> [[1]] #> [1] "C" "Cl" "F" #> #> [[2]] #> [1] "Li" "Al" "H" #> #> [[3]] #> [1] "C" "Cl" "C" "O" "Al" "P" "O" "Si" "O" "Cl" It is also possible to rewrite the function without calls to non base packages but that will take some more work. Hope this helps, Rui Barradas -- Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus. www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
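A minimal base-R sketch of the two tasks discussed in this thread: testing for digit-only strings without triggering coercion warnings, and pulling element symbols out of a formula. It assumes, purely for illustration, that a symbol is one capital letter followed by optional lowercase letters; unlike the chemr-based function above, it does not validate symbols against the periodic table.

```r
# Digit test without as.numeric(), so no coercion warnings are raised.
x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3")
is_digits <- grepl("^[0-9]+$", x)
x[!is_digits]

# Extract symbols directly instead of splitting and then dropping digits.
# Assumption: a symbol is a capital letter plus optional lowercase letters.
mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
symbols <- regmatches(mol, gregexpr("[A-Z][a-z]*", mol))
symbols
```

Extracting the matches avoids the look-around regex altogether, at the cost of accepting any capitalised token as a symbol.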
Re: [R] creating a time series
On 16/10/2023 at 11:12, ahmet varlı wrote:

Hello everyone,

I had 15 minutes of data from 2017-11-02 13:30:00 to 2022-11-26 23:45:00 and the number of data is 177647. I would like to ask why my time series is shorter than I expected.

baslangic <- as.POSIXct("2017-11-02 13:30:00", tz = "CET")
bitis <- as.POSIXct("2022-11-26 23:45:00", tz = "CET")
#
zaman_seti <- seq.POSIXt(from = baslangic, to = bitis, by = 60 * 15)
length(zaman_seti)
[1] 177642

but it has to be 177647.

Secondly, I have times in this format (2.11.2017 13:30/DD-MM- HH:MM:SS)

su_seviyeleri_data <- as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%Y-%m-%d %H:%M:%S")

I am using this code to change the format but it gives NA as the result. How can I solve this problem?

Bests,

Hello,

Given your date format, try

format = "%d.%m.%Y %H:%M"

Test with your date time:

x <- "2.11.2017 13:30"
as.POSIXct(x, format = "%d.%m.%Y %H:%M")
#> [1] "2017-11-02 13:30:00 WET"

as.POSIXct(su_seviyeleri_data$kayit_zaman, format = "%d.%m.%Y %H:%M")

Hope this helps,
Rui Barradas
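The sketch below restates the format fix and also checks the length of the 15-minute grid. Two things here are assumptions and not part of the thread: tz = "UTC" is used only so the result does not depend on the local time zone, and the idea that repeated timestamps in the data could account for the five extra rows is an unconfirmed guess. The vector kayit is hypothetical.

```r
# Parse day.month.year timestamps (tz = "UTC" is an assumption made here).
x <- c("2.11.2017 13:30", "26.11.2022 23:45")
stamps <- as.POSIXct(x, format = "%d.%m.%Y %H:%M", tz = "UTC")
format(stamps, "%Y-%m-%d %H:%M:%S")

# A format that does not match the text yields NA, as in the question.
bad <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
all(is.na(bad))

# Regular 15-minute grid between the two end points, both inclusive.
grid <- seq(stamps[1], stamps[2], by = "15 min")
length(grid)   # 177642, the same count as seq.POSIXt gave in the question

# Repeated timestamps would make a data set longer than the grid.
# 'kayit' is a hypothetical stand-in for the parsed time column.
kayit <- c(grid[1:3], grid[3])
sum(duplicated(kayit))
```

If the real column is longer than the grid, comparing it against the grid with duplicated() or setdiff() would show which timestamps are repeated or missing.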
Re: [R] if-else that returns vector
On 12/10/2023 at 21:22, Christofer Bogaso wrote:

Hi,

The following expression returns only the first element:

ifelse(T, c(1,2,3), c(5,6))

However, I am looking for a one-liner like the above that returns the entire vector. Is there any way to achieve this?

Hello,

I don't like it, but

ifelse(rep(T, length(c(1,2,3))), c(1,2,3), c(5,6))

works. Maybe you should use

max(length(c(1, 2, 3)), length(c(5, 6)))

as the length instead, but it's still ugly.

Hope this helps,
Rui Barradas
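An alternative one-liner, offered as a sketch and not as part of the reply above: since if/else is itself an expression in R, it can return a whole vector when the condition has length one. The variable names cond and res are illustrative.

```r
# if/else is an expression, so its value can be assigned directly.
cond <- TRUE
res <- if (cond) c(1, 2, 3) else c(5, 6)
res

# ifelse() is vectorised over its test: a length-one test
# produces a length-one result, which is what the question observed.
ifelse(cond, c(1, 2, 3), c(5, 6))
```

This only suits a single TRUE/FALSE condition; for element-wise selection ifelse() remains the right tool.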
Re: [R] Text showing when R is launched
Às 19:21 de 11/10/2023, George Loftus escreveu: Hi, Thankyou for your response <https://1drv.ms/i/s!AkfoLX--ikbqkweYckSQiXYKXJuR> [https://9c11xq.db.files.1drv.com/y4m7xqt5yVu7b5IG1jFuopunwB7Oa9Eij0WeZ7p1lSSmBECcSIB3XjcKjXIUhdMrJwaJdjZnBRhMeAxY0_Kko06Nq1fm5IhqaHlT6aFeI3R7gicXCteRPkzqNwmCdVxZu5DhNq66IrpwDyQ1lr8E5OFdm_xL86pMgNSLAx5HRRKLPOmFdUFWdv1ID-D1PC6LvNvAB-rT87JiQonSHRJIHouLg?width=200&height=150&cropmode=center] [https://res-h3.public.cdn.office.net/assets/mail/file-icon/png/cloud_blue_16x16.png]Screenshot 2023-10-11 at 19.19.48.png ? However this is all that exists in Users/Admin There were a couple of R files in there which I have since deleted but I am still getting the same issue Thankyou, George ________ From: Rui Barradas Sent: 10 October 2023 12:06 To: George Loftus ; r-help@r-project.org Subject: Re: [R] Text showing when R is launched Às 23:56 de 09/10/2023, George Loftus escreveu: Good Evening, I was wondering if you were able to help, I am running R on MacOS, it is the 2020 model mac so have install the Intel arm of R which I believe is correct However when I launch R or resume the R window after going on a different programme the following text is running I have also copied and pasted for ease 1 HIToolbox 0x7ff82142e0c2 _ZN15MenuBarInstance22RemoveAutoShowObserverEv + 30 2 HIToolbox 0x7ff82146a638 _ZL17BroadcastInternaljPvh + 167 3 SkyLight0x7ff81c70f23d _ZN12_GLOBAL__N_123notify_datagram_handlerEj15CGSDatagramTypePvmS1_ + 1030 4 SkyLight0x7ff81ca2205a _ZN21CGSDatagramReadStream26dispatchMainQueueDatagramsEv + 202 5 SkyLight0x7ff81ca21f81 ___ZN21CGSDatagramReadStream15mainQueueWakeupEv_block_invoke + 18 6 libdispatch.dylib 0x7ff8178867fb _dispatch_call_block_and_release + 12 7 libdispatch.dylib 0x7ff817887a44 _dispatch_client_callout + 8 8 libdispatch.dylib 0x7ff8178947b9 _dispatch_main_queue_drain + 952 9 libdispatch.dylib 0x7ff8178943f3 _dispatch_main_queue_callback_4CF + 31 10 CoreFoundation 0x7ff817b215f0 
__CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9 11 CoreFoundation 0x7ff817ae1b70 __CFRunLoopRun + 2454 12 CoreFoundation 0x7ff817ae0b60 CFRunLoopRunSpecific + 560 13 HIToolbox 0x7ff82142e766 RunCurrentEventLoopInMode + 292 14 HIToolbox 0x7ff82142e576 ReceiveNextEventCommon + 679 15 HIToolbox 0x7ff82142e2b3 _BlockUntilNextEventMatchingListInModeWithFilter + 70 16 AppKit 0x7ff81ac31293 _DPSNextEvent + 909 17 AppKit 0x7ff81ac30114 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 1219 18 R 0x000103d60c76 -[RController doProcessEvents:] + 166 19 R 0x000103d5b295 -[RController handleReadConsole:] + 149 20 R 0x000103d6466f Re_ReadConsole + 175 21 libR.dylib 0x000104442154 R_ReplDLLdo1 + 148 22 R 0x000103d71c47 run_REngineRmainloop + 263 23 R 0x000103d66d5f -[REngine runREPL] + 143 24 R 0x000103d56718 main + 792 25 dyld0x7ff8176d4310 start + 2432 1 HIToolbox 0x7ff8214a1726 _ZN15MenuBarInstance22EnsureAutoShowObserverEv + 102 2 HIToolbox 0x7ff82146a638 _ZL17BroadcastInternaljPvh + 167 3 SkyLight0x7ff81c70f23d _ZN12_GLOBAL__N_123notify_datagram_handlerEj15CGSDatagramTypePvmS1_ + 1030 4 SkyLight0x7ff81ca2205a _ZN21CGSDatagramReadStream26dispatchMainQueueDatagramsEv + 202 5 SkyLight0x7ff81ca21f81 ___ZN21CGSDatagramReadStream15mainQueueWakeupEv_block_invoke + 18 6 libdispatch.dylib 0x7ff8178867fb _dispatch_call_block_and_release + 12 7 libdispatch.dylib 0x7ff817887a44 _dispatch_client_callout + 8 8 libdispatch.dylib 0x7ff8178947b9 _dispatch_main_queue_drain + 952 9 libdispatch.dylib 0x7ff8178943f3 _dispatch_main_queue_callback_4CF + 31 10 CoreFoundation 0x7ff817b215f0 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9 11 CoreFoundation 0x7ff817ae1b70 __CFRunLoopRun + 2454 12 CoreFoundation 0x7ff817ae0b60 CFRunLoopRunSpecific + 560
Re: [R] Text showing when R is launched
0x000103d71c47 run_REngineRmainloop + 263
23 R 0x000103d66d5f -[REngine runREPL] + 143
24 R 0x000103d56718 main + 792
25 dyld0x7ff8176d4310 start + 2432

Are you able to inform me what is causing this? I can't seem to find any online help regarding this.

Thank you in advance,
George Loftus

Hello,

Try deleting the file

/Users/admin/.RData

R is restoring the previous session from it, and this is often a source of problems.

Hope this helps,
Rui Barradas
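The same step can be done from within R. This is a cautious sketch: the helper name disable_saved_workspace and the .bak suffix are made up for illustration, and the file is renamed instead of deleted so the old workspace can be recovered.

```r
# Hypothetical helper: move a saved workspace out of the way.
# Returns TRUE if a .RData file was found and renamed, FALSE otherwise.
disable_saved_workspace <- function(dir = path.expand("~")) {
  rdata <- file.path(dir, ".RData")
  if (!file.exists(rdata)) {
    return(FALSE)
  }
  file.rename(rdata, paste0(rdata, ".bak"))
}

# Usage (not run here): disable_saved_workspace("/Users/admin")
```

After renaming, R should start with an empty workspace the next time it is launched from that directory.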
Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?
On 06/10/2023 at 10:09, Chris Evans via R-help wrote:

The reason I am asking is that I would like to mark areas on a plot using geom_polygon() and aes(fill = variable) to fill various polygons forming the background of a plot with different colours. Then I would like to overlay that with points representing direction of change: improved, no reliable change, deteriorated. The obvious symbols to use for those three directions are an upward arrow, a circle or square and a downward pointing arrow. There is a solid upward pointing triangle symbol in R (pch = 17) and there are both upward and downward pointing open triangle symbols (pch 24 and 25), but to fill those with a solid colour so they will be visible over the background requires that I use a fill aesthetic, and that gets me a mess with the legend as I will have used a different fill mapping to fill the polygons. This silly reprex shows the issue, I think.

library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> tmpTibArea4
bind_rows(tmpTibArea1, tmpTibArea2, tmpTibArea3, tmpTibArea4) -> tmpTibAreas

ggplot(data = tmpTibAreas, aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas, aes(x = x, y = y, fill = a)) +
  geom_point(data = tmpTibPoints, aes(x = x, y = y, fill = c), pch = 24, size = 6)

Does anyone know a way to create a solid downward pointing symbol? Or another workaround?

TIA,
Chris

Hello,

Maybe you can solve the problem with Unicode characters. See the two scale_*_manual calls at the end of the plot.

# Unicode characters for black up- and down-pointing triangles
pts_shapes <- c("\U25B2", "\U25BC") |> setNames(c("A", "B"))
pts_colors <- c("blue", "red") |> setNames(c("A", "B"))

ggplot(data = tmpTibAreas, aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas, aes(x = x, y = y, fill = a)) +
  geom_point(data = tmpTibPoints, aes(x = x, y = y, color = c, shape = c), size = 6) +
  scale_shape_manual(values = pts_shapes) +
  scale_color_manual(values = pts_colors)
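A small base-R sketch of how the shape vector could be extended to the three directions of change named in the question. The level names improved, unchanged and deteriorated and the use of a black circle for the middle level are assumptions for illustration, not something taken from the thread.

```r
# Solid triangles and circle as Unicode characters.
up   <- "\u25B2"   # BLACK UP-POINTING TRIANGLE
down <- "\u25BC"   # BLACK DOWN-POINTING TRIANGLE
dot  <- "\u25CF"   # BLACK CIRCLE

# Check that the escapes denote the intended code points.
utf8ToInt(up) == 0x25B2
utf8ToInt(down) == 0x25BC

# Named vector, usable as 'values' in scale_shape_manual();
# the level names are hypothetical.
change_shapes <- c(improved = up, unchanged = dot, deteriorated = down)
names(change_shapes)
```

Whether the glyphs render depends on the font and graphics device in use, so it is worth checking the output device before relying on them.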
Re: [R] R issue / No buffer space available
On 04/10/2023 at 21:28, Ohad Oren, MD wrote:

Hello,

I keep getting the following message about 'no buffer space available'. I am using RStudio via a connection to a server. I verified that the connection to the server is good.

2023-10-04T20:26:25.698193Z [rsession-oo968] ERROR system error 105 (No buffer space available) [host: localhost, uri: /log_message, path: /var/run/rstudio-server/rstudio-rserver/rserver-monitor.socket]; OCCURRED AT void rstudio::core::http::LocalStreamAsyncClient::handleConnect(const rstudio_boost::system::error_code&) src/cpp/session/SessionModuleContext.cpp:124

I would appreciate your help!

Ohad

Hello,

RStudio is an IDE for R, not R itself. That is an RStudio error, and RStudio technical support [1] is better suited to solve your problem.

[1] https://community.rstudio.com/

Hope this helps,
Rui Barradas
Re: [R] annotate
On 04/10/2023 at 20:34, Subia Thomas OI-US-LIV5 wrote:

Colleagues,

I wish to create y-data labels which meet a criterion. Here is my reproducible code.

library(dplyr)
library(ggplot2)
library(cowplot)

above_92 <- filter(faithful,waiting>92)

ggplot(faithful,aes(x=eruptions,y=waiting))+
  geom_point(shape=21,size=3,fill="orange")+
  theme_cowplot()+
  geom_hline(yintercept = 92)+
  annotate(geom="text",x=above_92$eruptions,y=above_92$waiting+2,label=above_92$waiting)

A bit of trial and error is required to figure out what number to add to or subtract from above_92$waiting. Is there a more efficient way to do this?

Thomas Subia
Lean Six Sigma Senior Practitioner
DRÄXLMAIER Group
DAA Draexlmaier Automotive of America LLC
mailto:thomas.su...@draexlmaier.com
http://www.draexlmaier.com

"In God we trust. All others must bring data." Edward Deming

Public: All rights reserved. Distribution to third parties allowed.

Hello,

Yes, there is an automatic way of doing this. Use a new data set in geom_text or annotate. Below I use geom_text. Then vjust will take care of the label placement.

library(dplyr)
library(ggplot2)
library(cowplot)

above_92 <- filter(faithful, waiting > 92)

ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(shape = 21, size = 3, fill = "orange") +
  geom_hline(yintercept = 92) +
  # use a new data argument here
  geom_text(
    data = above_92,
    mapping = aes(x = eruptions, y = waiting, label = waiting),
    vjust = -1
  ) +
  theme_cowplot()

Hope this helps,
Rui Barradas
Re: [R] Jim Lemon RIP
My sympathies for your loss. Jim Lemon was a dedicated contributor to the R community and his answers were always welcome. Jim will be missed. Rui Barradas Às 23:36 de 04/10/2023, Jim Lemon escreveu: Hello, I am very sad to let you know that my husband Jim died on 18th September. I apologise for not letting you know earlier but I had trouble finding the password for his phone. Kind regards, Juel -- Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus. www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping by Date and showing count of failures by date
Às 21:29 de 29/09/2023, Paul Bernal escreveu: Dear friends, Hope you are doing great. I am attaching the dataset I am working with because, when I tried to dput() it, I was not able to copy the entire result from dput(), so I apologize in advance for that. I am interested in creating a column named Failure_Date_Period that has the FAILDATE but formatted as _MM. Then I want to count the number of failures (given by column WONUM) and just have a dataframe that has the FAILDATE and the count of WONUM. I tried this: pt <- PivotTable$new() pt$addData(failuredf) pt$addColumnDataGroups("FAILDATE") pt <- PivotTable$new() pt$addData(failuredf) pt$addColumnDataGroups("FAILDATE") pt$defineCalculation(calculationName = "FailCounts", summariseExpression="n()") pt$renderPivot() but I was not successful. Bottom line, I need to create a new dataframe that has the number of failures by FAILDATE, but in -MM format. Any help and/or guidance will be greatly appreciated. Kind regards, Paul __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, No data is attached. Maybe try dput(head(failuredf, 30)) ? And where can we find non-base PivotTable? Please start the scripts with calls to library() when using non-base functionality. Hope this helps, Rui Barradas -- Este e-mail foi analisado pelo software antivírus AVG para verificar a presença de vírus. www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
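Since no data reached the list, the following is only a sketch with made-up rows. It assumes that FAILDATE is (or can be converted to) a Date, that the period format meant in the question is year-month (written here as "%Y-%m"), and that each row is one failure identified by WONUM. The column names come from the question; everything else is hypothetical.

```r
# Hypothetical stand-in for failuredf; only the column names are from the post.
failuredf <- data.frame(
  WONUM    = c("W1", "W2", "W3", "W4", "W5"),
  FAILDATE = as.Date(c("2023-01-05", "2023-01-20", "2023-02-11",
                       "2023-02-12", "2023-02-28"))
)

# Year-month label for each failure (assumed target format).
failuredf$Failure_Date_Period <- format(failuredf$FAILDATE, "%Y-%m")

# Count work orders per period with base R.
counts <- aggregate(WONUM ~ Failure_Date_Period, data = failuredf, FUN = length)
names(counts)[2] <- "FailCounts"
counts
```

If FAILDATE is stored as text, it would first need as.Date() with a format string that matches the actual data.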
Re: [R] predict function type class vs. prob
Às 11:12 de 22/09/2023, Milbert, Sabine (LGL) escreveu: Dear R Help Team, My research group and I use R scripts for our multivariate data screening routines. During routine use, we encountered some inconsistencies within the predict() function of the R Stats Package. Through internal research, we were unable to find the reason for this and have decided to contact your help team with the following issue: The predict() function is used once to predict the class membership of a new sample (type = "class") on a trained linear SVM model for distinguishing two classes (using the caret package). It is then used to also examine the probability of class membership (type = "prob"). Both are then presented in an R shiny output. Within the routine, we noticed two samples (out of 100+) where the class prediction and probability prediction did not match. The prediction probabilities of one class (52%) did not match the class membership within the predict function. We use the same seed and the discrepancy is reproducible in this sample. The same problem did not occur in other trained models (lda, random forest, radial SVM...). Is there a weighing of classes within the prediction function or is the classification limit not at 50%/a majority vote? Or do you have another explanation for this discrepancy, please let us know. PS: If this is an issue based on the model training function of the caret package and therefore not your responsibility, please let us know. Thank you in advance for your support! Yours sincerely, Sabine Milbert [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, I cannot tell what is going on but I would like to make a correction to your post. 
predict() is a generic function with methods for objects of several classes in many packages. In the base package stats you will find methods for objects (fits) of class lm, glm and others, see ?predict.

The method you are asking about is predict.train, defined in package caret, not in package stats.

To see which predict method is being called, check

class(your_fit)

Hope this helps,
Rui Barradas
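To make the dispatch point concrete, here is a minimal sketch using only base R and the built-in cars data. It illustrates the mechanism with an lm fit; it does not reproduce or explain the caret discrepancy described in the question.

```r
# Fit a small model from the built-in 'cars' data (illustration only).
fit <- lm(dist ~ speed, data = cars)

# The class vector decides which predict() method is dispatched.
class(fit)

# Look up the method that predict(fit) will call. For a caret model the
# class would be "train" and the method predict.train, as noted above.
m <- getS3method("predict", "lm")
is.function(m)

# List the predict methods visible in the current session.
methods("predict")
```

Running class() on the fitted caret object and then reading the help page of the matching method is a quick way to find out which package's documentation applies.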
Re: [R] Hadamard transformation
Às 18:45 de 18/09/2023, mohan radhakrishnan escreveu: Hello, I am attempting to port the R code which is an answer to https://codegolf.stackexchange.com/questions/194229/implement-the-2d-hadamard-transform function(M){for(i in 1:log2(nrow(M)))T=T%x%matrix(1-2*!3:0,2)/2; print(T); T%*%M%*%T} The code, 3 inputs and the corresponding outputs are shown in https://tio.run/##PYyxCsIwFEX3fkUcAu@VV7WvcSl2dOwi8QNqNSXQJhAqrYjfHoOIwz3D4XBDNOJYiGgerp@td9Diy/gAVlgnynr0A4MLfkkeUTdarnLq5mBXKAvON1W9J8YdZ1rmsk3T72jgV/TAVBHTAROYrs/00@jz5YSY/aOSFKmvGP1yD9sk4Wa7ARSSRowf These are the inputs. f(matrix(c(2,3,2,5),2,2,byrow=TRUE)) f(matrix(1,4,4)) f(lower.tri(diag(4),T)) My attempt to port this R code to another framework(Tensorflow) was only partially successful because I didn't fully understand the cryptic R code. The second input shown above works after hacking Tensorflow for a long time. My question is this. Can anyone code this in a clear way so that I can understand ? I understand Kronecker Product and matrix multiplication and can port that code but I am missing something as the same ported code does not work for all inputs. Thanks, Mohan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, Is this what you want? (I have changed the notation a bit.) 
H <- function(M){
  H0 <- 1
  Transf <- matrix(c(1, 1, 1, -1), 2L)
  for(i in 1:log2(nrow(M))) {
    H0 <- H0 %x% Transf/2
  }
  H0 %*% M %*% H0
}

x <- matrix(c(2, 3, 2, 5), 2, 2, byrow = TRUE)
y <- matrix(1, 4, 4)
z <- lower.tri(diag(4), TRUE)
z[] <- apply(z, 2, as.integer)

H(x)
H(y)
H(z)

Hope this helps,
Rui Barradas
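For porting, it may help to separate building the transform matrix from applying it. The sketch below is a restatement of the function above with explicit names (hadamard_matrix and hadamard2d are made up here) and input checks. It assumes a square input whose size is a power of two, and it keeps the division by 2 at every Kronecker step, as in the original code.

```r
# Build the scaled n x n Hadamard matrix by repeated Kronecker products.
hadamard_matrix <- function(n) {
  stopifnot(n >= 2, log2(n) %% 1 == 0)      # n must be a power of two
  H2 <- matrix(c(1, 1, 1, -1), nrow = 2)    # 2 x 2 building block
  H <- matrix(1, 1, 1)                      # start from the 1 x 1 identity
  for (i in seq_len(log2(n))) {
    H <- kronecker(H, H2) / 2               # same as (H %x% H2) / 2
  }
  H
}

# 2D transform: multiply by the same matrix on both sides.
hadamard2d <- function(M) {
  stopifnot(nrow(M) == ncol(M))
  H <- hadamard_matrix(nrow(M))
  H %*% M %*% H
}

hadamard2d(matrix(c(2, 3, 2, 5), 2, 2, byrow = TRUE))
hadamard2d(matrix(1, 4, 4))
```

Note that %x% binds more tightly than /, so H0 %x% Transf/2 in the original divides the Kronecker product, not Transf alone. With this scaling, the 4 x 4 matrix of ones maps to a matrix whose only non-zero entry is a 1 in the top-left corner.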
Re: [R] Help with plotting and date-times for climate data
Às 21:50 de 12/09/2023, Kevin Zembower via R-help escreveu: Hello, I'm trying to calculate the mean temperature max from a file of climate date, and plot it over a range of days in the year. I've downloaded the data, and cleaned it up the way I think it should be. However, when I plot it, the geom_smooth line doesn't show up. I think that's because my x axis is characters or factors. Here's what I have so far: library(tidyverse) data <- read_csv("Ely_MN_Weather.csv") start_day = yday(as_date("2023-09-22")) end_day = yday(as_date("2023-10-15")) d <- as_tibble(data) %>% select(DATE,TMAX,TMIN) %>% mutate(DATE = as_date(DATE), yday = yday(DATE), md = sprintf("%02d-%02d", month(DATE), mday(DATE)) ) %>% filter(yday >= start_day & yday <= end_day) %>% mutate(md = as.factor(md)) d_sum <- d %>% group_by(md) %>% summarize(tmax_mean = mean(TMAX, na.rm=TRUE)) ## Here's the filtered data: dput(d_sum) structure(list(md = structure(1:25, levels = c("09-21", "09-22", "09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29", "09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06", "10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13", "10-14", "10-15"), class = "factor"), tmax_mean = c(65, 62.2, 61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2, 63.7, 59.5, 59.6, 61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7, 57, 55.4, 49.8, 48.75, 43.7)), class = c("tbl_df", "tbl", "data.frame" ), row.names = c(NA, -25L)) ggplot(data = d_sum, aes(x = md)) + geom_point(aes(y = tmax_mean, color = "blue")) + geom_smooth(aes(y = tmax_mean, color = "blue")) = My questions are: 1. Why isn't my geom_smooth plotting? How can I fix it? 2. I don't think I'm handling the month and day combination correctly. Is there a way to encode month and day (but not year) as a date? 3. (Minor point) Why does my graph of tmax_mean come out red when I specify "blue"? Thanks for any advice or guidance you can offer. I really appreciate the expertise of this group. 
-Kevin

Hello,

The problem is that the dates are factors, not real dates, and geom_smooth does not interpolate along a discrete axis (the x axis).

Paste a fake year to md, coerce to Date and plot. I have simplified the aes() calls and added a date scale in order to make the x axis more readable.

Without the formula and method arguments, geom_smooth prints a message, so they are now made explicit.

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})

d_sum %>%
  mutate(md = paste("2023", md, sep = "-"),
         md = as.Date(md)) %>%
  ggplot(aes(x = md, y = tmax_mean)) +
  geom_point(color = "blue") +
  geom_smooth(
    formula = y ~ x,
    method = loess,
    color = "blue"
  ) +
  scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")

Hope this helps,
Rui Barradas
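The fake-year step can be tried on its own in base R. This is a sketch with a few sample labels; the year 2023 is an arbitrary placeholder, and any non-leap year would serve equally well for labels that do not include 29 February.

```r
# Month-day labels in the same form as d_sum$md.
md <- c("09-21", "09-22", "10-01", "10-15")

# Prepend a placeholder year so the labels become real dates.
dates <- as.Date(paste("2023", md, sep = "-"))
class(dates)

# Dates are continuous, so gaps between days are meaningful.
diff(dates)

# The original labels can be recovered for axis text.
format(dates, "%m-%d")
```

In the plot code above, color = "blue" sits outside aes(), so it is used as a literal colour. Inside aes() the string is treated as a data value and mapped to the default palette, which is why the points in the question came out red.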
Re: [R] graph in R with grouping letters from the Tukey test with agricolae package
On 12/09/2023 at 16:24, Loop Vinyl wrote:

I would like to produce the attached graph (graph1) with the R package agricolae. Could someone give me an example with the attached data (data)? I expect an adapted graph (graph2) with the data (data).

Best regards

Hello,

There are no attached graphs, only data. Can you post the code you have tried?

Hope this helps,
Rui Barradas
Re: [R] prop.trend.test
Às 10:06 de 08/09/2023, peter dalgaard escreveu: Yes, this was written a bit bone-headed (as I am allowed to say...) If you look at the code, you will see inside: a <- anova(lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)), weights = w)) and the lm() inside should give you the direction via the sign of the regression coefficient on "score". So, at least for now, you could just doctor a copy of the code for your own purposes, as in fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)), weights = w) a <- anova(fit) and arrange to return coef(fit)["score"] at the end. Something like structure(... estimate=c(lpm.slope=coef(fit)["score"]) ) (I expect that you might also extract the t-statistic from coef(summary(fit)) and find that it is the signed square root of the Chi-square, but I won't have time to test that just now.) -pd On 8 Sep 2023, at 07:22 , Thomas Subia via R-help wrote: Colleagues, Thanks all for the responses. I am monitoring the daily total number of defects per sample unit. I need to know whether this daily defect proportion is trending upward (a bad thing for a manufacturing process). My first thought was to use either a u or a u' control chart for this. As far as I know, u or u' charts are poor to detect drifts. This is why I chose to use prop.trend.test to detect trends in proportions. While prop.trend.test can confirm the existence of a trend, as far as I know, it is left to the user to determine what direction that trend is. One way to illustrate trending is of course to plot the data and use geom_smooth and method lm For the non-statisticians in my group, I've found that using this method along with the p-value of prop.trend.test, makes it easier for the users to determine the existence of trending and its direction. If there are any other ways to do this, please let me know. 
Thomas Subia On Thursday, September 7, 2023 at 10:31:27 AM PDT, Rui Barradas wrote: Às 14:23 de 07/09/2023, Thomas Subia via R-help escreveu: Colleagues Consider smokers <- c( 83, 90, 129, 70 ) patients <- c( 86, 93, 136, 82 ) prop.trend.test(smokers, patients) Output: Chi-squared Test for Trend inProportions data: smokers out of patients , using scores: 1 2 3 4 X-squared = 8.2249, df = 1, p-value = 0.004132 # trend test for proportions indicates proportions aretrending. How does one identify the direction of trending? # prop.test indicates that the proportions are unequal but doeslittle to indicate trend direction. All the best, Thomas Subia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, By visual inspection it seems that there is a decreasing trend. Note that the sample estimates of prop.test and smokers/patients are equal. smokers <- c( 83, 90, 129, 70 ) patients <- c( 86, 93, 136, 82 ) prop.test(smokers, patients)$estimate #>prop 1prop 2prop 3prop 4 #> 0.9651163 0.9677419 0.9485294 0.8536585 smokers/patients #> [1] 0.9651163 0.9677419 0.9485294 0.8536585 plot(smokers/patients, type = "b") Hope this helps, Rui Barradas __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, Actually, the t-statistic is not the signed square root of the X-squared test statistic. I have edited the function, assigned the lm fit and returned it as is. (print.htest won't print this new list member so the output is not cluttered with irrelevant noise.) 
smokers <- c( 83, 90, 129, 70 ) patients <- c( 86, 93, 136, 82 ) edit(prop.trend.test, file = "ptt.R") source("ptt.R") # stats::prop.trend.test edited to include the results # of the lm fit and saved under a new name ptt <- function (x, n, score = seq_along(x)) { method <- "Chi-squared Test for Trend in Proportions" dname <- paste(deparse1(substitute(x)), "out of", deparse1(substitute(n)), ",\n using scores:", paste(score, collapse = " ")) x <- as.vector(x) n <- as.vector(n) p <- sum(x)/sum(n) w <- n/p/(1 - p) a <- anova(fit <- lm(freq ~ score, data = list(freq = x/n, score = as.vector(score)), weights = w)) chisq <- c(`X-squared` = a["score", "Sum Sq"]) s
Re: [R] prop.trend.test
Às 14:23 de 07/09/2023, Thomas Subia via R-help escreveu: Colleagues Consider smokers <- c( 83, 90, 129, 70 ) patients <- c( 86, 93, 136, 82 ) prop.trend.test(smokers, patients) Output: Chi-squared Test for Trend inProportions data: smokers out of patients , using scores: 1 2 3 4 X-squared = 8.2249, df = 1, p-value = 0.004132 # trend test for proportions indicates proportions aretrending. How does one identify the direction of trending? # prop.test indicates that the proportions are unequal but doeslittle to indicate trend direction. All the best, Thomas Subia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Hello, By visual inspection it seems that there is a decreasing trend. Note that the sample estimates of prop.test and smokers/patients are equal. smokers <- c( 83, 90, 129, 70 ) patients <- c( 86, 93, 136, 82 ) prop.test(smokers, patients)$estimate #>prop 1prop 2prop 3prop 4 #> 0.9651163 0.9677419 0.9485294 0.8536585 smokers/patients #> [1] 0.9651163 0.9677419 0.9485294 0.8536585 plot(smokers/patients, type = "b") Hope this helps, Rui Barradas __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
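The direction can also be read from the sign of a weighted regression slope, using the same weighted lm() that the source of prop.trend.test fits internally (as quoted elsewhere in this thread). This is a sketch of that idea, not a change to the stats function; the variable names score, fit, slope and chisq are local to the example.

```r
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)
score    <- seq_along(smokers)          # default scores 1, 2, 3, 4

# Weights as used inside prop.trend.test().
p <- sum(smokers) / sum(patients)
w <- patients / p / (1 - p)

fit <- lm(freq ~ score,
          data = list(freq = smokers / patients, score = score),
          weights = w)

# A negative slope means the proportion falls as the score increases.
slope <- coef(fit)[["score"]]
sign(slope)

# The "score" sum of squares reproduces the X-squared statistic.
chisq <- anova(fit)["score", "Sum Sq"]
all.equal(chisq, unname(prop.trend.test(smokers, patients)$statistic))
```

Reporting the slope alongside the p-value gives both the existence and the direction of the trend in one step.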
Re: [R] Regarding error in RStudio
At 17:59 on 05/09/2023, Sukriti Sood wrote:

Hi,

I am Sukriti Sood, a research analyst at Woodstock Institute <https://woodstockinst.org/>. I use RStudio extensively for our analysis.

I have been facing two issues for a while:

1. I am unable to copy from RStudio and paste into other programs, or vice versa.
2. I am facing some kind of a conversion error (screenshot attached).

I tried looking online but could not find a resolution to these issues. Could I please get some help with this urgently?

Thanks!

Best,
Sukriti Sood

Sukriti Sood | Research Analyst
Woodstock Institute
Pronouns: She/Her/Hers
67 East Madison, Suite 2108 | Chicago, Illinois 60603
O (312) 368-0310 x2029 | C (610) 604-6708
www.woodstockinst.org | ss...@woodstockinst.org

Hello,

You should post RStudio questions to the RStudio support service; they answer quickly and the answers are generally good.

It's written at the bottom of the attached image that the workspace was loaded from the file

    C:/WSI/.RData

Close RStudio, remove this file and restart. See if that solves it.

Hope this helps,

Rui Barradas
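If deleting the workspace file outright feels risky, it can be moved aside instead. This is a minimal sketch, assuming it is run from a plain R session while RStudio is closed; set_aside_workspace is a hypothetical helper written for this illustration, not an RStudio or base R function.

```r
# Hypothetical helper: rename a saved workspace file so that it is no
# longer loaded at startup, keeping a ".bak" copy in case it is needed.
set_aside_workspace <- function(path) {
  if (!file.exists(path)) {
    return(FALSE)                      # nothing to move
  }
  backup <- paste0(path, ".bak")
  file.rename(path, backup)            # TRUE on success, FALSE otherwise
}

# Example call for the file named in the screenshot:
# set_aside_workspace("C:/WSI/.RData")
```

The function returns TRUE when the file was renamed and FALSE when there was nothing to rename, so the result can be checked before restarting RStudio.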
Re: [R] Merge and replace data
At 09:55 on 05/09/2023, roslinazairimah zakaria wrote:

Hi all,

I have these data

    x1 <- c(116,0,115,137,127,0,0)
    x2 <- c(0,159,0,0,0,159,127)

I want:

    xx <- c(116,115,137,127,159, 127)

I would like to merge these data into one column. Whenever a value is 0, it should be replaced by the non-zero value from the other column. I tried append and merge but failed to get what I want.

Hello,

That's a case for ?pmax:

    x1 <- c(116,0,115,137,127,0,0)
    x2 <- c(0,159,0,0,0,159,127)
    pmax(x1, x2)
    #> [1] 116 159 115 137 127 159 127

Hope this helps,

Rui Barradas
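pmax works here because all values are non-negative and each position has at most one non-zero entry. As a sketch of an alternative that states the replacement rule directly (the name merged is introduced here for illustration), ifelse can pick the value from x2 wherever x1 is zero:

```r
x1 <- c(116, 0, 115, 137, 127, 0, 0)
x2 <- c(0, 159, 0, 0, 0, 159, 127)

# Take x2 where x1 is zero, otherwise keep x1
merged <- ifelse(x1 == 0, x2, x1)
merged
#> [1] 116 159 115 137 127 159 127
```

Unlike pmax, this form would still behave as described if the data contained negative values, since it tests for zero rather than comparing magnitudes.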
Re: [R] aggregate formula - differing results
At 12:51 on 04/09/2023, Ivan Calandra wrote:

Thanks Rui for your help; that would be one possibility indeed.

But am I the only one who finds that behavior of aggregate() completely unexpected and confusing? Especially considering that dplyr::summarise() and doBy::summaryBy() deal with NAs differently, even though they all use mean(na.rm = TRUE) to calculate the group stats.

Best wishes,
Ivan

On 04/09/2023 13:46, Rui Barradas wrote:

At 10:44 on 04/09/2023, Ivan Calandra wrote:

Dear useRs,

I have just stumbled across a behavior in aggregate() that I cannot explain. Any help would be appreciated!

Sample data:

    my_data <- structure(list(
      ID = c("FLINT-1", "FLINT-10", "FLINT-100", "FLINT-101", "FLINT-102",
             "HORN-10", "HORN-100", "HORN-102", "HORN-103", "HORN-104"),
      EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41,
                     121.37, 70.52, 122.3, 71.01, 104.5),
      SurfaceArea = c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47,
                      1169.26, 444.61, 1791.48, 461.15, 1127.2),
      Length = c(44.384, 29.831, 43.869, 48.011, 54.109,
                 41.742, 23.854, 32.075, 21.337, 35.459),
      Width = c(45.982, 67.303, 52.679, 26.42, 25.149,
                33.427, 20.683, 62.783, 26.417, 35.297),
      PLATWIDTH = c(38.84, NA, 15.33, 30.37, 11.44, 14.88, 13.86, NA, NA, 26.71),
      PLATTHICK = c(8.67, NA, 7.99, 11.69, 3.3, 16.52, 4.58, NA, NA, 9.35),
      EPA = c(78, NA, 78, 54, 72, 49, 56, NA, NA, 56),
      THICKNESS = c(10.97, NA, 9.36, 6.4, 5.89, 11.05, 4.9, NA, NA, 10.08),
      WEIGHT = c(34.3, NA, 25.5, 18.6, 14.9, 29.5, 4.5, NA, NA, 23),
      RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT",
                 "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")),
      row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L),
      class = "data.frame")

1) Simple aggregation with 2 variables:

    aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data, FUN = mean, na.rm = TRUE)

2) Using the dot notation - different results:

    aggregate(. ~ RAWMAT, data = my_data[-1], FUN = mean, na.rm = TRUE)

3) Using dplyr, I get the same results as #1:

    group_by(my_data, RAWMAT) %>%
      summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

4) It gets weirder: using all columns in #1 gives the same results as in #2 but different from #1 and #3:

    aggregate(cbind(EdgeLength, SurfaceArea, Length, Width, PLATWIDTH,
                    PLATTHICK, EPA, THICKNESS, WEIGHT) ~ RAWMAT,
              data = my_data, FUN = mean, na.rm = TRUE)

So it seems it is not only due to the notation (cbind() vs. dot). Is it a bug? A peculiar thing in my dataset? I tend to think this could be due to some variables (or their names), as all notations seem to agree when I remove some variables (although I haven't found out which variable(s) is (are) at fault), e.g.:

    my_data2 <- structure(list(
      ID = c("FLINT-1", "FLINT-10", "FLINT-100", "FLINT-101", "FLINT-102",
             "HORN-10", "HORN-100", "HORN-102", "HORN-103", "HORN-104"),
      EdgeLength = c(130.75, 168.77, 142.79, 130.1, 140.41,
                     121.37, 70.52, 122.3, 71.01, 104.5),
      SurfaceArea = c(1736.87, 1571.83, 1656.46, 1247.18, 1177.47,
                      1169.26, 444.61, 1791.48, 461.15, 1127.2),
      Length = c(44.384, 29.831, 43.869, 48.011, 54.109,
                 41.742, 23.854, 32.075, 21.337, 35.459),
      Width = c(45.982, 67.303, 52.679, 26.42, 25.149,
                33.427, 20.683, 62.783, 26.417, 35.297),
      RAWMAT = c("FLINT", "FLINT", "FLINT", "FLINT", "FLINT",
                 "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS", "HORNFELS")),
      row.names = c(1L, 2L, 3L, 4L, 5L, 111L, 112L, 113L, 114L, 115L),
      class = "data.frame")

    aggregate(cbind(EdgeLength, SurfaceArea, Length, Width) ~ RAWMAT,
              data = my_data2, FUN = mean, na.rm = TRUE)

    aggregate(. ~ RAWMAT, data = my_data2[-1], FUN = mean, na.rm = TRUE)

    group_by(my_data2, RAWMAT) %>%
      summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

Thank you in advance for any hint.

Best wishes,
Ivan

LEIBNIZ-ZENTRUM FÜR ARCHÄOLOGIE
Dr. Ivan CALANDRA
Head of IMPALA (IMaging Platform At LeizA)
MONREPOS Archaeological Research Centre, Schloss Monrepos
56567 Neuwied, Germany
T: +49 2631 9772 243
T: +49 6131 8885 543
ivan.calan...@leiza.de
leiza.de <http://www.leiza.de/>
ORCID <https://orcid.org/-0003-3816-6359>
ResearchGate <https://www.researchgate.net/profile/Ivan_Calandra>

LEIZA is a foundation under public law of the State of Rhineland-Palatinate and the City of Mainz. Its headquarters are in Mainz. Supervision is carried out by the Ministry of Science and Health of the State of Rhineland-Palatinate. LEIZA is a research museum of the Leibniz Association.
Re: [R] aggregate formula - differing results
A vals in at least one column and the results are the same. However, this will not give the mean values of the other numeric columns, just of those two.

    # define a vector of columns of interest
    cols <- c("Length", "Width", "RAWMAT")

    # 1) Simple aggregation with 2 variables, select cols:
    aggregate(cbind(Length, Width) ~ RAWMAT, data = my_data[cols], FUN = mean, na.rm = TRUE)

    # 2) Using the dot notation - if cols are selected, equal results:
    aggregate(. ~ RAWMAT, data = my_data[cols], FUN = mean, na.rm = TRUE)

    # 3) Using dplyr, the results are now the same as #1 and #2:
    my_data %>%
      select(all_of(cols)) %>%
      group_by(RAWMAT) %>%
      summarise(across(c("Length", "Width"), ~ mean(.x, na.rm = TRUE)))

Hope this helps,

Rui Barradas
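The differing results are consistent with the documented default of the formula method of aggregate(), na.action = na.omit, which drops every row that has an NA in any variable of the formula before FUN is applied. A small sketch with made-up data (toy, res_omit and res_pass are hypothetical names, not from the thread) shows the effect and one possible way around it, na.action = na.pass:

```r
# Hypothetical toy data: 'b' is missing in a row where 'a' is observed
toy <- data.frame(
  g = c("x", "x", "x", "y", "y"),
  a = c(1, 5, 3, 4, 5),
  b = c(10, NA, 30, 40, 50)
)

# Default na.action = na.omit: row 2 is removed as a whole,
# so the mean of 'a' in group "x" is computed from 1 and 3 only
res_omit <- aggregate(. ~ g, data = toy, FUN = mean, na.rm = TRUE)

# na.pass keeps all rows; na.rm = TRUE then drops NAs column by column,
# so the mean of 'a' in group "x" uses 1, 5 and 3
res_pass <- aggregate(. ~ g, data = toy, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)

res_omit
res_pass
```

With na.pass, each column's mean uses every non-missing value of that column, which is the behaviour of the dplyr call in the thread; with the default, the means depend on which other columns happen to be in the formula.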