[R] Help with LDA topic modelling
Dear members,

I am using LDA for topic modelling of news articles (topicmodels package) and visualizing the fitted topics with the LDAvis package. The visualization shows the topics as circles, which may intersect. My question is: if I find the optimal number of topics, k, and the circles representing the topics don't intersect, then I have achieved perfect segregation. Am I right?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
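As far as I know, LDAvis's default layout places topics by a two-dimensional scaling of the Jensen-Shannon divergence between the topic-term distributions, and circle area reflects how prevalent a topic is in the corpus, not how "wide" it is. So circles that do not touch on the 2-D map do not by themselves show that topics share no words. A minimal base-R sketch of the pairwise divergences, assuming `phi` is a K x V matrix of topic-term probabilities (for a topicmodels fit I believe this is `posterior(fit)$terms`; the toy numbers below are made up):

```r
# Jensen-Shannon divergence (natural log) between two discrete distributions p and q
js_div <- function(p, q) {
  m <- 0.5 * (p + q)
  # Kullback-Leibler divergence, treating 0 * log(0 / b) as 0
  kl <- function(a, b) sum(ifelse(a == 0, 0, a * log(a / b)))
  0.5 * kl(p, m) + 0.5 * kl(q, m)
}

# Pairwise JS divergences between the rows (topics) of a K x V matrix
topic_js_matrix <- function(phi) {
  K <- nrow(phi)
  d <- matrix(0, K, K, dimnames = list(rownames(phi), rownames(phi)))
  for (i in seq_len(K)) {
    for (j in seq_len(K)) {
      d[i, j] <- js_div(phi[i, ], phi[j, ])
    }
  }
  d
}

# Toy topic-term matrix: 3 topics x 4 terms (hypothetical numbers, rows sum to 1)
phi <- rbind(t1 = c(0.70, 0.20, 0.05, 0.05),
             t2 = c(0.05, 0.05, 0.20, 0.70),
             t3 = c(0.40, 0.40, 0.10, 0.10))
d <- topic_js_matrix(phi)
print(round(d, 3))
```

The divergence lies between 0 (identical topics) and log(2) (topics with no terms in common). Values below log(2) mean the topics still put probability on the same terms, wherever their circles land after the 2-D projection.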
Re: [R] Sum every n (4) observations by group
Milu,

Your data seem to be very consistent in that each value of ID has eight rows, and you just want to sum every four rows, so that fits:

   ID Date       Value
1   A 4140 0.000207232
2   A 4141 0.000240141
3   A 4142 0.000271414
4   A 4143 0.000258384
5   A 4144 0.000243640
6   A 4145 0.000271480
7   A 4146 0.000280585
8   A 4147 0.000289691
9   B 4140 0.000298797
10  B 4141 0.000307903
11  B 4142 0.000317008
12  B 4143 0.000326114
13  B 4144 0.000335220
14  B 4145 0.000344326
15  B 4146 0.000353431
16  B 4147 0.000362537
17  C 4140 0.000371643
18  C 4141 0.000380749
19  C 4142 0.000389854
20  C 4143 0.000398960
21  C 4144 0.000408066
22  C 4145 0.000417172
23  C 4146 0.000426277
24  C 4147 0.000435383

There are many ways to do what you want, some more general than others, but one trivial way, assuming mydf holds the above, is to add a column containing 24 numbers ranging from 1 to 6. Here is such a vector:

> rep(1:(nrow(mydf)/4), each=4)
[1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6

So you can add a column like this:

> mydf$fours <- rep(1:(nrow(mydf)/4), each=4)
> mydf
   ID Date       Value fours
1   A 4140 0.000207232     1
2   A 4141 0.000240141     1
3   A 4142 0.000271414     1
4   A 4143 0.000258384     1
5   A 4144 0.000243640     2
6   A 4145 0.000271480     2
7   A 4146 0.000280585     2
8   A 4147 0.000289691     2
9   B 4140 0.000298797     3
10  B 4141 0.000307903     3
11  B 4142 0.000317008     3
12  B 4143 0.000326114     3
13  B 4144 0.000335220     4
14  B 4145 0.000344326     4
15  B 4146 0.000353431     4
16  B 4147 0.000362537     4
17  C 4140 0.000371643     5
18  C 4141 0.000380749     5
19  C 4142 0.000389854     5
20  C 4143 0.000398960     5
21  C 4144 0.000408066     6
22  C 4145 0.000417172     6
23  C 4146 0.000426277     6
24  C 4147 0.000435383     6

You can now group any way you want and apply a function; in this case you want a sum. I like the tidyverse functions, so here is that version:

library(dplyr)
mydf %>% group_by(ID, fours) %>% summarize(sums=sum(Value), n=n())

I added the extra n column so you can check that every group really has 4 rows, in case your data sometimes do not.
Here is the output:

# A tibble: 6 x 4
# Groups:   ID [3]
  ID    fours     sums     n
1 A         1 0.000977     4
2 A         2 0.00109      4
3 B         3 0.00125      4
4 B         4 0.00140      4
5 C         5 0.00154      4
6 C         6 0.00169      4

Of course there are all kinds of ways to do this in standard R, including trivial ones like looping over the row indices starting at 1, taking four at a time, and computing mydf$Value[N] + mydf$Value[N+1] + ...

-----Original Message-----
From: R-help On Behalf Of Miluji Sb
Sent: Sunday, December 19, 2021 1:32 PM
To: r-help mailing list
Subject: [R] Sum every n (4) observations by group

Dear all,

I have a dataset (below) by ID and time sequence. I would like to sum every four observations by ID.
[...]
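The index loop mentioned above can be sketched in base R as follows. It assumes, like the rep(..., each=4) column, that the rows are sorted by ID and that each ID has a multiple of 4 rows; the data frame is rebuilt compactly with the same values as Milu's dput().

```r
# Milu's data, rebuilt compactly (same values as the dput() in the thread)
mydf <- data.frame(
  ID = rep(c("A", "B", "C"), each = 8),
  Date = rep(4140:4147, times = 3),
  Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
            0.00024364, 0.00027148, 0.000280585, 0.000289691,
            0.000298797, 0.000307903, 0.000317008, 0.000326114,
            0.00033522, 0.000344326, 0.000353431, 0.000362537,
            0.000371643, 0.000380749, 0.000389854, 0.00039896,
            0.000408066, 0.000417172, 0.000426277, 0.000435383)
)

# First row of each block of four: 1, 5, 9, ...
starts <- seq(1, nrow(mydf), by = 4)
block_sums <- data.frame(ID = mydf$ID[starts], sums = NA_real_)

for (k in seq_along(starts)) {
  rows <- starts[k]:(starts[k] + 3)
  # Guard: all four rows must belong to the same ID
  stopifnot(length(unique(mydf$ID[rows])) == 1)
  # Equivalent to mydf$Value[N] + mydf$Value[N+1] + mydf$Value[N+2] + mydf$Value[N+3]
  block_sums$sums[k] <- sum(mydf$Value[rows])
}
block_sums
```

The stopifnot() guard makes the loop fail loudly instead of silently summing across two IDs if the assumption about group sizes is ever violated.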
Re: [R] Sum every n (4) observations by group
Dear Peter,

Thanks so much for your reply and the code! This is helpful. What I would like is the data.frame below: the sum of the values for 4140, 4141, 4142, 4143 and then for 4144, 4145, 4146, 4147, for each of the IDs A, B, and C (each sum on the last row of its block, NA elsewhere). Does that make sense? Thanks again!

Best.

Milu

results <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C",
"C", "C", "C"), Date = c(4140L, 4141L, 4142L, 4143L, 4144L, 4145L,
4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L,
4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L
), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797,
0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326,
0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854,
0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383
), sum = c(NA, NA, NA, 0.000977171, NA, NA, NA, 0.001054089,
NA, NA, NA, 0.001213399, NA, NA, NA, 0.001395514, NA, NA, NA,
0.001541206, NA, NA, NA, 0.001686898)), class = "data.frame", row.names = c(NA,
-24L))

On Sun, Dec 19, 2021 at 7:50 PM Peter Langfelder wrote:
> I'm not sure I understand the task, but if I do, assuming your data
> frame is assigned to a variable named df, I would do something like
>
> sumNs = function(x, n)
> {
>    if (length(x) %%n !=0) stop("Length of 'x' must be a multiple of 'n'.")
>    n1 = length(x)/n
>    ind = rep(1:n1, each = n)
>    tapply(x, ind, sum)
> }
> sums = tapply(df$Value, df$ID, sumNs, 4)
>
> Peter
>
> On Sun, Dec 19, 2021 at 10:32 AM Miluji Sb wrote:
> >
> > Dear all,
> >
> > I have a dataset (below) by ID and time sequence. I would like to sum every
> > four observations by ID.
> > [...]
Re: [R] Sum every n (4) observations by group
I'm not sure I understand the task, but if I do, assuming your data frame is assigned to a variable named df, I would do something like

sumNs = function(x, n)
{
   if (length(x) %%n !=0) stop("Length of 'x' must be a multiple of 'n'.")
   n1 = length(x)/n
   ind = rep(1:n1, each = n)
   tapply(x, ind, sum)
}
sums = tapply(df$Value, df$ID, sumNs, 4)

Peter

On Sun, Dec 19, 2021 at 10:32 AM Miluji Sb wrote:
>
> Dear all,
>
> I have a dataset (below) by ID and time sequence. I would like to sum every
> four observations by ID.
> [...]
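Because sumNs() returns two values per ID, the outer tapply() cannot simplify and gives a list with one element per ID. A sketch of running it on Milu's data and flattening the result into a data frame (the data rebuild and the names out and block are illustrative assumptions):

```r
# Milu's data rebuilt compactly, stored as 'df' as Peter assumed
df <- data.frame(
  ID = rep(c("A", "B", "C"), each = 8),
  Date = rep(4140:4147, times = 3),
  Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
            0.00024364, 0.00027148, 0.000280585, 0.000289691,
            0.000298797, 0.000307903, 0.000317008, 0.000326114,
            0.00033522, 0.000344326, 0.000353431, 0.000362537,
            0.000371643, 0.000380749, 0.000389854, 0.00039896,
            0.000408066, 0.000417172, 0.000426277, 0.000435383)
)

# Peter's function: sum consecutive blocks of n values
sumNs <- function(x, n) {
  if (length(x) %% n != 0) stop("Length of 'x' must be a multiple of 'n'.")
  ind <- rep(seq_len(length(x) / n), each = n)
  tapply(x, ind, sum)
}

# One list element per ID, each holding that ID's block sums
sums <- tapply(df$Value, df$ID, sumNs, 4)

# Flatten to one row per (ID, block)
out <- data.frame(ID = rep(names(sums), lengths(sums)),
                  block = unlist(lapply(sums, seq_along), use.names = FALSE),
                  sums = unlist(sums, use.names = FALSE))
out
```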
[R] Sum every n (4) observations by group
Dear all,

I have a dataset (below) by ID and time sequence. I would like to sum every four observations by ID.

I am not sure how to combine the two conditions. Any help will be highly appreciated. Thank you!

Best.

Milu

## Dataset
structure(list(ID = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C",
"C", "C", "C"), Date = c(4140L, 4141L, 4142L, 4143L, 4144L, 4145L,
4146L, 4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L,
4147L, 4140L, 4141L, 4142L, 4143L, 4144L, 4145L, 4146L, 4147L
), Value = c(0.000207232, 0.000240141, 0.000271414, 0.000258384,
0.00024364, 0.00027148, 0.000280585, 0.000289691, 0.000298797,
0.000307903, 0.000317008, 0.000326114, 0.00033522, 0.000344326,
0.000353431, 0.000362537, 0.000371643, 0.000380749, 0.000389854,
0.00039896, 0.000408066, 0.000417172, 0.000426277, 0.000435383
)), class = "data.frame", row.names = c(NA, -24L))
Re: [R] Bug in list.files(full.names=T)
I don't know the answer to your question, but I see the same behaviour on MacOS, e.g. list.files("./") includes ".//R" in the results on my system. Both "./R" and ".//R" are legal ways to express that path on MacOS, so it's not a serious bug, but it does look ugly.

Duncan Murdoch

On 18/12/2021 9:55 a.m., Mario Reutter wrote:
> Dear everybody,
>
> I'm a researcher in the field of psychology and a passionate R user. After
> having updated to the newest version, I experienced a problem with
> list.files() if the parameter full.names is set to TRUE.
> [...]
[R] Speed up studentized confidence intervals ?
Dear R-experts,

Below is my R code. It works, but really, really slowly: it takes about 2 hours on my computer to get an answer! Is there a way to improve my R code to speed it up? Saving at least 1 hour would be great ;=)

Many thanks

library(boot)
s <- sample(178:798, 10, replace=TRUE)
mean(s)
N <- 1000
out <- replicate(N, {
  a <- sample(s, size=5)
  mean(a)
  dat <- data.frame(a)
  med <- function(d, i) {
    temp <- d[i, ]
    f <- mean(temp)
    g <- var(replicate(50, mean(sample(temp, replace=T))))
    return(c(f, g))
  }
  boot.out <- boot(data = dat, statistic = med, R = 1)
  boot.ci(boot.out, type = "stud")$stud[, 4:5]
})
mean(out[1,] < mean(s) & mean(s) < out[2,])
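One likely cost is the inner replicate(50, ...) that runs inside every single bootstrap replicate. For the sample mean, the bootstrap variance of the mean has a closed form, sum((x - mean(x))^2) / n^2, which is the quantity that 50-draw Monte Carlo estimate approximates. A sketch of a drop-in statistic under that assumption; set.seed(), R = 999 and the single-interval demo are illustrative choices, and the outer replicate(N, ...) loop would stay as it is:

```r
library(boot)

set.seed(1)  # only to make the sketch reproducible
s <- sample(178:798, 10, replace = TRUE)
a <- sample(s, size = 5)

# Statistic for boot(): the mean plus the exact bootstrap variance of the mean,
# replacing var(replicate(50, mean(sample(temp, replace = TRUE))))
med_fast <- function(d, i) {
  temp <- d[i]
  n <- length(temp)
  c(mean(temp), sum((temp - mean(temp))^2) / n^2)
}

# boot() accepts a plain vector, so the data.frame wrapper is not needed
boot.out <- boot(data = a, statistic = med_fast, R = 999)
# boot.ci() uses the 2nd statistic as the variance for studentized intervals
ci <- boot.ci(boot.out, type = "stud")$student[, 4:5]
print(ci)
```

The intervals will differ slightly from the original ones, because an exact variance replaces a noisy 50-draw estimate, but each bootstrap replicate now costs one mean and one sum of squares instead of 50 extra resamples.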
[R] Bug in list.files(full.names=T)
Dear everybody,

I'm a researcher in the field of psychology and a passionate R user. After updating to the newest version, I ran into a problem with list.files() when the parameter full.names is set to TRUE. A path separator "/" is now always appended to path in the output, even if path already ends with "/". This breaks backwards compatibility when path ends with a path separator. The change happened somewhere between R version 3.6.1 (2019-07-05) and 4.1.2 (2021-11-01).

Example:
> list.files("C:/Data/", full.names=T)
C:/Data//file.csv

Expected behavior: either a path separator should never be appended, in accordance with the documentation ("full.names a logical value. If TRUE, the directory path is prepended to the file names to give a relative file path."), or it should only be appended if path doesn't already end with a path separator.

My question is whether this warrants a bug report. And if you agree, could someone file the report, since I don't have a Bugzilla account?

Thank you and best regards,
Mario Reutter
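Until this is settled, a user-side workaround is to drop trailing separators before calling list.files(). A minimal sketch; strip_trailing_sep() and list_files_full() are hypothetical helpers, and root-like paths such as "/" or "C:/" are deliberately left alone because stripping them would change which directory they name:

```r
# Hypothetical helper: drop trailing "/" or "\" from directory paths,
# leaving root-like paths ("/", "C:/") untouched
strip_trailing_sep <- function(path) {
  stripped <- sub("[/\\\\]+$", "", path)
  keep <- stripped == "" | grepl("^[A-Za-z]:$", stripped)
  ifelse(keep, path, stripped)
}

# Hypothetical wrapper: list.files() with full.names = TRUE on a normalized path
list_files_full <- function(path = ".", ...) {
  list.files(strip_trailing_sep(path), full.names = TRUE, ...)
}

# Demo in a temporary directory
d <- tempfile("lf_demo_")
dir.create(d)
invisible(file.create(file.path(d, "file.csv")))
list.files(paste0(d, "/"), full.names = TRUE)  # may show "//" on recent R versions
list_files_full(paste0(d, "/"))                # single "/" before the file name
```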