Hi everyone. I tried the modeest package on my initial test data and it worked. However, it does not work on the entire data set. I saved one of the portions that gives an error (not for all of the values, but for some of them). For example, rows 36, 37 and 39 correctly show the mode value, but rows 38 and 40 are not correct. The same error is repeated for many of the values.

[36,] 2
[37,] 2
[38,] Numeric,3
[39,] 1
[40,] Numeric,3
============================================
#This is what I did:

> df<- read.csv(file="Part1-modif.csv", head=TRUE, sep=",")
> Out<- apply(df[,2:length(df)],1, mfv)
> t(t(Out))

#This is the data set

structure(list(terms = structure(c(2L, 4L, 4L, 4L, 3L, 1L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("#authentication,access control",
"#privacy,personal data", "#security,malicious,security", "data controller",
"id management,security", "password,recovery"), class = "factor"),
class.1 = c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L),
class.2 = c(2L, 2L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L,
1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L),
class.3 = c(2L, 0L, 2L, 2L, 1L, 1L, 0L, 0L, 0L, 2L,
2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("terms",
"class.1", "class.2", "class.3"), class = "data.frame", row.names = c(NA,
-50L))
========================================================
Also, when I try to include the terms in the result, it gives me an error:

> mode.names<- data.frame (df[,1],Out)
Error in data.frame(df[, 1], Out) :
  arguments imply differing number of rows: 50, 3

On Thu, May 28, 2015 at 9:24 AM, Mohammad Alimohammadi <
mxalimoha...@ualr.edu> wrote:

> Thank you David for your help !
>
> On Wed, May 27, 2015 at 7:31 PM, David L Carlson <dcarl...@tamu.edu>
> wrote:
>
>> cat(paste0("[", 1:length(Out), "] #dac ", Out), sep="\n")
>>
>> David
>>
>> *From:* Mohammad Alimohammadi [mailto:mxalimoha...@ualr.edu]
>> *Sent:* Wednesday, May 27, 2015 2:29 PM
>> *To:* David L Carlson; r-help@r-project.org
>> *Subject:* Re: [R] Problem with comparing multiple data sets
>>
>> Thanks David it worked !
>>
>> One more thing. I hope it's not complicated. Is it also possible to
>> display the terms for each row next to it?
>>
>> for example:
>>
>> [1] #dac 2
>> [2] #dac 0
>> [3] #dac 1
>> ...
>>
>> On Wed, May 27, 2015 at 2:18 PM, David L Carlson <dcarl...@tamu.edu>
>> wrote:
>>
>> Save the result of the apply() function:
>>
>> Out <- apply(df[ ,2:length(df)], 1, mfv)
>>
>> Then there are several options:
>>
>> Approximately what you asked for
>> data.frame(Out)
>> t(t(Out))
>>
>> More typing but exactly what you asked for
>> cat(paste0("[", 1:length(Out), "] ", Out), sep="\n")
>>
>>
>> David L. Carlson
>> Department of Anthropology
>> Texas A&M University
>>
>> -----Original Message-----
>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mohammad
>> Alimohammadi
>> Sent: Wednesday, May 27, 2015 1:47 PM
>> To: John Kane; r-help@r-project.org
>> Subject: Re: [R] Problem with comparing multiple data sets
>>
>> Ok. so I read about the ("modeest") package that gives the results that I
>> am looking for (most repeated value).
>>
>> I modified the data frame a little and moved the text to the first column.
>> This is the data frame with all 3 possible classes for each term.
>>
>> =================================
>> structure(list(terms = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L,
>> 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("#dac",
>> "#mac,#security",
>> "accountability,anonymous", "data security,encryption,security"
>> ), class = "factor"), class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L,
>> 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), class.2 = c(2L, 2L,
>> 2L, 2L, 0L, 0L, 2L, 0L, 0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
>> 0L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L),
>> class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L, 1L,
>> 0L, 0L, 0L, 0L, 2L, 1L, 2L)), .Names = c("terms", "class.1",
>> "class.2", "class.3"), class = "data.frame", row.names = c(NA,
>> -49L))
>> =============================================
>> #Then I applied the function below:
>>
>> ======================
>> library(modeest)
>> df<- read.csv(file="short.csv", head= TRUE, sep=",")
>> apply(df[ ,2:length(df)], 1, mfv)
>>
>> ============================
>> # It gives the most frequent value for each row which is what I need. The
>> only problem is that all the values are displayed in one single row.
>>
>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 2 1 1 1 1 0 0 0 0 2 1 2
>>
>> It would be much better to show them in separate rows.
>> For example:
>>
>> [1] 0
>> [2] 0
>> [3] 1
>> ....
>>
>> Any idea how to do this?
>>
>> On Wed, May 27, 2015 at 10:11 AM, Mohammad Alimohammadi <
>> mxalimoha...@ualr.edu> wrote:
>>
>> > Hi Jim,
>> >
>> > Thank you for your advice.
>> >
>> > I'm not sure how to exactly incorporate this function though. I added a
>> > portion of the actual data sets. all 3 data sets have the same items (text)
>> > with different class values. So I need to assign the most repeated class
>> > (0,1,2) for each text.
>> >
>> > For example: if line1 has text "aaa". It may be assigned to class 0 in
>> > dat1, 2 in dat 2 and 0 in dat3. in this case the "aaa" will be assigned to
>> > 0 (most repeated value). So it goes for each text.
>> >
>> > I really appreciate your help.
>> >
>> > =========================================
>> >
>> > *dat1*
>> >
>> > structure(list(class.1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
>> > 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
>> > c("#dac",
>> > "#mac,#security", "accountability,anonymous", "data security,encryption,security"
>> > ), class = "factor")), .Names = c("class.1", "terms"), class =
>> > "data.frame", row.names = c(NA,
>> > -49L))
>> >
>> >
>> > *dat2*
>> >
>> > structure(list(class.2 = c(2L, 2L, 2L, 2L, 0L, 0L, 2L, 0L, 0L,
>> > 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 2L, 2L, 2L, 1L, 1L, 2L,
>> > 2L, 0L, 0L, 0L, 0L, 1L, 1L, 1L), terms = structure(c(1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
>> > c("#dac",
>> > "#mac,#security", "accountability,anonymous", "data security,encryption,security"
>> > ), class = "factor")), .Names = c("class.2", "terms"), class =
>> > "data.frame", row.names = c(NA,
>> > -49L))
>> >
>> >
>> > *dat3*
>> >
>> > structure(list(class.3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 1L, 1L, 1L,
>> > 1L, 0L, 0L, 0L, 0L, 2L, 1L, 2L), terms = structure(c(1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
>> > 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 2L, 2L, 2L), .Label =
>> > c("#dac",
>> > "#mac,#security", "accountability,anonymous", "data security,encryption,security"
>> > ), class = "factor")), .Names = c("class.3", "terms"), class =
>> > "data.frame", row.names = c(NA,
>> > -49L))
>> >
>> > ===========================================================
>> >
>> >
>> > On Sun, May 24, 2015 at 1:15 AM, Jim Lemon <drjimle...@gmail.com> wrote:
>> >
>> >> Hi Mohammad,
>> >> You know, I thought this would be fairly easy, but it wasn't really.
>> >>
>> >> df1<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
>> >>  Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
>> >> df2<-data.frame(Class=c(0,2,1),Comment=c("com1","com2","com3"),
>> >>  Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
>> >> df3<-data.frame(Class=c(2,1,0),Comment=c("com1","com2","com3"),
>> >>  Term=c("aac","aax","vvx"),Text=c("text1","text2","text3"))
>> >> dflist<-list(df1,df2,df3)
>> >> dflist
>> >>
>> >> # define a function that extracts the value from one field
>> >> # selected by a value in another field
>> >> extract_by_value<-function(x,field1,value1,field2) {
>> >>  return(x[x[,field1]==value1,field2])
>> >> }
>> >>
>> >> # define another function that equates all of the values
>> >> sub_value<-function(x,field1,value1,field2,value2) {
>> >>  x[x[,field1]==value1,field2]<-value2
>> >>  return(x)
>> >> }
>> >>
>> >> conformity<-function(x,fieldname1,value1,fieldname2) {
>> >>  # get the most frequent value in fieldname2
>> >>  # for the desired value in fieldname1
>> >>  most_freq<-as.numeric(names(which.max(table(unlist(lapply(x,
>> >>   extract_by_value,fieldname1,value1,fieldname2))))))
>> >>  # now set all the values to the most frequent
>> >>  for(i in 1:length(x))
>> >>   x[[i]]<-sub_value(x[[i]],fieldname1,value1,fieldname2,most_freq)
>> >>  return(x)
>> >> }
>> >>
>> >> conformity(dflist,"Text","text1","Class")
>> >>
>> >> Jim
>> >>
>> >> On Sat, May 23, 2015 at 11:23 PM, John Kane <jrkrid...@inbox.com> wrote:
>> >> > Hi Mohammad
>> >> >
>> >> > Welcome to the R-help list.
>> >> >
>> >> > There probably is a fairly easy way to do what you want, but I think we
>> >> > probably need a bit more background information on what you are trying
>> >> > to achieve. I know I'm not exactly clear on your decision rule(s).
>> >> >
>> >> > It would also be very useful to see some actual sample data in usable
>> >> > R format. Have a look at these links
>> >> > http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>> >> > and http://adv-r.had.co.nz/Reproducibility.html for some hints on what
>> >> > you might want to include in your question.
>> >> >
>> >> > In particular, read up about dput() in those links and/or see ?dput.
>> >> > This is the generally preferred way to supply sample or illustrative
>> >> > data to the R-help list. It basically creates a perfect copy of the data
>> >> > as it exists on 'your' machine so that R-help readers see exactly what
>> >> > you do.
>> >> >
>> >> > John Kane
>> >> > Kingston ON Canada
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: mxalimoha...@ualr.edu
>> >> >> Sent: Fri, 22 May 2015 12:37:50 -0500
>> >> >> To: r-help@r-project.org
>> >> >> Subject: [R] Problem with comparing multiple data sets
>> >> >>
>> >> >> Hi everyone,
>> >> >>
>> >> >> I am very new to R and I have a task to do. I appreciate any help. I
>> >> >> have 3 data sets. Each data set has 4 columns. For example:
>> >> >>
>> >> >> Class  Comment  Term  Text
>> >> >> 0      com1     aac   text1
>> >> >> 2      com2     aax   text2
>> >> >> 1      com3     vvx   text3
>> >> >>
>> >> >> Now I need to compare the class section between the 3 data sets and
>> >> >> assign the most frequent class to that text. For example, if text1 is
>> >> >> assigned to class 0 in data sets 1 and 2 but assigned to class 2 in
>> >> >> data set 3, then it should be assigned to class 0. If they are all the
>> >> >> same, the class stays the same. The ideal thing would be to keep the
>> >> >> same format and just update the class. Is there any easy way to do this?
>> >> >>
>> >> >> Thanks a lot.
>> >> >>
>> >> >> [[alternative HTML version deleted]]
>> >> >>
>> >> >> ______________________________________________
>> >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide
>> >> >> http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > --
>> > Mohammad Alimohammadi | Graduate Assistant
>> > University of Arkansas at Little Rock | College of Science and Mathematics (CSAM)
>> > | mxalimoha...@ualr.edu | ualr.edu
>> >
>> > Public URL: http://scholar.google.com/citations?user=MsfN_i8AAAAJ

--
Mohammad Alimohammadi | Graduate Assistant
University of Arkansas at Little Rock | College of Science and Mathematics (CSAM)
501.346.8007 | mxalimoha...@ualr.edu | ualr.edu

Public URL: http://scholar.google.com/citations?user=MsfN_i8AAAAJ
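
A possible explanation for the "Numeric,3" entries, offered as a sketch and not a confirmed diagnosis: mfv() returns every tied value, so a row such as 2, 1, 0 (rows 38 and 40 in the posted data) has three modes. When rows give results of different lengths, apply() returns a list instead of a vector, which would also account for the data.frame() error about differing numbers of rows. The base R example below uses a hypothetical helper, row_mode(), and a small made-up data frame named toy; returning NA for ties is an assumed tie rule, since the thread does not say how ties should be resolved.

```r
# Hypothetical helper (not part of modeest): most frequent value in x,
# or NA when two or more values tie for most frequent.
row_mode <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) NA_integer_ else as.integer(top)
}

# Small made-up example in the same layout as the posted data.
toy <- data.frame(
  terms   = c("a", "b", "c", "d"),
  class.1 = c(2L, 2L, 1L, 2L),
  class.2 = c(2L, 1L, 1L, 1L),
  class.3 = c(0L, 0L, 0L, 2L),
  stringsAsFactors = FALSE
)

# vapply() insists on exactly one integer per row, so the result is
# always a plain vector that can be bound back to the terms.
modes <- vapply(seq_len(nrow(toy)),
                function(i) row_mode(unlist(toy[i, -1])),
                integer(1))

# One row per term, with the term next to its most frequent class.
result <- data.frame(terms = toy$terms, mode = modes)
print(result)
```

Applied to the real data, df would take the place of toy. A different tie rule (for example, taking the smallest tied class) would only change the branch inside row_mode().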