Thanks, I'll try this as well. Srecko
On Thu, Aug 29, 2013 at 3:26 PM, arun <smartpink...@yahoo.com> wrote:

> Hi Srecko,
> Try this:
> dat1<- read.table(text="
> id module event time time_on_task Categ url
> 1 sys login 1373502892 80 B http://
> 2 task add 1373502892 80 A http://post/add?id=33&idp=67
> 3 task add 1373502972 23 A http://post/add?id=34&idp=67
> 4 sys login 1373502892 80 B http://
> 5 list delete 1373502995 901 C http://
> 6 list view 1373503896 100 D http://
> 7 task add 1373503996 NA A http://post/add?id=35&idp=99
> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>
> vec1<-as.numeric(gsub(".*\\?.*=(\\d+)\\&.*","\\1",dat1$url[dat1$Categ=="A"]))
>
> dat2<- read.table(text="
> id idpost idtopic iduser
> 1 45 33 101
> 2 46 34 102
> 3 47 33 103
> 4 48 33 101
> 5 49 35 104
> ",sep="",header=TRUE)
> student_list<- c(101:102,104:107)
> vec2<-with(dat2,tapply(iduser,list(idtopic),FUN=function(x) all(x%in% student_list)))
>
> dat1$Categ[dat1$Categ=="A"][match(vec1,as.numeric(names(vec2)))[!vec2]]<-"F"
> dat1
> #  id module  event       time time_on_task Categ                          url
> #1  1    sys  login 1373502892           80     B                      http://
> #2  2   task    add 1373502892           80     F http://post/add?id=33&idp=67
> #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
> #4  4    sys  login 1373502892           80     B                      http://
> #5  5   list delete 1373502995          901     C                      http://
> #6  6   list   view 1373503896          100     D                      http://
> #7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99
>
> A.K.
>
> ________________________________
> From: srecko joksimovic <sreckojoksimo...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Sent: Thursday, August 29, 2013 6:04 PM
> Subject: Re: [R] Add new calculated column to data frame
>
> "Did you mean to separate the number 33 from the link? ", yes that is correct.
> It should be something like this:
>
> #  id module  event       time time_on_task Categ                          url
> #1  1    sys  login 1373502892           80     B                      http://
> #2  2   task    add 1373502892           80     A http://post/add?id=33&idp=67
> #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
> #4  4    sys  login 1373502892           80     B                      http://
> #5  5   list delete 1373502995          901     C                      http://
> #6  6   list   view 1373503896          100     D                      http://
> #7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99
>
> from this table I should get 3 rows with 3 URLs:
> http://post/add?id=33&idp=67, http://post/add?id=34&idp=67, and
> http://post/add?id=35&idp=99
> For each of them, I need to extract id (33, 34, and 35). Once I do that, I
> need to obtain users from this table:
>
> id idpost idtopic iduser
>  1     45      33    101
>  2     46      34    102
>  3     47      33    103
>  4     48      33    101
>  5     49      35    104
>
> again, for each id. This means:
> id = 33 => 101, 103
> id = 34 => 102
> id = 35 => 104
>
> Next, for each vector I need to check whether or not all its values are
> in the students list (101,102, 104,105, 106,107)
>
> id = 33 => FALSE (since 103 is not in the list)
> id = 34 => TRUE
> id = 35 => TRUE
>
> This means that category for row 2 in the first table is not A any more,
> but F...
>
> Thanks,
> Srecko
>
> On Thu, Aug 29, 2013 at 2:56 PM, arun <smartpink...@yahoo.com> wrote:
>
> >Hi Srecko,
> >Did you mean to separate the number 33 from the link? Could you provide a
> >reproducible example with the output you expected?
> >Tx.
> >
> >Arun
> >
> >________________________________
> >From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >To: arun <smartpink...@yahoo.com>
> >Sent: Thursday, August 29, 2013 5:38 PM
> >Subject: Re: [R] Add new calculated column to data frame
> >
> >Hi Arun,
> >
> >I really appreciate your help, and we did a great job :)
> >but, now I think that R can do anything, so I'd like to try one more
> >thing, if you don't mind...
> >
> >from the table with categories,
> >
> >#  id module  event       time time_on_task Categ url
> >#1  1    sys  login 1373502892           80     B http:
> >#2  2   task    add 1373502892           80     A http:
> >#3  3   task    add 1373502972           23     A http:
> >#4  4    sys  login 1373502892           80     B http:
> >#5  5   list delete 1373502995          901     C
> >#6  6   list   view 1373503896          100     D
> >#7  7   task    add 1373503996           NA     A
> >
> >I'd like to use only certain category (for example A). Each of these
> >fields has an url whose format is something like
> >http://post/add?id=33&idp=45. First step would be to extract this id (33
> >in this case). Based on that value, I want to find all "iduser" from the
> >following table:
> >
> >id idpost idtopic iduser
> > 1     45      33    101
> > 2     46      34    102
> > 3     47      33    103
> > 4     48      33    101
> > 5     49      35    104
> >
> >The next step would be to check if at least one of these values (iduser)
> >is not in the vectors "users" (only ids). If that is the case, I want to
> >change category to F, if not, I want to keep the same category.
> >
> >If this is too much for one question, I'll implement this in Java, but
> >I'd really like to try this with R. Maybe this id extraction from url is
> >the most important problem... I tried most of these steps, but still not
> >able to put them all together...
> >
> >Thank you so much for your time.
> >Srecko
> >
> >On Thu, Aug 29, 2013 at 12:22 PM, arun <smartpink...@yahoo.com> wrote:
> >
> >>Hi Srecko,
> >>No problem.
> >>
> >>Arun
> >>
> >>________________________________
> >>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>To: arun <smartpink...@yahoo.com>
> >>Sent: Thursday, August 29, 2013 3:19 PM
> >>Subject: Re: [R] Add new calculated column to data frame
> >>
> >>This is great Arun, thank you again.
> >>
> >>I was thinking to use sqldf and issue query for each module-action
> >>combination, but this is much better.
> >>Since I have a table with categories (module, action, category), I
> >>could create vector "levels" based on the first two columns and vector
> >>"labels" based on the category column and that should do the work...
> >>
> >>Best,
> >>Srecko
> >>
> >>On Thu, Aug 29, 2013 at 12:16 PM, arun <smartpink...@yahoo.com> wrote:
> >>
> >>>Hi Srecko,
> >>>
> >>>You didn't mention the order in which the letters are assigned. If you
> >>>need a different order, just change the order in the ",levels=c(....),".
> >>>Arun
> >>>
> >>>----- Original Message -----
> >>>From: arun <smartpink...@yahoo.com>
> >>>To: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>Cc: R help <r-help@r-project.org>
> >>>Sent: Thursday, August 29, 2013 3:13 PM
> >>>Subject: Re: [R] Add new calculated column to data frame
> >>>
> >>>Hi,
> >>>You could try this:
> >>>dat1<- read.table(text="
> >>>id module event time time_on_task
> >>>1 sys login 1373502892 80
> >>>2 task add 1373502892 80
> >>>3 task add 1373502972 23
> >>>4 sys login 1373502892 80
> >>>5 list delete 1373502995 901
> >>>6 list view 1373503896 100
> >>>7 task add 1373503996 NA
> >>>",sep="",header=TRUE,stringsAsFactors=FALSE)
> >>>
> >>>dat1$Categ<-as.character(factor(with(dat1,paste(module,event,sep="_")),levels=c("task_add","sys_login","list_delete","list_view"),labels=LETTERS[1:4]))
> >>>
> >>>dat1
> >>>#  id module  event       time time_on_task Categ
> >>>#1  1    sys  login 1373502892           80     B
> >>>#2  2   task    add 1373502892           80     A
> >>>#3  3   task    add 1373502972           23     A
> >>>#4  4    sys  login 1373502892           80     B
> >>>#5  5   list delete 1373502995          901     C
> >>>#6  6   list   view 1373503896          100     D
> >>>#7  7   task    add 1373503996           NA     A
> >>>A.K.
> >>>
> >>>________________________________
> >>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>To: arun <smartpink...@yahoo.com>
> >>>Cc: R help <R-help@r-project.org>
> >>>Sent: Thursday, August 29, 2013 2:34 PM
> >>>Subject: Re: [R] Add new calculated column to data frame
> >>>
> >>>Hi Arun,
> >>>
> >>>There is one more question... you explained to me how to use
> >>>split(dat1,cumsum(dat1$action=="login")) in one of previous questions,
> >>>and that is great.
> >>>Now, if I have something like this:
> >>>
> >>>id module event time time_on_task
> >>>1 sys login 1373502892 80
> >>>2 task add 1373502892 80
> >>>3 task add 1373502972 23
> >>>4 sys login 1373502892 80
> >>>5 list delete 1373502995 901
> >>>6 list view 1373503896 100
> >>>7 task add 1373503996 NA
> >>>
> >>>I know how to split at each "login" occurrence, and I know how to add
> >>>new column with time differences. But, how to add new column "category"
> >>>which will be calculated based on columns "module" and "event"? For
> >>>example if module=task and event=add => category= A...
> >>>
> >>>Srecko
> >>>
> >>>On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink...@yahoo.com> wrote:
> >>>
> >>>>Hi Srecko,
> >>>>No problem.
> >>>>Regards,
> >>>>Arun
> >>>>
> >>>>________________________________
> >>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>>To: arun <smartpink...@yahoo.com>
> >>>>Sent: Thursday, August 29, 2013 2:22 PM
> >>>>Subject: Re: [R] Add new calculated column to data frame
> >>>>
> >>>>Sorry... I should figure it out...
> >>>>
> >>>>thanks so much!
> >>>>Srecko
> >>>>
> >>>>On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink...@yahoo.com> wrote:
> >>>>
> >>>>>Hi,
> >>>>>The one you showed is:
> >>>>>
> >>>>>dat1$time_on_task<- c(diff(dat1$time),NA)
> >>>>>
> >>>>>dat1
> >>>>>#  id  event       time time_on_task
> >>>>>#1  1    add 1373502892           80
> >>>>>#2  2    add 1373502972           23
> >>>>>#3  3 delete 1373502995          901
> >>>>>#4  4   view 1373503896          100
> >>>>>#5  5    add 1373503996           NA
> >>>>>
> >>>>>________________________________
> >>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>>>To: arun <smartpink...@yahoo.com>
> >>>>>Cc: R help <r-help@r-project.org>
> >>>>>Sent: Thursday, August 29, 2013 2:15 PM
> >>>>>Subject: Re: [R] Add new calculated column to data frame
> >>>>>
> >>>>>Thanks Arun,
> >>>>>
> >>>>>this is great. However, it should be just a little bit different:
> >>>>>
> >>>>>#  id  event       time time_on_task
> >>>>>#1  1    add 1373502892           80
> >>>>>#2  2    add 1373502972           23
> >>>>>#3  3 delete 1373502995          901
> >>>>>#4  4   view 1373503896          100
> >>>>>#5  5    add 1373503996           NA
> >>>>>
> >>>>>When I calculate difference, I need to know how long each activity
> >>>>>was. It is id2-id1 for the first activity...
> >>>>>
> >>>>>On Thu, Aug 29, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
> >>>>>
> >>>>>>Hi,
> >>>>>>Try:
> >>>>>>dat1<- read.table(text="
> >>>>>>id event time
> >>>>>>1 add 1373502892
> >>>>>>2 add 1373502972
> >>>>>>3 delete 1373502995
> >>>>>>4 view 1373503896
> >>>>>>5 add 1373503996
> >>>>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
> >>>>>>dat1$time_on_task<- c(NA,diff(dat1$time))
> >>>>>>dat1
> >>>>>>#  id  event       time time_on_task
> >>>>>>#1  1    add 1373502892           NA
> >>>>>>#2  2    add 1373502972           80
> >>>>>>#3  3 delete 1373502995           23
> >>>>>>#4  4   view 1373503896          901
> >>>>>>#5  5    add 1373503996          100
> >>>>>>
> >>>>>>#Not sure whether this depends on the values of "event" or not..
> >>>>>>A.K.
> >>>>>>
> >>>>>>----- Original Message -----
> >>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>>>>To: R help <R-help@r-project.org>
> >>>>>>Cc:
> >>>>>>Sent: Thursday, August 29, 2013 1:52 PM
> >>>>>>Subject: [R] Add new calculated column to data frame
> >>>>>>
> >>>>>>Hi,
> >>>>>>
> >>>>>>I have the following data set:
> >>>>>>id event time (in sec)
> >>>>>>1 add 1373502892
> >>>>>>2 add 1373502972
> >>>>>>3 delete 1373502995
> >>>>>>4 view 1373503896
> >>>>>>5 add 1373503996
> >>>>>>...
> >>>>>>
> >>>>>>I'd like to add new column "time on task" which is time elapsed
> >>>>>>between two events (id2 - id1...). What would be the best approach
> >>>>>>to do that?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>Srecko
> >>>>>>
> >>>>>> [[alternative HTML version deleted]]
> >>>>>>
> >>>>>>______________________________________________
> >>>>>>R-help@r-project.org mailing list
> >>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>>>>and provide commented, minimal, self-contained, reproducible code.
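
For anyone reusing the last step of this thread (recode category A to F when a topic has a poster outside the student list), here is a minimal sketch that looks each row's topic up by value. The one-liner quoted above appears to rely on the category-A rows arriving in the same order as the sorted topic ids, which holds for the example data but may not in general. The names topic, all_students, ok and recode are hypothetical, and leaving rows unchanged when their topic has no posts in the second table is an assumption, not something stated in the thread.

```r
# Data copied from the thread (only the columns this step needs).
dat1 <- data.frame(
  id    = 1:7,
  Categ = c("B", "A", "A", "B", "C", "D", "A"),
  url   = c("http://", "http://post/add?id=33&idp=67",
            "http://post/add?id=34&idp=67", "http://", "http://",
            "http://", "http://post/add?id=35&idp=99"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  idtopic = c(33, 34, 33, 33, 35),
  iduser  = c(101, 102, 103, 101, 104)
)
student_list <- c(101:102, 104:107)

# Extract the "id" query parameter; rows whose url has none stay NA.
has_id <- grepl("[?&]id=[0-9]+", dat1$url)
topic  <- rep(NA_real_, nrow(dat1))
topic[has_id] <- as.numeric(sub(".*[?&]id=([0-9]+).*", "\\1", dat1$url[has_id]))

# Per topic: TRUE when every user who posted is on the student list.
all_students <- tapply(dat2$iduser %in% student_list, dat2$idtopic, all)

# Look the result up by topic value, so row order does not matter.
# Topics absent from dat2 give NA and are left unchanged (assumption).
ok <- as.logical(all_students)[match(topic, as.numeric(names(all_students)))]

recode <- dat1$Categ == "A" & !is.na(ok) & !ok
dat1$Categ[recode] <- "F"
dat1
```

On the example data this changes only row 2 (topic 33, where user 103 is not in the list), which is the result Srecko described.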