Thanks, I'll try this as well.

Srecko


On Thu, Aug 29, 2013 at 3:26 PM, arun <smartpink...@yahoo.com> wrote:

>
>
> Hi Srecko,
> Try this:
> dat1<- read.table(text="
> id module  event       time time_on_task Categ url
> 1    sys  login 1373502892           80     B http://
> 2   task    add 1373502892           80     A http://post/add?id=33&idp=67
> 3   task    add 1373502972           23     A http://post/add?id=34&idp=67
> 4    sys  login 1373502892           80     B http://
> 5   list delete 1373502995          901     C http://
> 6   list   view 1373503896          100     D http://
> 7   task    add 1373503996           NA     A http://post/add?id=35&idp=99
> ",sep="",header=TRUE,stringsAsFactors=FALSE)
>
> vec1<-as.numeric(gsub(".*\\?.*=(\\d+)\\&.*","\\1",dat1$url[dat1$Categ=="A"]))
>
> dat2<- read.table(text="
> id idpost idtopic iduser
> 1   45      33       101
> 2   46      34       102
> 3   47      33       103
> 4   48      33       101
> 5   49      35       104
> ",sep="",header=TRUE)
>  student_list<- c(101:102,104:107)
>  vec2<-with(dat2,tapply(iduser,list(idtopic),FUN=function(x) all(x%in%
> student_list)))
>
> dat1$Categ[dat1$Categ=="A"][match(vec1,as.numeric(names(vec2)))[!vec2]]<-"F"
>  dat1
> #  id module  event       time time_on_task Categ                          url
> #1  1    sys  login 1373502892           80     B                      http://
> #2  2   task    add 1373502892           80     F http://post/add?id=33&idp=67
> #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
> #4  4    sys  login 1373502892           80     B                      http://
> #5  5   list delete 1373502995          901     C                      http://
> #6  6   list   view 1373503896          100     D                      http://
> #7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99
>
> A.K.
>
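A note on the last assignment in the code above: it subsets the result of match() with !vec2, which has one element per topic rather than one per category-A row. The two line up in this example because every topic id occurs once and in sorted order, but they may not in general. The following is a minimal sketch of a variant that looks up each row by its own topic id. The names urls, categ, topic_ok and topic_id are illustrative and not from the thread, and topic_ok is typed in by hand to stand for the tapply() result.

```r
# Columns from the example data, kept as plain vectors for brevity.
urls  <- c("http://", "http://post/add?id=33&idp=67",
           "http://post/add?id=34&idp=67", "http://", "http://",
           "http://", "http://post/add?id=35&idp=99")
categ <- c("B", "A", "A", "B", "C", "D", "A")

# Assumed per-topic result (TRUE = every user is in the student list);
# the names are the topic ids, as tapply() would return them.
topic_ok <- c("33" = FALSE, "34" = TRUE, "35" = TRUE)

# Capture the digits after "?id=" or "&id="; URLs without an id give NA.
m <- regmatches(urls, regexec("[?&]id=([0-9]+)", urls))
topic_id <- vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
                   character(1))

# Look up each category-A row by name, so row order and repeated
# topics do not matter.
is_a <- categ == "A" & !is.na(topic_id)
ok   <- unname(topic_ok[topic_id[is_a]])

# Recode only rows whose topic is known and failed the check.
categ[which(is_a)[!is.na(ok) & !ok]] <- "F"
print(categ)
```

Rows whose topic id is missing from topic_ok are left unchanged here; whether they should be recoded instead is a design decision the thread does not settle.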
> ________________________________
> From: srecko joksimovic <sreckojoksimo...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Sent: Thursday, August 29, 2013 6:04 PM
> Subject: Re: [R] Add new calculated column to data frame
>
>
>
> "Did you mean to separate the number 33 from the link? ", yes that is
> correct. It should be something like this:
>
>
> #  id module  event       time time_on_task Categ                          url
> #1  1    sys  login 1373502892           80     B                      http://
> #2  2   task    add 1373502892           80     A http://post/add?id=33&idp=67
> #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
> #4  4    sys  login 1373502892           80     B                      http://
> #5  5   list delete 1373502995          901     C                      http://
> #6  6   list   view 1373503896          100     D                      http://
> #7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99
>
> from this table I should get 3 rows with 3 URLs:
> http://post/add?id=33&idp=67, http://post/add?id=34&idp=67, and
> http://post/add?id=35&idp=99
> For each of them, I need to extract id (33, 34, and 35). Once I do that, I
> need to obtain users from this table:
> id idpost idtopic iduser
> 1   45      33       101
> 2   46      34       102
> 3   47      33       103
> 4   48      33       101
> 5   49      35       104
>
> again, for each id. This means:
> id = 33 => 101, 103
> id = 34 => 102
> id = 35 => 104
>
> Next, for each vector I need to check whether or not all its values are
> in the student list (101, 102, 104, 105, 106, 107):
>
> id = 33 => FALSE (since 103 is not in the list)
> id = 34 => TRUE
> id = 35 => TRUE
>
>
> This means that category for row 2 in the first table is not A any more,
> but F...
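The middle two steps described above (collect the users for each topic id, then test them against the student list) can be sketched on their own. This is only an illustration under the data given in this message; the names posts, students, users_by_topic and all_students are made up for the example.

```r
# Post table from the message above.
posts <- data.frame(
  id      = 1:5,
  idpost  = 45:49,
  idtopic = c(33, 34, 33, 33, 35),
  iduser  = c(101, 102, 103, 101, 104)
)
students <- c(101, 102, 104, 105, 106, 107)

# One vector of user ids per topic; the list names are the topic ids.
users_by_topic <- split(posts$iduser, posts$idtopic)

# TRUE when every user who posted in the topic is a known student.
all_students <- vapply(users_by_topic,
                       function(u) all(u %in% students),
                       logical(1))
print(all_students)
```

Topic 33 comes out FALSE because user 103 is not in the list, which is why row 2 of the first table should change from A to F.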
>
> Thanks,
> Srecko
>
>
>
>
>
> On Thu, Aug 29, 2013 at 2:56 PM, arun <smartpink...@yahoo.com> wrote:
>
> Hi Srecko,
> >Did you mean to separate the number 33 from the link? Could you provide a
> reproducible example with the output you expected?
> >Tx.
> >
> >
> >Arun
> >
> >
> >
> >
> >
> >________________________________
> >From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >To: arun <smartpink...@yahoo.com>
> >Sent: Thursday, August 29, 2013 5:38 PM
> >
> >Subject: Re: [R] Add new calculated column to data frame
> >
> >
> >
> >Hi Arun,
> >
> >I really appreciate your help, and we did a great job :)
> >but, now I think that R can do anything, so I'd like to try one more
> thing, if you don't mind...
> >
> >from the table with categories,
> >
> >#  id module  event       time time_on_task Categ    url
> >#1  1    sys  login 1373502892           80     B         http:
> >#2  2   task    add 1373502892           80     A         http:
> >#3  3   task    add 1373502972           23     A         http:
> >#4  4    sys  login 1373502892           80     B          http:
> >#5  5   list delete 1373502995          901     C
> >#6  6   list   view 1373503896          100     D
> >#7  7   task    add 1373503996           NA     A
> >
> >
> >I'd like to use only a certain category (for example A). Each of these
> fields has a URL whose format is something like
> http://post/add?id=33&idp=45. First step would be to extract this id (33
> in this case). Based on that value, I want to find all "iduser" from the
> following table:
> >
> >id idpost idtopic iduser
> >1   45      33       101
> >2   46      34       102
> >3   47      33       103
> >4   48      33       101
> >5   49      35       104
> >
> >
> >The next step would be to check if at least one of these values (iduser)
> is not in the vector "users" (only ids). If that is the case, I want to
> change category to F, if not, I want to keep the same category.
> >
> >If this is too much for one question, I'll implement this in Java, but
> I'd really like to try this with R. Maybe this id extraction from url is
> the most important problem... I tried most of these steps, but still not
> able to put them all together...
> >
> >Thank you so much for your time.
> >Srecko
> >
> >
> >
> >
> >
> >
> >
> >
> >On Thu, Aug 29, 2013 at 12:22 PM, arun <smartpink...@yahoo.com> wrote:
> >
> >Hi Srecko,
> >>No problem.
> >>
> >>Arun
> >>
> >>
> >>
> >>
> >>
> >>
> >>________________________________
> >>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>To: arun <smartpink...@yahoo.com>
> >>Sent: Thursday, August 29, 2013 3:19 PM
> >>
> >>Subject: Re: [R] Add new calculated column to data frame
> >>
> >>
> >>
> >>This is great Arun, thank you again.
> >>
> >>I was thinking of using sqldf and issuing a query for each module-action
> combination, but this is much better. Since I have a table with categories
> (module, action, category), I could create a vector "levels" based on the
> first two columns and a vector "labels" based on the category column, and that
> should do the work...
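If the categories already live in a table, one alternative to building "levels" and "labels" by hand is to match on a combined key. This is a hedged sketch, not the method used in the thread; the lookup table categories and the names log_df, key_log and key_cat are hypothetical.

```r
# Event log with the two columns that determine the category.
log_df <- data.frame(
  id     = 1:7,
  module = c("sys", "task", "task", "sys", "list", "list", "task"),
  event  = c("login", "add", "add", "login", "delete", "view", "add"),
  stringsAsFactors = FALSE
)

# Hypothetical category table: one row per module/event combination.
categories <- data.frame(
  module = c("task", "sys", "list", "list"),
  event  = c("add", "login", "delete", "view"),
  Categ  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Build the same composite key on both sides and look the category up.
key_log <- paste(log_df$module, log_df$event, sep = "_")
key_cat <- paste(categories$module, categories$event, sep = "_")
log_df$Categ <- categories$Categ[match(key_log, key_cat)]
print(log_df)
```

match() keeps the original row order and leaves NA for combinations missing from the table. merge() on module and event would also work, but it sorts the result by the key columns unless told otherwise.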
> >>
> >>Best,
> >>Srecko
> >>
> >>
> >>
> >>On Thu, Aug 29, 2013 at 12:16 PM, arun <smartpink...@yahoo.com> wrote:
> >>
> >>Hi Srecko,
> >>>
> >>>You didn't mention the order in which the letters are assigned.  If you
> need a different order, just change the order in the ",levels=c(....),".
> >>>Arun
> >>>
> >>>
> >>>
> >>>
> >>>----- Original Message -----
> >>>From: arun <smartpink...@yahoo.com>
> >>>To: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>Cc: R help <r-help@r-project.org>
> >>>
> >>>Sent: Thursday, August 29, 2013 3:13 PM
> >>>Subject: Re: [R] Add new calculated column to data frame
> >>>
> >>>
> >>>
> >>>Hi,
> >>>You could try this:
> >>>dat1<- read.table(text="
> >>>id  module    event       time                       time_on_task
> >>>1   sys         login         1373502892           80
> >>>2   task        add          1373502892           80
> >>>3   task        add          1373502972           23
> >>>4   sys         login         1373502892           80
> >>>5   list         delete       1373502995          901
> >>>6   list          view         1373503896          100
> >>>7   task        add          1373503996           NA
> >>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>
> >>> dat1$Categ<-as.character(factor(with(dat1,paste(module,event,sep="_")),levels=c("task_add","sys_login","list_delete","list_view"),labels=LETTERS[1:4]))
> >>>
> >>>
> >>>dat1
> >>>#  id module  event       time time_on_task Categ
> >>>#1  1    sys  login 1373502892           80     B
> >>>#2  2   task    add 1373502892           80     A
> >>>#3  3   task    add 1373502972           23     A
> >>>#4  4    sys  login 1373502892           80     B
> >>>#5  5   list delete 1373502995          901     C
> >>>#6  6   list   view 1373503896          100     D
> >>>#7  7   task    add 1373503996           NA     A
> >>>A.K.
> >>>
> >>>________________________________
> >>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>To: arun <smartpink...@yahoo.com>
> >>>Cc: R help <R-help@r-project.org>
> >>>Sent: Thursday, August 29, 2013 2:34 PM
> >>>Subject: Re: [R] Add new calculated column to data frame
> >>>
> >>>
> >>>
> >>>Hi Arun,
> >>>
> >>>There is one more question... you explained to me how to
> use split(dat1,cumsum(dat1$action=="login")) in one of previous questions,
> and that is great.
> >>>Now, if I have something like this:
> >>>
> >>>id  module    event       time                       time_on_task
> >>>1   sys         login         1373502892           80
> >>>2   task        add          1373502892           80
> >>>3   task        add          1373502972           23
> >>>4   sys         login         1373502892           80
> >>>5   list         delete       1373502995          901
> >>>6   list          view         1373503896          100
> >>>7   task        add          1373503996           NA
> >>>I know how to split at each "login" occurrence, and I know how to add a
> new column with time differences. But how do I add a new column "category"
> that is calculated from the columns "module" and "event"? For example,
> if module=task and event=add => category=A...
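For reference, the split at each "login" mentioned above can be sketched as follows, using the column name event from this data set rather than action. The names log_df, session and sessions are illustrative only, and the per-session differences computed here are an assumption about what is wanted, so they will not reproduce the time_on_task column typed into the example.

```r
# Event log from the message above (times are Unix seconds).
log_df <- data.frame(
  id     = 1:7,
  module = c("sys", "task", "task", "sys", "list", "list", "task"),
  event  = c("login", "add", "add", "login", "delete", "view", "add"),
  time   = c(1373502892, 1373502892, 1373502972, 1373502892,
             1373502995, 1373503896, 1373503996),
  stringsAsFactors = FALSE
)

# cumsum() over the logical vector steps up by one at every login,
# so it works as a running session counter.
log_df$session <- cumsum(log_df$event == "login")

# One data frame per session.
sessions <- split(log_df, log_df$session)

# Time differences computed within each session; the last event of a
# session has no following event, so it gets NA.
log_df$time_on_task <- ave(log_df$time, log_df$session,
                           FUN = function(t) c(diff(t), NA))
print(log_df)
```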
> >>>
> >>>Srecko
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink...@yahoo.com> wrote:
> >>>
> >>>Hi Srecko,
> >>>>No problem.
> >>>>Regards,
> >>>>Arun
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>________________________________
> >>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>>To: arun <smartpink...@yahoo.com>
> >>>>Sent: Thursday, August 29, 2013 2:22 PM
> >>>>
> >>>>Subject: Re: [R] Add new calculated column to data frame
> >>>>
> >>>>
> >>>>
> >>>>Sorry... I should have figured it out...
> >>>>
> >>>>thanks so much!
> >>>>Srecko
> >>>>
> >>>>
> >>>>
> >>>>On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink...@yahoo.com> wrote:
> >>>>
> >>>>Hi,
> >>>>>The one you showed is:
> >>>>>
> >>>>>dat1$time_on_task<- c(diff(dat1$time),NA)
> >>>>>
> >>>>> dat1
> >>>>>#  id  event       time time_on_task
> >>>>>#1  1    add 1373502892           80
> >>>>>#2  2    add 1373502972           23
> >>>>>#3  3 delete 1373502995          901
> >>>>>#4  4   view 1373503896          100
> >>>>>#5  5    add 1373503996           NA
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>________________________________
> >>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>>>
> >>>>>To: arun <smartpink...@yahoo.com>
> >>>>>Cc: R help <r-help@r-project.org>
> >>>>>Sent: Thursday, August 29, 2013 2:15 PM
> >>>>>Subject: Re: [R] Add new calculated column to data frame
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>Thanks Arun,
> >>>>>
> >>>>>this is great. However, it should be just a little bit different:
> >>>>>
> >>>>>#  id  event       time time_on_task
> >>>>>#1  1    add 1373502892           80
> >>>>>#2  2    add 1373502972           23
> >>>>>#3  3 delete 1373502995          901
> >>>>>#4  4   view 1373503896          100
> >>>>>#5  5    add 1373503996           NA
> >>>>>
> >>>>>
> >>>>>When I calculate the difference, I need to know how long each activity
> lasted. It is id2-id1 for the first activity...
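The two alignments discussed here differ only in where the NA padding goes. A minimal sketch on the data from the original question; the data frame name events and the extra column since_previous are illustrative.

```r
# Event log from the original question (times are Unix seconds).
events <- data.frame(
  id    = 1:5,
  event = c("add", "add", "delete", "view", "add"),
  time  = c(1373502892, 1373502972, 1373502995, 1373503896, 1373503996),
  stringsAsFactors = FALSE
)

# diff() returns n - 1 gaps. Padding at the end lines each gap up with
# the event that starts it, so the last event has no known duration.
events$time_on_task <- c(diff(events$time), NA)

# Padding at the front instead gives the time since the previous event.
events$since_previous <- c(NA, diff(events$time))

print(events)
```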
> >>>>>
> >>>>>
> >>>>>
> >>>>>On Thu, Aug 29, 2013 at 11:03 AM, arun <smartpink...@yahoo.com>
> wrote:
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>Hi,
> >>>>>>Try:
> >>>>>>dat1<- read.table(text="
> >>>>>>id    event    time
> >>>>>>
> >>>>>>1    add      1373502892
> >>>>>>2    add      1373502972
> >>>>>>3    delete  1373502995
> >>>>>>4    view      1373503896
> >>>>>>5    add      1373503996
> >>>>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
> >>>>>> dat1$time_on_task<- c(NA,diff(dat1$time))
> >>>>>> dat1
> >>>>>>#  id  event       time time_on_task
> >>>>>>#1  1    add 1373502892           NA
> >>>>>>#2  2    add 1373502972           80
> >>>>>>#3  3 delete 1373502995           23
> >>>>>>#4  4   view 1373503896          901
> >>>>>>#5  5    add 1373503996          100
> >>>>>>
> >>>>>>#Not sure whether this depends on the values of "event" or not..
> >>>>>>A.K.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>----- Original Message -----
> >>>>>>From: srecko joksimovic <sreckojoksimo...@gmail.com>
> >>>>>>To: R help <R-help@r-project.org>
> >>>>>>Cc:
> >>>>>>Sent: Thursday, August 29, 2013 1:52 PM
> >>>>>>Subject: [R] Add new calculated column to data frame
> >>>>>>
> >>>>>>Hi,
> >>>>>>
> >>>>>>I have the following data set:
> >>>>>>id    event    time (in sec)
> >>>>>>1     add      1373502892
> >>>>>>2     add      1373502972
> >>>>>>3     delete   1373502995
> >>>>>>4     view      1373503896
> >>>>>>5     add       1373503996
> >>>>>>...
> >>>>>>
> >>>>>>I'd like to add a new column "time on task" which is the time elapsed
> between two
> >>>>>>events (id2 - id1...). What would be the best approach to do that?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>Srecko
> >>>>>>
> >>>>>>    [[alternative HTML version deleted]]
> >>>>>>
> >>>>>>______________________________________________
> >>>>>>R-help@r-project.org mailing list
> >>>>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>>>>and provide commented, minimal, self-contained, reproducible code.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

