[R] LDA select number of topics

2015-11-08 Thread srecko joksimovic
Hi all,

I recently came across this great post by Nikita Murzintcev,
http://rpubs.com/nikita-moor/107657. If I understood it correctly, according
to Griffiths (2004) I should select 11 topics? However, the other
metrics seem to suggest quite different numbers of topics.

I mean, 11 topics is about the right number. However, apart from the fact
that it works better in my case, how do I know which metric to rely on? That
is, if I want to report this in a paper, can I simply say that I relied on
Griffiths (2004), without explaining why not Arun (2010), for example?

Thanks,


dda_topics.pdf
Description: Adobe PDF document
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Regression model

2013-11-21 Thread srecko joksimovic
Hi,

I'm trying to fit a regression model, but there is something wrong with it.
The dataset contains 85 observations for 85 students. Those observations are
counts of several actions, and the dependent variable is the final score. More
precisely, I have 5 IVs and one DV. I'm trying to build a regression model to
check whether those variables can predict the final score.

I'm attaching the output of several steps, but I tried the following procedure:
- build a model with only two of the variables (IA and IC)
- summary shows that neither of them is a significant predictor of the final
outcome.
- test for multicollinearity revealed tolerance below 0.2 (potential
problem)
- build two new models, each having only one of those variables as a predictor
- both models show that the variable used for the model is a significant
predictor. Separately they are significant, together not. Probably a
multicollinearity problem, but...
- as I keep adding other variables to one or the other model, Multiple
R-squared slightly increases.
- I tried to compare different models using anova, but none of them seems to
be better.

How do I determine which model is better?

Thanks
> lm.all.1 <- lm(mark~IA+IC, data=social_presence_data)
> summary(lm.all.1)

Call:
lm(formula = mark ~ IA + IC, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5969 -0.2573  0.2599  0.5819  1.2955 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.78938    0.24599  11.339   <2e-16 ***
IA           0.02844    0.04503   0.632    0.530    
IC           0.01979    0.02601   0.761    0.449    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.031 on 79 degrees of freedom
Multiple R-squared:   0.12,  Adjusted R-squared:  0.09774 
F-statistic: 5.387 on 2 and 79 DF,  p-value: 0.006407

> 1/vif(lm.all.1)
       IA        IC 
0.1719037 0.1719037 
> dwt(lm.all.1)
 lag Autocorrelation D-W Statistic p-value
   1      0.09176706      1.815883   0.372
 Alternative hypothesis: rho != 0
> lm.all.2 <- lm(mark~IA, data=social_presence_data)
> lm.all.3 <- lm(mark~IC, data=social_presence_data)
> anova(lm.all.2, lm.all.3)
Analysis of Variance Table

Model 1: mark ~ IA
Model 2: mark ~ IC
  Res.Df    RSS Df Sum of Sq F Pr(>F)
1     80 84.604                      
2     80 84.413  0   0.19141         
> anova(lm.all.1, lm.all.3)
Analysis of Variance Table

Model 1: mark ~ IA + IC
Model 2: mark ~ IC
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     79 83.989                           
2     80 84.413 -1  -0.42402 0.3988 0.5295
> anova(lm.all.1, lm.all.2)
Analysis of Variance Table

Model 1: mark ~ IA + IC
Model 2: mark ~ IA
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     79 83.989                           
2     80 84.604 -1  -0.61543 0.5789  0.449
> summary(lm.all.2)

Call:
lm(formula = mark ~ IA, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5409 -0.2539  0.2283  0.5793  1.2956 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.88517    0.21078  13.688  < 2e-16 ***
IA           0.05961    0.01862   3.202  0.00196 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.028 on 80 degrees of freedom
Multiple R-squared:  0.1136,  Adjusted R-squared:  0.1025 
F-statistic: 10.25 on 1 and 80 DF,  p-value: 0.001962

> summary(lm.all.3)

Call:
lm(formula = mark ~ IC, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.6320 -0.2562  0.2590  0.5764  1.2585 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.76364    0.24168  11.435  < 2e-16 ***
IC           0.03473    0.01074   3.233  0.00178 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.027 on 80 degrees of freedom
Multiple R-squared:  0.1156,  Adjusted R-squared:  0.1045 
F-statistic: 10.45 on 1 and 80 DF,  p-value: 0.001779

> lm.all.3.1 <- lm(mark~IC+AU, data=social_presence_data)
> summary(lm.all.3.1)

Call:
lm(formula = mark ~ IC + AU, data = social_presence_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5951 -0.2618  0.2378  0.5907  1.2619 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.77600    0.24499  11.331  < 2e-16 ***
IC           0.03276    0.01191   2.752  0.00735 ** 
AU           0.04994    0.12697   0.393  0.69514    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.033 on 79 degrees of freedom
Multiple R-squared:  0.1173,  Adjusted R-squared:  0.09496 
F-statistic: 5.249 on 2 and 79 DF,  p-value: 0.007236
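
A note on reading the output above: with exactly two predictors, tolerance equals 1 - r^2, so a tolerance of 0.1719 corresponds to a correlation of about 0.91 between IA and IC. That would explain why each is significant alone but neither is significant jointly. Also, anova() compares nested models, which is why the IA-versus-IC table has no F statistic. The following is only a minimal sketch on simulated data (not the poster's social_presence_data; the sample values and coefficients are made up) of how the tolerance can be computed in base R and how non-nested models might be compared with AIC instead:

```r
set.seed(1)
n    <- 82
IA   <- rpois(n, 10)
IC   <- 2 * IA + rnorm(n, sd = 2)   # simulated so that IC is strongly correlated with IA
mark <- 3 + 0.05 * IA + rnorm(n)
d    <- data.frame(mark, IA, IC)

# Correlation between the two predictors
r <- cor(d$IA, d$IC)

# Tolerance of IA = 1 - R^2 from regressing IA on the other predictor(s);
# this is the reciprocal of the variance inflation factor
tolerance <- 1 - summary(lm(IA ~ IC, data = d))$r.squared

m_IA   <- lm(mark ~ IA,      data = d)
m_IC   <- lm(mark ~ IC,      data = d)
m_both <- lm(mark ~ IA + IC, data = d)

# AIC does not require the models to be nested; lower is better
aic_table <- AIC(m_IA, m_IC, m_both)
print(round(c(r = r, tolerance = tolerance), 3))
print(aic_table)
```

When two predictors carry almost the same information, a small AIC difference between the single-predictor models suggests the data cannot distinguish them, and the choice may have to rest on subject-matter grounds.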


Re: [R] Regression model

2013-11-21 Thread srecko joksimovic
No, it's not homework, it's just some initial analysis, but still...
and thanks for the recommendation.


On Thu, Nov 21, 2013 at 4:42 PM, Rolf Turner r.tur...@auckland.ac.nz wrote:


 (1) Is this homework?  (This list doesn't do homework for people!)
 (Animals maybe, but not people! :-) )

 (2) Your question isn't really an R question but rather a
 statistics/linear modelling question.  It is possible that you might
 get some insight from Frank Harrell's book
 Regression Modeling Strategies (Springer, 2001).

 cheers,

 Rolf Turner




Re: [R] lmerTest

2013-10-13 Thread srecko joksimovic
Thanks Uwe,
I wasn't quite sure about that one... when I build the model with that
particular variable, that is what happens. I have to check why...

Best,
Srecko


On Sun, Oct 13, 2013 at 5:45 AM, Uwe Ligges lig...@statistik.tu-dortmund.de
 wrote:



 On 13.10.2013 02:52, srecko joksimovic wrote:

 ok, ok... thanks.
 I'll try with R-sig-ME


 Or for short, you are trying to estimate more coefficients than you have
 degrees of freedom, which is what

 rank of X = 1660 < ncol(X) = 1895

 tries to tell us.

 Best,
 Uwe Ligges
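
The message "rank of X < ncol(X)" can be reproduced on a small scale. The sketch below is an illustration only: the poster's actual formula is truncated in the error message, so the two factors `g` and `h` here are hypothetical stand-ins that carry identical information. It compares the rank of the fixed-effects model matrix with its number of columns and lists the coefficients that cannot be estimated:

```r
set.seed(42)
n <- 10
g <- factor(rep(1:5, each = 2))
h <- factor(rep(1:5, each = 2))   # hypothetical: an exact duplicate of g
y <- rnorm(n)

# Design matrix: intercept + 4 dummies for g + 4 dummies for h
X <- model.matrix(~ g + h)
rank_X <- qr(X)$rank
print(c(rank = rank_X, ncol = ncol(X)))

# lm() copes by reporting NA for the redundant (aliased) coefficients
fit <- lm(y ~ g + h)
aliased <- names(which(is.na(coef(fit))))
print(aliased)
```

Running the same rank check on the model matrix of the real formula would show how many columns are redundant. Typical causes are a factor with very many levels, interactions with empty cells, or one variable that is a function of another.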






[R] lmerTest

2013-10-12 Thread srecko joksimovic
Hi,

I'm trying to use the lmer function from the lmerTest package because, if I
understood correctly, it allows better inference than the lmer function
from the lme4 package. However, whatever I do, I keep getting this error:

Error in lme4::lFormula(formula = mark ~ ssCount + sTime+  :  rank of X =
1660 < ncol(X) = 1895

Any ideas what the problem could be?

thanks,
Srecko



Re: [R] lmerTest

2013-10-12 Thread srecko joksimovic
ok, ok... thanks.
I'll try with R-sig-ME


On Sat, Oct 12, 2013 at 5:43 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 Any idea what could be the problem? Hmmm...  posting in html? No
 reproducible example? Not posting on R-sig-ME? Just some ideas... reading
 the Posting Guide might be helpful to you.
 ---
 Jeff Newmiller
 DCN: jdnew...@dcn.davis.ca.us
 Research Engineer (Solar/Batteries/Software/Embedded Controllers)
 ---
 Sent from my phone. Please excuse my brevity.



[R] multilevel analysis

2013-09-30 Thread srecko joksimovic
I have an example of multilevel analysis with 3 levels, but the data are
non-normally distributed. In the case of a normal distribution, I would perform
a multilevel linear analysis using the lme function, but what should I do in the
case of a non-normal distribution?

thanks,
Srecko



Re: [R] multilevel analysis

2013-09-30 Thread srecko joksimovic
I thought so, but then I found this:
Normality

The assumption of normality states that the error terms at every level of
the model are normally distributed

maybe I misinterpreted something.


On Mon, Sep 30, 2013 at 3:06 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Sep 30, 2013, at 2:50 PM, srecko joksimovic wrote:

  I have an example of multilevel analysis with 3 levels, but data are
  non-normally distributed. In case of normal distribution, I would perform
  multilevel linear analysis using lme function, but what should I do in
 case
  of non-normal distribution?
 

 But normal distribution is not a requirement for linear models. Please
 review your theory.


 David Winsemius
 Alameda, CA, USA





Re: [R] multilevel analysis

2013-09-30 Thread srecko joksimovic
Thanks for your comments, David and Bert.
The best would be to provide an example. Let's say we have a dataset like
this one:

IDEmployee  Company   OU            CountViewPortal  CountLogin  TimeOnTask  Performance
1           Company1  Company1.OU1  21               33          627.8       4.3
2           Company1  Company1.OU2  45               54          34.8        2.3
3           Company2  Company1.OU1  23               33          3.8         1.0
4           Company2  Company1.OU1  34               12          44.8        2.3
5           Company2  Company1.OU2  55               22          55.8        4.5
6           Company2  Company1.OU3  45               44          34.8        3

I want to see if there is a correlation between CountViewPortal and
Performance. Moreover, I'd like to reveal the influence of
CountViewPortal+TimeOnTask on Performance.
However, I expect that employees within an OU, and then within a Company, have
similar behavior. Thus, I'll have 3 levels - employee, OU, Company. In R,
I would do something like this:

randomInterceptCount <- lme(Performance ~ CountViewPortal, data=analysis,
                            random=~1|Company/OU, method="ML")

But then the point is that CountViewPortal, CountLogin and TimeOnTask are
non-normally distributed. I guess my question is, what should I do in
the case of a non-normal distribution?

I really appreciate your help. Thanks again!
Srecko


On Mon, Sep 30, 2013 at 5:14 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Sep 30, 2013, at 3:22 PM, srecko joksimovic wrote:

  I thought so, but then I found this:
  Normality
  The assumption of normality states that the error terms at every level
 of the model are normally distributed
  maybe I misinterpreted something.

 Notice that it is the _error_terms_ that are to be normally distributed,
 not the data itself. One might even infer that normally distributed data
 might be suspect because the correct distribution should be a mixture of
 normals. Since the errors are never going to fit exactly on a straight line
 on a QQ plot, the real question is how far from Normal they are and what the
 impact might be on the quantities being estimated.

 --
 David.
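
The distinction drawn above, between the distribution of the data and the distribution of the error terms, can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a plain lm() fit on made-up data rather than the poster's three-level lme() model, and the same kind of check would be applied to resid() of a fitted mixed model.

```r
set.seed(7)
n <- 200
x <- rexp(n, rate = 0.1)        # strongly right-skewed predictor (e.g. a count or a duration)
y <- 1 + 0.5 * x + rnorm(n)     # errors are normal by construction

fit <- lm(y ~ x)

# Shapiro-Wilk p-values: tiny for the skewed raw variable,
# but the test that matters is the one on the residuals
p_x   <- shapiro.test(x)$p.value
p_res <- shapiro.test(resid(fit))$p.value
print(c(predictor = p_x, residuals = p_res))

# A QQ plot of the residuals is usually more informative than a test:
# qqnorm(resid(fit)); qqline(resid(fit))
```

Here the predictor is far from normal, yet the model assumption is about the residuals, which is what the QQ plot or test should be run on.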
 
 

 David Winsemius
 Alameda, CA, USA





Re: [R] Unrecognized token

2013-09-17 Thread srecko joksimovic
Thanks William,

actually, the combination that works is:
with(list(id=c("1234","abcd")), paste(paste("select * from tbl_user where
student_id = '78789D'", sep=""), " order by date_time", sep=""))
Maybe I should try to replace double quotes with single (the opposite of what
I was doing...)


On Tue, Sep 17, 2013 at 9:16 AM, William Dunlap wdun...@tibco.com wrote:

 Look at the query strings your code produces:

 > with(list(id=c("1234","abcd")), paste(paste("select * from tbl_user where student_id = ", id,
 +   sep=""), " order by date_time", sep="")
 +   )
 [1] "select * from tbl_user where student_id = 1234 order by date_time"
 [2] "select * from tbl_user where student_id = abcd order by date_time"

 I suspect that the abcd should have quotes around it.  If student_id is
 stored as string data, the 1234 should probably also have quotes around it.
 Replace
    id
 with
    "\"", id, "\""
 and you may get a query that works.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com




[R] Unrecognized token

2013-09-17 Thread srecko joksimovic
Hi,

when I generate a query using the sqldf library, like this:
query = paste(paste("select * from tbl_user where student_id = ", id,
sep=""), " order by date_time", sep="")

student <- sqldf(query)

everything works fine when the id is 21328, 82882, or something like
that. But when the id is something like 78789D, there is an error:
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: unrecognized token: "78789D")

I tried replacing single quotes with double, but it still doesn't work...

thanks,
Srecko



Re: [R] Unrecognized token

2013-09-17 Thread srecko joksimovic
Yes, you are right... the other is definitely not a valid query.

thanks


On Tue, Sep 17, 2013 at 11:22 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:


 > id <- c("21328","78789D")
 > query <- paste(paste("select * from tbl_user where student_id = ",
 +   id,sep=""), " order by date_time", sep="")
 > query
 [1] "select * from tbl_user where student_id = 21328 order by date_time"
 [2] "select * from tbl_user where student_id = 78789D order by date_time"

 Now, does the second string look like valid SQL to you? In particular, the
 78789D is a problem. On the other hand...

 > query <- paste(paste("select * from tbl_user where student_id = '",
 +   id,sep=""), "' order by date_time", sep="")
 > query
 [1] "select * from tbl_user where student_id = '21328' order by date_time"
 [2] "select * from tbl_user where student_id = '78789D' order by date_time"

 As others have pointed out, in this case escaping does not appear to be
 key to getting valid SQL syntax... but looking at the query before shipping
 it off to a database engine seems to me to be an obvious technique you
 should learn.
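
The quoting fix discussed in this thread can be wrapped in a small helper so that every id is emitted as an SQL string literal. This is a minimal sketch: `build_query` is a hypothetical name introduced here, not part of sqldf, and doubling embedded single quotes is an extra precaution beyond what the thread required.

```r
# Hypothetical helper: build one query string per id, always quoting the id
build_query <- function(id) {
  # Double any embedded single quote, the standard SQL escape inside string literals
  safe_id <- gsub("'", "''", as.character(id), fixed = TRUE)
  sprintf("select * from tbl_user where student_id = '%s' order by date_time",
          safe_id)
}

queries <- build_query(c("21328", "78789D"))

# Inspect the generated SQL before sending it to the database
print(queries)
```

A single element such as `queries[2]` could then be passed to `sqldf()` as in the original post. Printing the string first, as suggested above, makes a missing quote visible immediately.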




Re: [R] Unrecognized token

2013-09-17 Thread srecko joksimovic
thanks, Jeff,

good point... I'll try that


On Tue, Sep 17, 2013 at 9:43 AM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 Why don't you print the 'query' variable with each id value and consider
 what the SQL syntax is for number and string literals. Then study the use
 of escaping in strings (\\) to fix the query.


Re: [R] Unrecognized token

2013-09-17 Thread srecko joksimovic
There is no difference, the same query structure is used in both cases:
6683
character
character
select * from students where student_id = 6683 order by date_time
4738D
character
character
select * from students where student_id = 4738D order by date_time

and I still get the same error




[R] split on change occurence

2013-09-16 Thread srecko joksimovic
Hi,

I had an example like this:
id  user  action
1   12    login
2   12    view
3   12    view
4   12    view
5   12    login
6   12    view
7   12    view
8   12    login
which I used to split using split(dat1, cumsum(dat1$action=="login")).

If I had a similar example:
id  user  IP
1   12    ip1
2   12    ip1
3   12    ip2
4   12    ip2
5   12    ip2
6   12    ip3
7   12    ip3
8   12    ip3

how can I split the data frame to obtain the following structure:
#1
1  12  ip1
2  12  ip1
#2
3  12  ip2
4  12  ip2
5  12  ip2
#3
6  12  ip3
7  12  ip3
8  12  ip3

thanks,
Srecko



Re: [R] split on change occurence

2013-09-16 Thread srecko joksimovic
Thanks... I don't know why I didn't try that... I guess I was in a hurry...
I apologize for posting such a simple question


On Mon, Sep 16, 2013 at 3:44 PM, Rui Barradas ruipbarra...@sapo.pt wrote:

 Hello,

 That's an even simpler case for ?split.


 dat <- read.table(text = "
 id  user  IP
 1  12  ip1
 2  12  ip1
 3  12  ip2
 4  12  ip2
 5  12  ip2
 6  12  ip3
 7  12  ip3
 8  12  ip3
 ", header = TRUE)

 split(dat, dat$IP)


 Hope this helps,

 Rui Barradas
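
Splitting by value, as above, works for the posted data because each IP appears in one contiguous block. If the same IP could reappear later and each run should stay separate (which is what "split on change" suggests), a run identifier can be built with cumsum(), in the same spirit as the login example. The data below are hypothetical, modified so that ip1 comes back:

```r
dat <- data.frame(id   = 1:8,
                  user = 12,
                  IP   = c("ip1", "ip1", "ip2", "ip2", "ip2", "ip1", "ip1", "ip3"),
                  stringsAsFactors = FALSE)

# TRUE on the first row and wherever IP differs from the previous row
changed <- c(TRUE, dat$IP[-1] != dat$IP[-nrow(dat)])

# Running count of changes gives one id per contiguous run
run_id <- cumsum(changed)

by_run   <- split(dat, run_id)   # 4 pieces: ip1, ip2, ip1, ip3
by_value <- split(dat, dat$IP)   # 3 pieces: both ip1 runs are merged

print(run_id)
```

With the original data the two approaches give the same three groups, so the simpler `split(dat, dat$IP)` is sufficient there.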



[R] Add new calculated column to data frame

2013-08-29 Thread srecko joksimovic
Hi,

I have the following data set:
id  event   time (in sec)
1   add     1373502892
2   add     1373502972
3   delete  1373502995
4   view    1373503896
5   add     1373503996
...

I'd like to add a new column, "time on task", which is the time elapsed between
two events (id2 - id1...). What would be the best approach to do that?

Thanks,
Srecko



Re: [R] Add new calculated column to data frame

2013-08-29 Thread srecko joksimovic
Thanks Arun,

this is great. However, it should be just a little bit different:

#  id  event       time time_on_task
#1  1    add 1373502892           80
#2  2    add 1373502972           23
#3  3 delete 1373502995          901
#4  4   view 1373503896          100
#5  5    add 1373503996           NA

When I calculate the difference, I need to know how long each activity lasted.
It is id2 - id1 for the first activity...


On Thu, Aug 29, 2013 at 11:03 AM, arun smartpink...@yahoo.com wrote:



 Hi,
 Try:
 dat1 <- read.table(text="
 id event time
 1 add 1373502892
 2 add 1373502972
 3 delete 1373502995
 4 view 1373503896
 5 add 1373503996
 ", sep="", header=TRUE, stringsAsFactors=FALSE)
 dat1$time_on_task <- c(NA, diff(dat1$time))
 dat1
 #  id  event       time time_on_task
 #1  1    add 1373502892           NA
 #2  2    add 1373502972           80
 #3  3 delete 1373502995           23
 #4  4   view 1373503896          901
 #5  5    add 1373503996          100

 # Not sure whether this depends on the values of event or not.
 A.K.







Re: [R] Add new calculated column to data frame

2013-08-29 Thread srecko joksimovic
Hi Arun,

There is one more question... you explained to me how to
use split(dat1, cumsum(dat1$action == "login")) in one of my previous questions,
and that is great.
Now, if I have something like this:

id  module  event   time        time_on_task
1   sys     login   1373502892    80
2   task    add     1373502892    80
3   task    add     1373502972    23
4   sys     login   1373502892    80
5   list    delete  1373502995   901
6   list    view    1373503896   100
7   task    add     1373503996    NA

I know how to split at each login occurrence, and I know how to add a new
column with time differences. But how do I add a new column, category, which
is calculated from the columns module and event? For example, if
module = task and event = add, then category = A...

Srecko
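
A rough sketch of one way to do this with a lookup table and match(), offered as an illustration only. Only "task + add => A" is stated in the question; the other three labels are taken from the Categ column in the tables quoted elsewhere in this thread, and the names categories, key_data and key_lookup are invented for the example.

```r
dat1 <- data.frame(
  id     = 1:7,
  module = c("sys", "task", "task", "sys", "list", "list", "task"),
  event  = c("login", "add", "add", "login", "delete", "view", "add"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per module/event combination.
# Assumption: labels B, C and D follow the tables quoted in this thread.
categories <- data.frame(
  module   = c("task", "sys", "list", "list"),
  event    = c("add", "login", "delete", "view"),
  category = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Build a combined key on both sides and look each row up with match().
key_data   <- paste(dat1$module, dat1$event)
key_lookup <- paste(categories$module, categories$event)
dat1$category <- categories$category[match(key_data, key_lookup)]
dat1
```

Combinations missing from the lookup table come back as NA, which makes unmapped module/event pairs easy to spot.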



On Thu, Aug 29, 2013 at 11:22 AM, arun smartpink...@yahoo.com wrote:

 Hi Srecko,
 No problem.
 Regards,
 Arun






 
 From: srecko joksimovic sreckojoksimo...@gmail.com
 To: arun smartpink...@yahoo.com
 Sent: Thursday, August 29, 2013 2:22 PM
 Subject: Re: [R] Add new calculated column to data frame



 Sorry... I should figure it out...

 thanks so much!
 Srecko



 On Thu, Aug 29, 2013 at 11:21 AM, arun smartpink...@yahoo.com wrote:

 Hi,
 The one you showed is:
 
 dat1$time_on_task <- c(diff(dat1$time), NA)

 dat1
 #  id  event       time time_on_task
 #1  1    add 1373502892           80
 #2  2    add 1373502972           23
 #3  3 delete 1373502995          901
 #4  4   view 1373503896          100
 #5  5    add 1373503996           NA
 
 
 
 
 


Re: [R] Add new calculated column to data frame

2013-08-29 Thread srecko joksimovic
Thanks Berend,

I don't know why I didn't try that before posting the question... but...
anyways, thanks for your help

Srecko


On Thu, Aug 29, 2013 at 11:34 AM, Berend Hasselman b...@xs4all.nl wrote:


 On 29-08-2013, at 20:15, srecko joksimovic sreckojoksimo...@gmail.com
 wrote:

  Thanks Arun,
 
  this is great. However, it should be just a little bit different:
 
  #  id  event       time time_on_task
  #1  1    add 1373502892           80
  #2  2    add 1373502972           23
  #3  3 delete 1373502995          901
  #4  4   view 1373503896          100
  #5  5    add 1373503996           NA

  When I calculate difference, I need to know how long each activity was.
  It is id2-id1 for the first activity...

 then why don't you try

 dat1$time_on_task <- c(diff(dat1$time), NA)

 Berend
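
To make the difference between the two suggestions in this thread concrete, here is a small sketch using the timestamps from the original post. The variable names gaps and since_previous are only illustrative.

```r
time <- c(1373502892, 1373502972, 1373502995, 1373503896, 1373503996)

# diff() returns the n - 1 gaps between consecutive timestamps.
gaps <- diff(time)

# Pad at the front: each row holds the time since the previous event.
since_previous <- c(NA, gaps)

# Pad at the end: each row holds the time until the next event,
# i.e. how long that activity lasted.
time_on_task <- c(gaps, NA)

data.frame(time, since_previous, time_on_task)
```

The last event has no successor, so its duration cannot be known from the data and stays NA.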





Re: [R] Add new calculated column to data frame

2013-08-29 Thread srecko joksimovic
Hi Arun,

this could do the work...

Thanks so much!


On Thu, Aug 29, 2013 at 3:10 PM, arun smartpink...@yahoo.com wrote:

 Hi,
 It's not really clear, but you can try this:
 dat1 <- read.table(text="
 id module event time time_on_task Categ url
 1 sys login 1373502892 80 B http://post/add?id=42&idp=45
 2 task add 1373502892 80 A http://post/add?id=33&idp=45
 3 task add 1373502972 23 A http://post/add?id=34&idp=45
 4 sys login 1373502892 80 B http://post/add?id=39&idp=42
 5 list delete 1373502995 901 C http://post/add?id=37&idp=41
 6 list view 1373503896 100 D http://post/add?id=36&idp=46
 7 task add 1373503996 NA A http://post/add?id=31&idp=45
 ", sep="", header=TRUE, stringsAsFactors=FALSE)

 # extract the number after "id=" for the rows in category A
 vec1 <- as.numeric(gsub(".*\\?.*=(\\d+)&.*", "\\1", dat1$url[dat1$Categ=="A"]))
 vec1
 #[1] 33 34 31

 dat2 <- read.table(text="
 id idpost idtopic iduser
 1   45  33   101
 2   46  34   102
 3   47  33   103
 4   48  33   101
 5   49  35   104
 ", sep="", header=TRUE)
 # category A rows whose id is not among the topics become F
 dat1$Categ[dat1$Categ=="A"][!vec1 %in% dat2$idtopic] <- "F"
 dat1
 #  id module  event       time time_on_task Categ                          url
 #1  1    sys  login 1373502892           80     B http://post/add?id=42&idp=45
 #2  2   task    add 1373502892           80     A http://post/add?id=33&idp=45
 #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=45
 #4  4    sys  login 1373502892           80     B http://post/add?id=39&idp=42
 #5  5   list delete 1373502995          901     C http://post/add?id=37&idp=41
 #6  6   list   view 1373503896          100     D http://post/add?id=36&idp=46
 #7  7   task    add 1373503996           NA     F http://post/add?id=31&idp=45

 A.K.






 
 From: srecko joksimovic sreckojoksimo...@gmail.com
 To: arun smartpink...@yahoo.com
 Sent: Thursday, August 29, 2013 5:38 PM
 Subject: Re: [R] Add new calculated column to data frame



 Hi Arun,

 I really appreciate your help, and we did a great job :)
 but, now I think that R can do anything, so I'd like to try one more
 thing, if you don't mind...

 from the table with categories,

 #  id module  event       time time_on_task Categ url
 #1  1    sys  login 1373502892           80     B http:
 #2  2   task    add 1373502892           80     A http:
 #3  3   task    add 1373502972           23     A http:
 #4  4    sys  login 1373502892           80     B http:
 #5  5   list delete 1373502995          901     C
 #6  6   list   view 1373503896          100     D
 #7  7   task    add 1373503996           NA     A


 I'd like to use only a certain category (for example A). Each of these
 rows has a url whose format is something like
 http://post/add?id=33&idp=45. The first step would be to extract this id (33
 in this case). Based on that value, I want to find all iduser values in the
 following table:

 id idpost idtopic iduser
 1   45  33   101
 2   46  34   102
 3   47  33   103
 4   48  33   101
 5   49  35   104


 The next step would be to check whether at least one of these values (iduser)
 is missing from the vector users (ids only). If that is the case, I want to
 change the category to F; if not, I want to keep the same category.

 If this is too much for one question, I'll implement this in Java, but I'd
 really like to try this with R. Maybe this id extraction from url is the
 most important problem... I tried most of these steps, but still not able
 to put them all together...

 Thank you so much for your time.
 Srecko








 On Thu, Aug 29, 2013 at 12:22 PM, arun smartpink...@yahoo.com wrote:

 Hi Srecko,
 No problem.
 
 Arun
 
 
 
 
 
 
 
 From: srecko joksimovic sreckojoksimo...@gmail.com
 To: arun smartpink...@yahoo.com
 Sent: Thursday, August 29, 2013 3:19 PM
 
 Subject: Re: [R] Add new calculated column to data frame
 
 
 
 This is great Arun, thank you again.
 
 I was thinking of using sqldf and issuing a query for each module-action
 combination, but this is much better. Since I have a table with categories
 (module, action, category), I could create the vector levels based on the
 first two columns and the vector labels based on the category column, and that
 should do the work...
 
 Best,
 Srecko
 
 
 
 On Thu, Aug 29, 2013 at 12:16 PM, arun smartpink...@yahoo.com wrote:
 
 Hi Srecko,
 
 You didn't mention the order in which the letters are assigned.  If you
 need a different order, just change the order in the ,levels=c(),.
 Arun
 
 
 
 
 - Original Message -
 From: arun smartpink...@yahoo.com
 To: srecko joksimovic sreckojoksimo...@gmail.com
 Cc: R help r-help@r-project.org
 
 Sent: Thursday, August 29, 2013 3:13 PM
 Subject: Re: [R] Add new calculated column to data frame
 
 
 
 Hi,
 You could try this:
 dat1 <- read.table(text="
 id module event time

Re: [R] Add new calculated column to data frame

2013-08-29 Thread srecko joksimovic
Thanks, I'll try this as well.

Srecko


On Thu, Aug 29, 2013 at 3:26 PM, arun smartpink...@yahoo.com wrote:



 Hi Srecko,
 Try this:
 dat1 <- read.table(text="
 id module event time time_on_task Categ url
 1 sys login 1373502892 80 B http://
 2 task add 1373502892 80 A http://post/add?id=33&idp=67
 3 task add 1373502972 23 A http://post/add?id=34&idp=67
 4 sys login 1373502892 80 B http://
 5 list delete 1373502995 901 C http://
 6 list view 1373503896 100 D http://
 7 task add 1373503996 NA A http://post/add?id=35&idp=99
 ", sep="", header=TRUE, stringsAsFactors=FALSE)

 # extract the number after "id=" for the rows in category A
 vec1 <- as.numeric(gsub(".*\\?.*=(\\d+)&.*", "\\1", dat1$url[dat1$Categ=="A"]))

 dat2 <- read.table(text="
 id idpost idtopic iduser
 1   45  33   101
 2   46  34   102
 3   47  33   103
 4   48  33   101
 5   49  35   104
 ", sep="", header=TRUE)
 student_list <- c(101:102, 104:107)
 # per topic: are all of its users in student_list?
 vec2 <- with(dat2, tapply(iduser, list(idtopic), FUN=function(x) all(x %in% student_list)))

 dat1$Categ[dat1$Categ=="A"][match(vec1, as.numeric(names(vec2)))[!vec2]] <- "F"
 dat1
 #  id module  event       time time_on_task Categ                          url
 #1  1    sys  login 1373502892           80     B                      http://
 #2  2   task    add 1373502892           80     F http://post/add?id=33&idp=67
 #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
 #4  4    sys  login 1373502892           80     B                      http://
 #5  5   list delete 1373502995          901     C                      http://
 #6  6   list   view 1373503896          100     D                      http://
 #7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99

 A.K.

 
 From: srecko joksimovic sreckojoksimo...@gmail.com
 To: arun smartpink...@yahoo.com
 Sent: Thursday, August 29, 2013 6:04 PM
 Subject: Re: [R] Add new calculated column to data frame



 "Did you mean to separate the number 33 from the link?" Yes, that is
 correct. It should be something like this:


 #  id module  event       time time_on_task Categ url
 #1  1    sys  login 1373502892           80     B http://
 #2  2   task    add 1373502892           80     A http://post/add?id=33&idp=67
 #3  3   task    add 1373502972           23     A http://post/add?id=34&idp=67
 #4  4    sys  login 1373502892           80     B http://
 #5  5   list delete 1373502995          901     C http://
 #6  6   list   view 1373503896          100     D http://
 #7  7   task    add 1373503996           NA     A http://post/add?id=35&idp=99

 From this table I should get 3 rows with 3 URLs:
 http://post/add?id=33&idp=67, http://post/add?id=34&idp=67, and
 http://post/add?id=35&idp=99
 For each of them, I need to extract the id (33, 34, and 35). Once I do that, I
 need to obtain the users from this table:
 id idpost idtopic iduser
 1   45  33   101
 2   46  34   102
 3   47  33   103
 4   48  33   101
 5   49  35   104

 again, for each id. This means:
 id = 33 => 101, 103
 id = 34 => 102
 id = 35 => 104


 Next, for each vector I need to check whether or not all its values are
 in the students list (101, 102, 104, 105, 106, 107):

 id = 33 => FALSE (since 103 is not in the list)
 id = 34 => TRUE
 id = 35 => TRUE


 This means that the category for row 2 in the first table is not A any more,
 but F...

 Thanks,
 Srecko
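
The four steps described above can also be written out one at a time. What follows is only a sketch under stated assumptions, not the code from the reply: it rebuilds just the columns the steps need, assumes the URLs contain "&" between parameters, and uses illustrative names (is_a, topics, users_by_topic, all_known). It looks each row's topic up by name, so it does not rely on the rows and topics being in the same order.

```r
dat1 <- data.frame(
  id    = 1:7,
  Categ = c("B", "A", "A", "B", "C", "D", "A"),
  url   = c("http://",
            "http://post/add?id=33&idp=67",
            "http://post/add?id=34&idp=67",
            "http://", "http://", "http://",
            "http://post/add?id=35&idp=99"),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  idtopic = c(33, 34, 33, 33, 35),
  iduser  = c(101, 102, 103, 101, 104)
)
student_list <- c(101, 102, 104, 105, 106, 107)

# Step 1: pull the digits that follow "id=" out of the category A URLs.
is_a   <- dat1$Categ == "A"
topics <- sub(".*[?&]id=([0-9]+).*", "\\1", dat1$url[is_a])

# Step 2: users per topic, as a list named by topic id.
users_by_topic <- split(dat2$iduser, dat2$idtopic)

# Step 3: for each extracted topic, are all of its users in student_list?
all_known <- vapply(
  users_by_topic[topics],
  function(u) all(u %in% student_list),
  logical(1)
)

# Step 4: category A rows whose topic has an unknown user become F.
dat1$Categ[is_a][!all_known] <- "F"
dat1$Categ
```

A topic that does not appear in dat2 at all yields an empty user vector, for which all() is TRUE, so such rows keep category A in this sketch; whether that is the desired behaviour is not stated in the thread.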





 On Thu, Aug 29, 2013 at 2:56 PM, arun smartpink...@yahoo.com wrote:

 HI Srecko,
 Did you mean to separate the number 33 from the link? Could you provide a
 reproducible example with the output you expected?
 Tx.
 
 
 Arun
 
 
 
 
 
 

[R] Iterate over rows and update values based on condition

2013-08-27 Thread srecko joksimovic
Hi,

I have a data set with structure similar to this:
id user action
1  12  login
2  12  view
3  12  view
4  12  view
5  12  login
6  12  view
7  12  view
8  12  login

I want to create a list of sessions, that is, to split the table at every
occurrence of login. Using Java (or some other language), I would
probably iterate through the rows and create a new List instance at every
login, but I guess there is a more efficient way to do that in R?

Thanks



Re: [R] Iterate over rows and update values based on condition

2013-08-27 Thread srecko joksimovic
This is great!
Thank you so much.


On Tue, Aug 27, 2013 at 3:06 PM, arun smartpink...@yahoo.com wrote:

 Hi,
 Maybe this helps:
 dat1 <- read.table(text="
 id user action
 1  12  login
 2  12  view
 3  12  view
 4  12  view
 5  12  login
 6  12  view
 7  12  view
 8  12  login
 ", sep="", header=TRUE, stringsAsFactors=FALSE)


 split(dat1, cumsum(dat1$action == "login"))
 #$`1`
  # id user action
 #1  1   12  login
 #2  2   12   view
 #3  3   12   view
 #4  4   12   view
 #
 #$`2`
  # id user action
 #5  5   12  login
 #6  6   12   view
 #7  7   12   view
 #
 #$`3`
  # id user action
 #8  8   12  login
 A.K.
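
As a small follow-up sketch (an illustration, not part of the reply above): the same running count can be stored as a column, which makes per-session summaries straightforward. The column name session and the variable session_sizes are made up for the example.

```r
dat1 <- data.frame(
  id     = 1:8,
  user   = 12,
  action = c("login", "view", "view", "view", "login", "view", "view", "login"),
  stringsAsFactors = FALSE
)

# Running count of logins so far: 1 1 1 1 2 2 2 3
dat1$session <- cumsum(dat1$action == "login")

# Same grouping as split(dat1, cumsum(...)), now driven by the stored column.
sessions <- split(dat1, dat1$session)

# Number of rows (actions) in each session.
session_sizes <- vapply(sessions, nrow, integer(1))
session_sizes
```

Rows that occur before the first login would end up in a group numbered 0, which may or may not be what is wanted for real logs.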


