date:20120516

Re: [R] Error on easy way for JoSAE Package

2012-05-16 Thread ana24maria

Thank you very much.
After using dput and the easy way (result <- eblup.mse.f.wrap(domain.data
= amigo, lme.obj = fit.lme)),
I got the following error:

Error in `[.data.frame`(sample.data, , variabs) : 
  undefined columns selected


What should I do?




--
View this message in context: 
http://r.789695.n4.nabble.com/Error-on-easy-way-for-JoSAE-Package-tp4625684p4630220.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] code to iterate function apply to matrix

2012-05-16 Thread umai88

I have the code below and I want to repeat the loop 100 times.

x <- rnorm(60)
mat1 <- matrix(x, nrow = 15, ncol = 4)

trim <- numeric(ncol(mat1))
win <- numeric(ncol(mat1))
ssd <- numeric(ncol(mat1))

for (j in 1:ncol(mat1))
{
  n <- length(mat1[, j])
  alpha <- 0.1
  k <- floor(alpha * n) + 1
  r <- k - (alpha * n)
  i <- k + 1
  m <- n - k
  y1 <- sort(mat1[, j])
  y <- y1[i:m]
  x.low <- (1 - r) * y1[k + 1] + r * y1[k]
  x.upp <- (1 - r) * y1[n - k] + r * y1[n - k + 1]
  trim[j] <- 1 / ((1 - 2 * alpha) * n) * (sum(y) + r * (y1[k] + y1[n - k + 1]))
  win[j] <- 1 / n * (sum(y) + k * (x.low + x.upp))
  ssd[j] <- sum((y - win[j])**2) + k * ((y1[k + 1] - win[j])**2 + (y1[n - k] - win[j])**2)
}

trim.mean <- matrix(trim, nrow = 1)
win.mean <- matrix(win, nrow = 1)
sum.sq.dev <- matrix(ssd, nrow = 1)
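
One possible way to repeat the computation 100 times is to wrap the loop
body in a function and call it through replicate(). This is only a sketch:
the function name robust_stats is made up for illustration, the formulas are
copied from the loop in the post, and it assumes that a fresh rnorm() sample
is wanted on every repetition.

```r
# Trimmed mean, Winsorized mean and sum of squared deviations for each
# column of a matrix, using the same formulas as the loop in the post.
# 'robust_stats' is a hypothetical helper name.
robust_stats <- function(mat, alpha = 0.1) {
  apply(mat, 2, function(col) {
    n  <- length(col)
    k  <- floor(alpha * n) + 1
    r  <- k - alpha * n
    y1 <- sort(col)
    y  <- y1[(k + 1):(n - k)]
    x.low <- (1 - r) * y1[k + 1] + r * y1[k]
    x.upp <- (1 - r) * y1[n - k] + r * y1[n - k + 1]
    trim <- 1 / ((1 - 2 * alpha) * n) * (sum(y) + r * (y1[k] + y1[n - k + 1]))
    win  <- 1 / n * (sum(y) + k * (x.low + x.upp))
    ssd  <- sum((y - win)^2) + k * ((y1[k + 1] - win)^2 + (y1[n - k] - win)^2)
    c(trim = trim, win = win, ssd = ssd)
  })
}

set.seed(1)  # only to make the sketch reproducible
res <- replicate(100, robust_stats(matrix(rnorm(60), nrow = 15, ncol = 4)))
dim(res)     # statistics x columns x repetitions
```

The result is a three-dimensional array, so res["trim", , ] holds the trimmed
means of the 4 columns across the 100 repetitions.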

--
View this message in context: 
http://r.789695.n4.nabble.com/code-to-iterate-function-apply-to-matrix-tp4630221.html

Re: [R] Regression Analysis or Anova?

2012-05-16 Thread Robert Latest

Hello Andrea,

I don't know if I can help you (probably not, I'm a beginner myself),
but you would make it a lot easier for those who can if you
post a self-contained script in this forum that shows what you're
trying to do. Use dput() to dump your dataset in text form.

Good luck,
robert


On Tue, May 15, 2012 at 10:49 PM, Andrea Sica aerdna.s...@gmail.com wrote:
 Dear all,

 I hope to be the clearest I can.
 Let's say I have a dataset with 10 variables, where 4 of them represent for
 me a certain phenomenon that I call Y.
 The other 6 represent for me another phenomenon that I call X.

 Each one of those variables (10) contains 37 units. Those units are just
 the respondents of my analysis (a survey).
 Since all the questions are based on a Likert scale, they are qualitative
 variables. The scale is from 0 to 7 for all of
 them, but there are -1 and -2 values where the answer is missing. Hence
 the scale goes actually from -2 to 7.

 What I want to do is to calculate the regression between my Y (which
 contains 4 variables in this case and 37 answers
 for each variable) and my X (which contains 6 variables instead and the
 same number of respondents). I know that for
 qualitative analyses I should use Anova instead of the regression, although
 I have read somewhere that it is even possible
 to make the regression.

 Until now I have tried to act this way:
 __
 apply(Y, 1, function(Y) mean(Y[Y > 0])) # calculate the average per row
 # (respondent) without considering the negative values

 Y.reg <- c(apply(Y, 1, function(Y) mean(Y[Y > 0]))) # create the vector Y,
 # so it results in 1 variable with 37 numbers

 apply(X, 1, function(X) mean(X[X > 0]))

 X.reg <- c(apply(X, 1, function(X) mean(X[X > 0]))) # create the vector
 # X, so it results in 1 variable with 37 numbers

 reg1 <- lm(Y.reg ~ X.reg) # make the first regression
 summary(reg1) # see the results

 Call:
 lm(formula = Y.reg ~ X.reg)

 Residuals:
     Min         1Q       Median      3Q       Max
 -2.26183 -0.49434 -0.02658  0.37260  2.08899

 Coefficients:
                 Estimate Std. Error   t value   Pr(>|t|)
 (Intercept)   4.2577     0.4986      8.539    4.46e-10 ***
 X.reg          0.1008     0.1282      0.786    0.437
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.7827 on 35 degrees of freedom
 Multiple R-squared: 0.01736,    Adjusted R-squared: -0.01072
 F-statistic: 0.6182 on 1 and 35 DF,  p-value: 0.437

 layout(matrix(1:4,2,2)) #graphical approach
 plot(reg1)

 please see the pfd() function attached.
 

 But as you can see, although I do not use Y as composed by 4 variables and
 X by 6, and I do not consider the negative values
 too, I get a very low score as my R^2.

 If I use anova instead, I have this problem:
 
 Ymatrix <- as.matrix(Y)
 Xmatrix <- as.matrix(X) # where both Y and X are in their original form,
 # thus composed of several variables (4 and 6) and with
 # negative values as well.

 Error in UseMethod("anova") :
  no applicable method for 'anova' applied to an object of class
 "c('matrix', 'integer', 'numeric')"
 

 To be honest, a few days ago I succeeded in using anova, but unfortunately
 I do not remember how and I did not save the
 command anywhere.

 What I would like to know is:

 - First of all, am I wrong in how I approach my problem?
 - What do you think about the regression output?
 - Finally, how can I run the anova, if I have to do it at all?

 I really hope I have been clear. Thank you all for any kind of help.

 Best,

 Andrea
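
 A possible starting point for the workflow described above, offered only as
 a sketch: the data here are simulated (hypothetical, not the survey data),
 the missing-answer codes -1 and -2 are recoded to NA rather than filtered
 with Y > 0 (which would also drop genuine answers of 0), and anova() is
 applied to the fitted model object rather than to a matrix.

```r
set.seed(42)  # hypothetical survey answers, simulated only for this sketch
n_resp <- 37
Y <- as.data.frame(matrix(sample(c(-2, -1, 0:7), n_resp * 4, replace = TRUE),
                          ncol = 4))
X <- as.data.frame(matrix(sample(c(-2, -1, 0:7), n_resp * 6, replace = TRUE),
                          ncol = 6))

# Recode the missing-answer codes as NA so that they are skipped below
Y[Y < 0] <- NA
X[X < 0] <- NA

# One score per respondent: the row mean over the answered questions
Y.reg <- rowMeans(Y, na.rm = TRUE)
X.reg <- rowMeans(X, na.rm = TRUE)

reg1 <- lm(Y.reg ~ X.reg)
summary(reg1)
anova(reg1)   # anova() expects a fitted model, not a matrix
```

 Whether averaging Likert items in this way is appropriate is a separate
 statistical question that the sketch does not settle.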


[R] order a data frame by date with order

2012-05-16 Thread Benedikt Gehr

Hi

I have a rather large data frame (7000 rows with 28 columns) which I want
to sort by date. Below I have an example of the data frame. The date column
is called DT, is a factor and looks like this:

> class(res.merge$DT)
[1] "factor"
> head(res.merge$DT)
[1] 17.3.2012 13:54:02 17.3.2012 14:00:07 17.3.2012 14:30:25 17.3.2012
15:01:15
[5] 17.3.2012 15:32:14 17.3.2012 16:01:29
2530 Levels: 1.4.2012 00:00:52 1.4.2012 00:30:29 ... 9.5.2012 15:30:50

res.merge is the unordered data frame. Now I want to order the data frame
with:

res.ordered <- res.merge[order(as.POSIXct(as.character(res.merge$DT),
                                          format = "%d.%m.%Y %H:%M:%S")), ]

This does work; however, there are always two entries
that go to the end of the data frame for no obvious reason (see below,
09.05.2012 is the most recent date). And this is the case for different
data frames. The two entries at the end are always 25.3.2012 02:00:xx and
25.3.2012 02:30:xx.

Can anybody tell me what the problem is? Any help is most appreciated.

Best Benedikt

> res.ordered[2545:2549,]
                     DT Typ  NOD     Day_s DOW_s   Time_s     Long      Lat
2547  9.5.2012 14:30:56 GPS 1893  9.5.2012    We 14:30:00 7.452218 46.43579
2548  9.5.2012 15:02:09 GPS 1893  9.5.2012    We 15:00:35 7.451983 46.43583
2549  9.5.2012 15:30:50 GPS 1893  9.5.2012    We 15:30:00 7.451973 46.43597
1845 25.3.2012 02:00:18 GPS 1848 25.3.2012    So 02:00:01 7.454266 46.45414
1846 25.3.2012 02:30:16 GPS 1848 25.3.2012    So 02:30:00 7.454413 46.45437
     Height TOF Status FO_GPS GPS_N AOT     Day_e DOW_e   Time_e   BV Temp  SOG
2547 1182.8   3      A      1   143  55  9.5.2012    We 14:30:56 3735   31 0.09
2548 1182.8   3      A      1   143  94  9.5.2012    We 15:02:09 3637   32 0.02
2549 1176.5   3      A      1   143  50  9.5.2012    We 15:30:50 3730   29 0.17
1845 1295.2   3      A      1   151  17 25.3.2012    So 02:00:18 3715    7 0.18
1846 1287.3   3      A      1   144  16 25.3.2012    So 02:30:16 3720    8 0.14
     Heading  SAE  HAE BW_2 BW_3  X..
2547   24.90 3.81 9.47 3666 3625 9.08
2548    7.86 0.51 7.17 3593 3586 9.11
2549  344.72 2.86 4.10 3662 3623 9.12
1845  335.54 3.53 5.63 3618 3618 0.81
1846   75.37 5.44 8.96 3618 3618 0.81

--
View this message in context: 
http://r.789695.n4.nabble.com/order-a-data-frame-by-date-with-order-tp4630225.html

[R] Help needed for efficient way to loop through rows and columns

2012-05-16 Thread Priya Bhatt

Dear R-helpers:

I am trying to write a script that iterates through a dataframe that looks
like this:


Example dataset called sample:

names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AB", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "BB", "AB", NA)
AorB <- c("A", "A", "A", "B")

sample <- data.frame(names, X, Y, Z, AorB)


for a given row,

if AorB == "A", then AA = 2, AB = 1, BA = 1, BB = 0

if AorB == "B", then AA = 0, AB = 1, BA = 1, BB = 2

I've been trying to write this using apply and ifelse statements in the hope
that my code runs quickly, but I'm afraid I've made a big mess.  See below:

apply(sample, 1, function(i) {


  ifelse(sample$AorB[i] == "A",
 (ifelse(sample[i,] == "AA", sample[i,] <- 2 ,
 ifelse(sample[i,] == "AB" || sample[i,] == "BA" ,
sample[i,] <- 1,
ifelse(sample[i,] == "BB", sample[i,] <- 0,
sample[i,] <- NA )) )
  )   , ifelse(sample$AorB[i,] == "B"),
 (ifelse(sample[i,] == "AA", sample[i,] <- 0 ,
 ifelse(sample[i,] == "AB" || sample[i,] == "BA" ,
sample[i,] <- 1,
ifelse(sample[i,] == "BB", sample[i,] <- 2,
sample[i,] <- NA) })


Any Advice?
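
One way to avoid the nested ifelse calls might be a lookup table applied to
the whole genotype matrix at once. The sketch below is only a suggestion: it
uses the example data from this post, renames the data frame to sample_df
(a made-up name, to avoid masking base::sample), and relies on the observation
that the scores for AorB == "B" are 2 minus the scores for AorB == "A".

```r
sample_df <- data.frame(
  names = c("S1", "S2", "S3", "S4"),
  X     = c("BB", "AB", "AB", "AA"),
  Y     = c("BB", "BB", "AB", "AA"),
  Z     = c("BB", "BB", "AB", NA),
  AorB  = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

geno <- as.matrix(sample_df[, c("X", "Y", "Z")])

# Scores when the reference allele is "A"; a missing genotype stays NA
score_A <- c(AA = 2, AB = 1, BA = 1, BB = 0)
res <- matrix(unname(score_A[as.vector(geno)]),
              nrow = nrow(geno),
              dimnames = list(sample_df$names, colnames(geno)))

# For rows where the reference allele is "B" the score is mirrored
is_B <- sample_df$AorB == "B"
res[is_B, ] <- 2 - res[is_B, ]
res
```

Because the lookup is vectorised over the whole matrix, no explicit loop over
rows or columns is needed.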


[R] Wrong Q3 + Mean.

2012-05-16 Thread Retep32

Hi.

> a
 [1] 13 13 14 14 15 15 16 20 21 26
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   13.0    14.0    15.0    16.7    19.0    26.0 
> mean(a)
[1] 16.7
> quantile(a)
  0%  25%  50%  75% 100% 
  13   14   15   19   26 

Clearly, this is not right. My instructor and I have no idea why the program
does that. I removed the program from the computer, installed it again and
it still shows the mistake.  It is also strange that I chose English as the
installation language, but the program is in German (my OS is in German).

Please help, because otherwise I cannot solve any problems with R.

Using Win7 and R version 2.15.0 (2012-03-30).

Retep

--
View this message in context: 
http://r.789695.n4.nabble.com/Wrong-Q3-Mean-tp4630223.html

Re: [R] Wrong Q3 + Mean.

2012-05-16 Thread Joshua Wiley

On Wed, May 16, 2012 at 12:22 AM, Retep32 retepdel...@web.de wrote:
 Hi.

 a
  [1] 13 13 14 14 15 15 16 20 21 26
 summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    14.0    15.0    16.7    19.0    26.0
 mean(a)
 [1] 16.7
 quantile(a)
  0%  25%  50%  75% 100%
  13   14   15   19   26

 Clearly, this is not right. My Instructor and I have no idea why the program

Really?  It is not at all clear to me what makes this not right.
Have you tried looking at the documentation for quantile? (which you
can access by typing ?quantile or help(quantile) )  There are
multiple algorithms to calculate quantiles which in practice often
yield quite similar results but which, particularly for very small
datasets such as are common for class exercises, and in a few other
cases, behave rather differently.  You can compare the 9 varieties by
running this:

sapply(1:9, function(i) quantile(a, type = i))

which for me yields:

     [,1] [,2] [,3] [,4] [,5]  [,6] [,7]     [,8]    [,9]
0%     13   13   13 13.0   13 13.00   13 13.00000 13.0000
25%    14   14   13 13.5   14 13.75   14 13.91667 13.9375
50%    15   15   15 15.0   15 15.00   15 15.00000 15.0000
75%    20   20   20 18.0   20 20.25   19 20.08333 20.0625
100%   26   26   26 26.0   26 26.00   26 26.00000 26.0000

Perhaps one of those is what you are looking for (rows are quantiles,
each column uses a different algorithm, types 1 through 9,
respectively).
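
To make the difference concrete, here is a small sketch that reproduces the
third quartile by hand for two of the types. The position formulas are the
ones given in ?quantile for type 7 (R's default) and type 6; the helper name
interp_at is made up for this illustration.

```r
a <- c(13, 13, 14, 14, 15, 15, 16, 20, 21, 26)
x <- sort(a)
n <- length(x)
p <- 0.75

# Linear interpolation between the order statistics around position h
# ('interp_at' is a hypothetical helper, not part of R)
interp_at <- function(x, h) {
  lo <- floor(h)
  x[lo] + (h - lo) * (x[lo + 1] - x[lo])
}

q7 <- interp_at(x, (n - 1) * p + 1)  # type 7: position 7.75
q6 <- interp_at(x, (n + 1) * p)      # type 6: position 8.25

c(type7 = q7, type6 = q6)
c(quantile(a, p, type = 7), quantile(a, p, type = 6))
```

With only ten observations the two conventions land between different pairs
of data points (16 and 20 versus 20 and 21), which is why the answers differ
by more than one might expect.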

Hope this helps,

Josh

 does that. I removed the program from the computer , installed it again and
 it still shows the mistake.  It is also strange, that I chose english as
 installlanguage, but the program is in german (my OS is in german).

 Pls help, because otherwise i cannot solve any problems with R.

 Using Win7 and R version 2.15.0 (2012-03-30).

 Retep




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/


Re: [R] order a data frame by date with order

2012-05-16 Thread Jim Holtman

Is this a daylight saving time problem?  Check your timezone and see when the 
switch occurred; these times might not be legal.
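
A short sketch of why this can happen, using three of the timestamps from the
original post. In central Europe the clocks went forward on 25 March 2012, so
wall-clock times between 02:00 and 03:00 did not exist that night; on some
platforms as.POSIXct() returns NA for such times in a local zone, and order()
places NA values last. Parsing in a zone without daylight saving time, such
as GMT, sidesteps the issue. Whether GMT is the right zone for the data is an
assumption that should be checked against how the timestamps were recorded.

```r
stamps <- c("9.5.2012 15:30:50", "25.3.2012 02:30:16", "17.3.2012 13:54:02")

# GMT has no daylight saving time, so every wall-clock time is valid
parsed <- as.POSIXct(stamps, format = "%d.%m.%Y %H:%M:%S", tz = "GMT")

anyNA(parsed)          # no timestamp is rejected
stamps[order(parsed)]  # chronological order, the March entries sort in place
```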


[R] Change the order of variables in a linear model

2012-05-16 Thread Frank Paetzold

Hello,

the following lines

m <- matrix(c(1,1,9,1,2,6,1,3,7,2,1,4,2,2,5,2,3,1,3,1,2,3,2,-1,3,3,-2), 9,
3, byrow = TRUE, dimnames=list(NULL, cbind('A','B','Y')))
md <- as.data.frame(m)
md$A <- as.factor(md$A)
md$B <- as.factor(md$B)

mm <- model.matrix(Y~A+B+A:B, data=md)

produce 

> mm
  (Intercept) A2 A3 B2 B3 A2:B2 A3:B2 A2:B3 A3:B3
1           1  0  0  0  0     0     0     0     0
2           1  0  0  1  0     0     0     0     0
3           1  0  0  0  1     0     0     0     0
4           1  1  0  0  0     0     0     0     0
5           1  1  0  1  0     1     0     0     0
6           1  1  0  0  1     0     0     1     0
7           1  0  1  0  0     0     0     0     0
8           1  0  1  1  0     0     1     0     0
9           1  0  1  0  1     0     0     0     1
attr(,"assign")
[1] 0 1 1 2 2 3 3 3 3
attr(,"contrasts")
attr(,"contrasts")$A
[1] "contr.treatment"

attr(,"contrasts")$B
[1] "contr.treatment"


However, instead of the order
  (Intercept) A2 A3 B2 B3 A2:B2 A3:B2 A2:B3 A3:B3
I'd like to have the order
  (Intercept) A2 A3 B2 B3 A2:B2 A2:B3 A3:B2 A3:B3

that is, with the positions of A3:B2 and A2:B3 swapped, so that the order of
the A:B interaction variables is changed.
Is there a way to freely position variables in a model?

Thank you,

Frank
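
One simple possibility, sketched here without any claim that it is the
preferred approach, is to reorder the columns of the model matrix by name
after it has been built and to fit on the reordered matrix with lm.fit().
Note that plain column indexing drops the "assign" and "contrasts"
attributes, so this suits cases where only the matrix and the coefficient
order matter.

```r
m <- matrix(c(1,1,9, 1,2,6, 1,3,7, 2,1,4, 2,2,5, 2,3,1, 3,1,2, 3,2,-1, 3,3,-2),
            9, 3, byrow = TRUE, dimnames = list(NULL, c("A", "B", "Y")))
md <- as.data.frame(m)
md$A <- as.factor(md$A)
md$B <- as.factor(md$B)

mm <- model.matrix(Y ~ A + B + A:B, data = md)

# Desired column order, written out by name
wanted <- c("(Intercept)", "A2", "A3", "B2", "B3",
            "A2:B2", "A2:B3", "A3:B2", "A3:B3")
mm2 <- mm[, wanted]

# Fit directly on the reordered design matrix
fit <- lm.fit(mm2, md$Y)
names(fit$coefficients)
```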


--
View this message in context: 
http://r.789695.n4.nabble.com/Change-the-order-of-variables-in-a-linear-model-tp4630230.html

[R] correlation among variables in the same subset

2012-05-16 Thread Andrea Sica

Dear all,

I have created a subset of my dataset which contains 6 variables.
I need to compute the correlations among all of them, if possible without
doing it pair by pair. Is there a command that lets me
do it for all of them at the same time?

Thank you so much in advance.

Andrea
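
For reference, cor() accepts a whole data frame or matrix of numeric columns
and returns the full correlation matrix in one call. The sketch below uses
simulated data (hypothetical, since the subset was not posted); the object
name sub and the column names are made up.

```r
set.seed(7)  # hypothetical 6-variable subset, simulated for illustration
sub <- as.data.frame(matrix(rnorm(37 * 6), ncol = 6,
                            dimnames = list(NULL, paste0("v", 1:6))))

# All pairwise correlations at once; missing values are handled pair by pair
cm <- cor(sub, use = "pairwise.complete.obs")
round(cm, 2)
```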


Re: [R] Wrong Q3 + Mean.

2012-05-16 Thread Michael Dewey


At 08:22 16/05/2012, Retep32 wrote:

Hi.

> a
 [1] 13 13 14 14 15 15 16 20 21 26
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   13.0    14.0    15.0    16.7    19.0    26.0
> mean(a)
[1] 16.7
> quantile(a)
  0%  25%  50%  75% 100%
  13   14   15   19   26

Clearly, this is not right. My Instructor and I have no idea why the program
does that.


If you have no idea why R does something you could try reading the 
documentation which tells you in some detail (in this case) what R is doing.

?quantile


I removed the program from the computer, installed it again and
it still shows the mistake.  It is also strange that I chose English as the
installation language, but the program is in German (my OS is in German).


It used English during installation though, right? So it did what you asked.


Pls help, because otherwise i cannot solve any problems with R.

Using Win7 and R version 2.15.0 (2012-03-30).

Retep



Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html


[R] tm package: problem of TermDocumentMatrix and minWordLength

2012-05-16 Thread C.H.

Dear All,

The following code illustrates the problem.

[R code]
require(tm)
exampledoc <- c("R is good", "R is really good")
examplecorpus <- Corpus(VectorSource(exampledoc), encoding = "UTF-8")
dtm <- DocumentTermMatrix(examplecorpus, control = list(minWordLength = 1))
as.matrix(dtm)
[/R code]

The terms "R" and "is" were not included in the dtm even though the control
parameter minWordLength was set to 1.

    Terms
Docs good really
   1    1      0
   2    1      1

Can you reproduce this problem?

The following is my sessionInfo

 sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] tm_0.5-7.1

loaded via a namespace (and not attached):
[1] compiler_2.15.0 slam_0.1-23 tools_2.15.0

Regards,

CH


Re: [R] confidence intervals for nls or nls2 model

2012-05-16 Thread Gabor Grothendieck

On Tue, May 15, 2012 at 11:20 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Tue, May 15, 2012 at 8:08 PM, Francisco Mora Ardila
 fm...@oikos.unam.mx wrote:
 Hi all

 I have fitted a model using the nls function to these data:

 x
  [1]   1   0   0   4   3   5  12  10  12 100 100 100

 y
  [1]  1.281055090  1.563609934  0.001570796  2.291579783  0.841891853
  [6]  6.553951324 14.243274230 14.519899320 15.066473610 21.728809880
 [11] 18.553054450 23.722637370

 The model fitted is:

 modellogis <- nls(y ~ SSlogis(x, a, b, c))

 It runs OK. Then I calculate confidence intervals for the actual data using:

 dataci <- predict(as.lm(modellogis), interval = "confidence")

 But I don't get smooth curves when plotting it, so I want to get other
 confidence vectors based on a new x vector, by defining new data for the
 predictions:

 x0 <- seq(0, 15, 1)
 dataci <- predict(as.lm(modellogis), newdata = data.frame(x = x0),
                   interval = "confidence")

 But it does not work: I get the same initial confidence interval.

 Any ideas on how to get confidence and prediction intervals using new x
 data on a previous model?


 as.lm is a linear model between the response variable and the gradient
 of the nonlinear model and as we see below x is not part of that
 linear model so x can't be in newdata when predicting from the tangent
 model.  We can only make predictions at the original x points.   For
 other x's we could use Interpolation. See ?approx  (?spline can also
 work in smooth cases but in the example provided the function has a
 kink and that won't work well with splines.)

 as.lm(modellogis)$model
              y          a             b             c  (offset)
 1   1.281055090 0.06601796 -4.411829e-01  1.168928e+00  1.397153
 2   1.563609934 0.04798815 -3.268846e-01  9.766080e-01  1.015584
 3   0.001570796 0.04798815 -3.268846e-01  9.766080e-01  1.015584
 4   2.291579783 0.16311227 -9.767241e-01  1.597189e+00  3.451981
 5   0.841891853 0.12203013 -7.665928e-01  1.512752e+00  2.582551
 6   6.553951324 0.21464369 -1.206154e+00  1.564573e+00  4.542552
 7  14.243274230 0.74450055 -1.361047e+00 -1.455630e+00 15.756031
 8  14.519899320 0.59707858 -1.721353e+00 -6.770205e-01 12.636107
 9  15.066473610 0.74450055 -1.361047e+00 -1.455630e+00 15.756031
 10 21.728809880 1.00000000 -2.943955e-13 -9.073765e-12 21.163223
 11 18.553054450 1.00000000 -2.943955e-13 -9.073765e-12 21.163223
 12 23.722637370 1.00000000 -2.943955e-13 -9.073765e-12 21.163223
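
 The interpolation idea mentioned above can be sketched with approx(). The
 values below are the distinct x values from the post paired with the
 (offset) column printed above, which is assumed here to hold the fitted
 values at those points. The same call would be applied separately to the
 lower and upper limits returned by predict(..., interval = "confidence");
 those limits are not shown in the post, so they are not reproduced here.

```r
# Distinct x values from the post and the matching "(offset)" values
x_obs   <- c(0, 1, 3, 4, 5, 10, 12, 100)
fit_obs <- c(1.015584, 1.397153, 2.582551, 3.451981,
             4.542552, 12.636107, 15.756031, 21.163223)

x0 <- seq(0, 15, 1)  # the new grid from the post

# Piecewise-linear interpolation between the original x points
interp <- approx(x_obs, fit_obs, xout = x0)
data.frame(x = interp$x, y = interp$y)
```

 Linear interpolation only connects values that are known at the original
 points; it does not recompute the interval at the new x values.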


I have added a FAQ to the home page since this isn't the first time
this question has come up:

http://nls2.googlecode.com#FAQs

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Splus equivalent of reshape in R

2012-05-16 Thread Jim Lemon


On 05/16/2012 01:18 PM, Santosh wrote:

Hello R/Splus users..
I am posting in R discussion group in hope of wider response compared to
what I received from Splus user groups

Was wondering if there is any function available in Splus 8.2 that is
equivalent to reshape of R?

Below is a sample dataset. The size (both rows and columns) of the dataset may
vary...


Hi Santosh,
You may be able to use the code in the function rep_n_stack in the 
prettyR package in S-PLUS. It does what you want, and since it is 
written in R source code, it may run in S-PLUS. Just extract the code 
and source it into S-PLUS.


Jim


Re: [R] order a data frame by date with order

2012-05-16 Thread Benedikt Gehr

Hi,
many thanks for your answer. If I set tz="GMT" it does the job. Great!
Thanks

cheers

Benedikt


-- 
Benedikt Gehr
Ph.D. Student

Institute of Evolutionary Biology and Environmental Studies
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich

Office 13 J 36b
Phone: +41 (0)44 635 49 72
http://www.ieu.uzh.ch/staff/phd/gehr.html



--
View this message in context: 
http://r.789695.n4.nabble.com/order-a-data-frame-by-date-with-order-tp4630225p4630237.html

Re: [R] How to sum and group data by DATE in data frame

2012-05-16 Thread Cren


Michael Weylandt wrote
 
 Can you provide a reproducible example? 
 
Of course, Michael.

Consider the following time series:


11/2/2011 14:30 123.53
11/2/2011 15:00 123.78
11/2/2011 15:30 124.24
11/2/2011 16:00 124.2
11/2/2011 16:30 124.07
11/2/2011 17:00 123.91
11/2/2011 17:30 123.44
11/2/2011 18:00 123.0616
11/2/2011 18:30 123.06
11/2/2011 19:00 123.13
11/2/2011 19:30 123.745
11/2/2011 20:00 123.96
11/2/2011 20:30 123.99
11/2/2011 21:00 123.99
11/3/2011 14:30 124.3
11/3/2011 15:00 124.38
11/3/2011 15:30 124.67
11/3/2011 16:00 125.19
11/3/2011 16:30 124.9
11/3/2011 17:00 125.27
11/3/2011 17:30 125.5
11/3/2011 18:00 125.58
11/3/2011 18:30 125.91
11/3/2011 19:00 125.8
11/3/2011 19:30 125.83
11/3/2011 20:00 126.215
11/3/2011 20:30 126.25
11/3/2011 21:00 126.25
11/4/2011 14:30 124.901
11/4/2011 15:00 124.43
11/4/2011 15:30 124.4654
11/4/2011 16:00 124.46
11/4/2011 16:30 124.68
11/4/2011 17:00 124.86
11/4/2011 17:30 124.73
11/4/2011 18:00 125.22
11/4/2011 18:30 125.48
11/4/2011 19:00 125.5601
11/4/2011 19:30 125.4091
11/4/2011 20:00 125.15
11/4/2011 20:30 125.43
11/4/2011 21:00 125.481
11/7/2011 15:30 125.91
11/7/2011 16:00 125.29
11/7/2011 16:30 124.79
11/7/2011 17:00 124.77
11/7/2011 17:30 124.7
11/7/2011 18:00 124.37
11/7/2011 18:30 124.56
11/7/2011 19:00 124.86
11/7/2011 19:30 125.3
11/7/2011 20:00 125.59
11/7/2011 20:30 125.95
11/7/2011 21:00 125.73
11/7/2011 21:30 126.27
11/7/2011 22:00 126.26
11/8/2011 15:30 127.33
11/8/2011 16:00 126.37
11/8/2011 16:30 126.46
11/8/2011 17:00 126
11/8/2011 17:30 126.06
11/8/2011 18:00 126.2662
11/8/2011 18:30 126.23
11/8/2011 19:00 126.4499
11/8/2011 19:30 127.12
11/8/2011 20:00 127.48
11/8/2011 20:30 127.49
11/8/2011 21:00 127.69
11/8/2011 21:30 127.88
11/8/2011 22:00 127.88
11/9/2011 15:30 124.51
11/9/2011 16:00 124.42
11/9/2011 16:30 124.92
11/9/2011 17:00 125.18
11/9/2011 17:30 125.23
11/9/2011 18:00 124.81
11/9/2011 18:30 125.07
11/9/2011 19:00 124.61
11/9/2011 19:30 123.8869
11/9/2011 20:00 123.24
11/9/2011 20:30 123.3329
11/9/2011 21:00 123.6
11/9/2011 21:30 123.19
11/9/2011 22:00 123.161

The row names are date plus time; the data column is the time series' value.

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-sum-and-group-data-by-DATE-in-data-frame-tp903708p4630228.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Re : Wrong Q3 + Mean.

2012-05-16 Thread Pascal Oettli

Hi,

Probably you could check this:
?quantile

Particularly the 'type' option.
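
To make the 'type' hint concrete, here is a small sketch using the vector from the original post. quantile() offers nine definitions; the default is type 7. Which definition your course uses is an assumption on my part, so types 6 and 2 are shown only as common textbook alternatives.

```r
a <- c(13, 13, 14, 14, 15, 15, 16, 20, 21, 26)

# The mean does not depend on any quantile definition: 167 / 10
mean_a <- mean(a)

# Third quartile under three of the nine definitions described in ?quantile
q3_type7 <- unname(quantile(a, 0.75, type = 7))  # R's default, also used by summary()
q3_type6 <- unname(quantile(a, 0.75, type = 6))
q3_type2 <- unname(quantile(a, 0.75, type = 2))

print(c(mean = mean_a, type7 = q3_type7, type6 = q3_type6, type2 = q3_type2))
```

If a hand calculation gives a different third quartile than summary(), the difference is most likely the definition, not a faulty installation.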

Best Regards,
Pascal


- Original Message -
From: Retep32 retepdel...@web.de
To: r-help@r-project.org
Cc: 
Sent: Wednesday, 16 May 2012, 16:22
Subject: [R] Wrong Q3 + Mean.

Hi.

> a
[1] 13 13 14 14 15 15 16 20 21 26
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   13.0    14.0    15.0    16.7    19.0    26.0 
> mean(a)
[1] 16.7
> quantile(a)
  0%  25%  50%  75% 100% 
  13   14   15   19   26 

Clearly, this is not right. My instructor and I have no idea why the program
does that. I removed the program from the computer, installed it again, and
it still shows the mistake. It is also strange that I chose English as the
installation language, but the program is in German (my OS is in German).

Please help, because otherwise I cannot solve any problems with R.

I am using Win7 and R version 2.15.0 (2012-03-30).

Retep

--
View this message in context: 
http://r.789695.n4.nabble.com/Wrong-Q3-Mean-tp4630223.html
Sent from the R help mailing list archive at Nabble.com.


[R] finding mean and SD for a log-normal distribution

2012-05-16 Thread Andras Farkas

Dear R Experts,
 
Allow me to ask a quick question: I have a mean value of 6 and an SD of 3 
describing my distribution. I would like to convert this distribution into a 
log-normal distribution that would best describe it when resimulated using the 
log-normal distribution. Currently I am using other software to estimate the 
respective mean and SD on the log scale, and the results are 1.6667 and 
0.47071. Then, to best reproduce my original distribution in R, I use the 
following commands:
 
c <- rlnorm(5000,1.6667,0.47071)
d <- exp(c)
mean(c)
sd(c)
 
and the results for mean and SD are 5.92 and 2.94 (original 6 and 3), 
respectively, which I am reasonably happy with. I would like to become 
independent of the other software I use, but I am unable to figure out how to 
generate the values of 1.6667 and 0.47071 using R. Could someone please help me 
with this question?
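
One possible route, sketched under the assumption that plain moment matching is acceptable: for a log-normal variable with log-scale parameters meanlog and sdlog, the mean is exp(meanlog + sdlog^2/2) and the variance is (exp(sdlog^2) - 1) times the squared mean. Solving for the two parameters gives the code below. The resulting values are close to, but not identical with, 1.6667 and 0.47071, so the other software may be using a different estimator.

```r
m <- 6  # target mean on the original scale
s <- 3  # target SD on the original scale

# Moment matching for the log-normal distribution
sdlog   <- sqrt(log(1 + (s / m)^2))
meanlog <- log(m) - sdlog^2 / 2

# Check: theoretical mean and SD implied by these parameters
implied_mean <- exp(meanlog + sdlog^2 / 2)
implied_sd   <- implied_mean * sqrt(exp(sdlog^2) - 1)

# Simulated draws are already on the original scale; no exp() is needed
set.seed(1)
draws <- rlnorm(5000, meanlog, sdlog)
print(c(meanlog = meanlog, sdlog = sdlog, sim_mean = mean(draws), sim_sd = sd(draws)))
```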
 
thanks,
 
Andras

Re: [R] write data using xlsReadWrite

2012-05-16 Thread diyanah

Hi, I have changed it to the following, but I get an error that I couldn't fix. Do you have
any idea why?

file <- system.file("D:\\FYP\\image\\Cropped Images\\user61",
"forgerUser61.xlsx", package = "xlsx")
wb <- loadWorkbook("forgerUser61.xlsx")
sheets <- getSheets(wb)
sheet <- sheets[["all"]]
res <- readRows(sheet, startRow=4, endRow=5, startColumn=2, endColumn=3)

Error in readRows(sheet, startRow = 4, endRow = 5, startColumn = 2,
endColumn = 3) : 
attempt to apply non-function
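
I can't tell from the post alone what triggers this error, but two things in the code are worth checking, and both can be verified in base R without the xlsx package. The sketch below uses a plain list as a stand-in for the result of getSheets(wb); that stand-in and the forward-slash path are my assumptions.

```r
# system.file() only searches inside an installed package's directory;
# for anything else it returns an empty string.
p <- system.file("no", "such", "file.xlsx", package = "stats")
print(p == "")

# file.path() simply joins path components (names taken from the post)
xlsx_path <- file.path("D:/FYP/image/Cropped Images/user61", "forgerUser61.xlsx")
print(xlsx_path)

# Indexing a named list by a name it does not contain yields NULL, so a
# sheet looked up by a wrong name would be NULL before readRows() sees it.
sheets <- list(Sheet1 = "placeholder")  # hypothetical stand-in for getSheets(wb)
sheet <- sheets[["all"]]
print(is.null(sheet))
```

So it may help to pass the full path to loadWorkbook() and to print names(sheets) before picking a sheet.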

--
View this message in context: 
http://r.789695.n4.nabble.com/write-data-using-xlsReadWrite-tp4629825p4630231.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] problem with get() inside of lme()

2012-05-16 Thread ONKELINX, Thierry

You can achieve that with a combination of as.formula and paste.

library(nlme)
data(petrol, package = "MASS")
lme(as.formula(paste(Y.VAR, "~ EP")), random = ~1|No, data = petrol)
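
A slightly fuller sketch of the same idea, assuming the response column name arrives in your function as a character string (the name y_var is mine). reformulate() is a base R alternative to paste() plus as.formula(). The model is only fitted if nlme and MASS are installed.

```r
y_var <- "Y"  # hypothetical: column name passed into the function as a string

# Two equivalent ways to build the fixed-effects formula from a string
f1 <- as.formula(paste(y_var, "~ EP"))
f2 <- reformulate("EP", response = y_var)

# Fit only when the recommended packages nlme and MASS are available
if (requireNamespace("nlme", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  data(petrol, package = "MASS")
  fit <- nlme::lme(f1, random = ~ 1 | No, data = petrol)
  print(nlme::fixef(fit))
}
```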

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey


-Original message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of 
chuck.01
Sent: Saturday, 12 May 2012 23:28
To: r-help@r-project.org
Subject: Re: [R] problem with get() inside of lme()

Here is an example:
library(nlme)
library(lme4)
library(MASS)

data(petrol)

# a variable for one of the columns in petrol
Y.VAR <- "Y"

# This works:
lmer(get(Y.VAR)~EP +(1|No), data=petrol)

# This doesn't:
lme(get(Y.VAR)~EP, random= ~1|No, data=petrol)

# but this does:
lme(Y~EP, random= ~1|No, data=petrol)

I'd really like to use the variable... again, this is inside a function.

Any idea how to solve this.
Thanks for your time and expertise,
Chuck




chuck.01 wrote

 please note that I edited the original message to say:

 length(with(new3, perm.score))==length(with(new3, get(TRAIT1)))
 [1] TRUE





 chuck.01 wrote

 Hi,
  The following lines of code are inside of a function, where  TRAIT1 is
 a function variable calling a column-name inside of the data.frame
 new3.

 This works just fine:

 m2 <- lmer(get(TRAIT1) ~ perm.score + (1|site), data=new3)

 but this will not work:

 m3 <- lme(get(TRAIT1) ~ perm.score , random= ~1|site, data=new3)

 I get the following error:

 Error in model.frame.default(formula = ~TRAIT1 + perm.score + site, data
 = list( :
   variable lengths differ (found for 'perm.score')

 it seems to be putting TRAIT1 on the left side of the equation, and if I
 am wrong about that, the different lengths from the error is still not
 true:

 length(with(new3, perm.score))==length(with(new3, get(TRAIT1)))
 [1] TRUE

 Any ideas on either what is going on, or how I can fix this?

 ** I'm not including example data, or function because I am hoping it is
 not needed **
 Please let me know if I am wrong.




--
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-get-inside-of-lme-tp4629360p4629588.html
Sent from the R help mailing list archive at Nabble.com.


[R] ANCOVA power

2012-05-16 Thread Xan

Dear list members:

I am trying to calculate power for an ANCOVA analysis.

I have found different solutions such as power.t.test and power.anova.test
but they seem to refer to the ANOVA part of the ANCOVA. 

My model is of the form:
lm(y ~ myfactor + x1 + x2 + x2*myfactor)

where myfactor is a categorical variable (a factor). I am interested in calculating
the power of the significance test, mainly for the interaction term between
x2 and the factor. 
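
One generic way to approach this, offered only as a sketch, is simulation: generate data under an assumed effect size, fit the model with and without the interaction, and count how often the F-test rejects. All numbers below (group size, slopes, residual SD, two factor levels) are assumptions for illustration, not values from the post.

```r
# Simulation-based power for the x2:myfactor interaction in an ANCOVA model.
sim_power <- function(n_per_group = 40, slope_diff = 1, sigma = 1,
                      nsim = 200, alpha = 0.05) {
  n <- 2 * n_per_group
  p_values <- replicate(nsim, {
    myfactor <- factor(rep(c("a", "b"), each = n_per_group))
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    # the slope of x2 differs by slope_diff in group "b"
    y <- 0.5 * x1 + 0.5 * x2 + slope_diff * x2 * (myfactor == "b") +
      rnorm(n, sd = sigma)
    reduced <- lm(y ~ myfactor + x1 + x2)
    full    <- lm(y ~ myfactor + x1 + x2 + x2:myfactor)
    anova(reduced, full)[2, "Pr(>F)"]  # F-test for the interaction term
  })
  mean(p_values < alpha)  # share of simulations that reject
}

set.seed(123)
power_est <- sim_power()
print(power_est)
```

Rerunning sim_power() over a grid of n_per_group values gives a rough power curve for planning.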

I would appreciate your help.

Xan.

--
View this message in context: 
http://r.789695.n4.nabble.com/ANCOVA-power-tp4630238.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Problem to resolve a step for reading a large TXT and, split in several file

2012-05-16 Thread Rui Barradas



Hello,

Your bug is obvious: on each pass through the loop you read twice and write 
only once. The file pointer keeps moving forward...

Use something like

while (length(pv <- readLines(con, n = n)) > 0) {  # note that this line changed.

i <- i + 1
write.table(pv, file = paste(fileNames.temp.1, "_", i, ".txt", sep = ""), sep = "\t")

}

(or put the line with read.table where you have readLines.)
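
Here is a self-contained sketch of that loop, written so it can be tried on a small temporary file first. The function name split_file and the use of writeLines() instead of write.table() are my choices; writeLines() copies each line unchanged, which is usually what is wanted when only splitting.

```r
# Read a text file n lines at a time and write each chunk to its own file.
split_file <- function(path, n, out_prefix) {
  con <- file(path, open = "r")
  on.exit(close(con))
  i <- 0L
  out_files <- character(0)
  # one read per pass; the loop ends when readLines() returns nothing
  while (length(chunk <- readLines(con, n = n)) > 0) {
    i <- i + 1L
    out_file <- paste0(out_prefix, "_", i, ".txt")
    writeLines(chunk, out_file)
    out_files <- c(out_files, out_file)
  }
  out_files
}

# Small demonstration: 25 tab-separated rows split into chunks of 10
src <- tempfile(fileext = ".txt")
writeLines(sprintf("row\t%d", 1:25), src)
parts <- split_file(src, n = 10, out_prefix = tempfile("part"))
print(vapply(parts, function(f) length(readLines(f)), integer(1)))
```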

Anyway, I don't like it very much. If you know the number of lines in 
the input file, it would be much better to use integer division and 
modulus to determine how many times and how much to read.

Something like

n <- 100

passes <- number.of.lines.in.file %/% n
remaining <- number.of.lines.in.file %% n

for(i in seq_len(passes)){

[ ... read n lines at a time & process them...]

}
if(remaining){
n <- remaining

[ ...read what's left... ]
}
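
As a quick check of the arithmetic, using the row count from the original post and the two chunk sizes mentioned there (the variable names are mine):

```r
total_lines <- 1408452  # row count given in the original post

# chunks of 1,000,000 rows: one full pass plus a remainder
passes_a    <- total_lines %/% 1000000
remaining_a <- total_lines %% 1000000

# chunks of 469484 rows: the file divides evenly into three
passes_b    <- total_lines %/% 469484
remaining_b <- total_lines %% 469484

print(c(passes_a, remaining_a, passes_b, remaining_b))
```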


If you do not know how many lines there are in the file, see 
(package::function)


parser::nlines
R.utils::countLines

Hope this helps,

Rui Barradas


Em 16-05-2012 11:00, r-help-requ...@r-project.org escreveu:

Date: Tue, 15 May 2012 22:16:42 +0200
From: gianni lavaredogianni.lavar...@gmail.com
To:r-help@r-project.org
Subject: [R] Problem to resolve a step for reading a large TXT and
split in several file
Message-ID:
caj6jbr-ywgjsfu8o0unvet6m8p8wvp7ybosxw5nrdz48wod...@mail.gmail.com
Content-Type: text/plain

Dear Researchers,

This is the first time I am trying to solve this problem. I have a TXT file
with 1408452 rows. I wish to split it into several files, each with
1,000,000 rows, using the following procedure:

# split into two files, one with 1,000,000 rows and one with 408,452 rows

file <- "09G001_72975_7575_25_4025.txt"
fileNames <- strsplit(as.character(file), ".", fixed = TRUE)
fileNames.temp.1 <- unique(as.vector(do.call(rbind, fileNames)[, 1]))

con <- file(file, open = "r")
# n is the number of rows
n <- 100
i <- 0
while (length(readLines(con, n=n)) > 0 ) {
 i <- i + 1
 pv <- read.table(con,header=F,sep="\t", nrow=n)
 write.table(pv, file = paste(fileNames.temp.1,"_",i,".txt",sep = ""),
sep = "\t")
}
close(con)


When I use 1,000,000 I get only
09G001_72975_7575_25_4025_1.txt (with 100 of rows) in the directory, and not
09G001_72975_7575_25_4025_2.txt (with 408,452). I don't understand where
my bug is.

Furthermore, when I try, for example, to split into 3 files (where n is 469484 =
1408452/3), I get this message:

*Error in read.table(con, header = F, sep = "\t", nrow = n) :
   no lines available in input*

Thanks for all your help, and sorry for the disturbance.


Re: [R] Automating R for Hypothesis Testing

2012-05-16 Thread meredith

Rui-
  Just a quick question. I understand your comment on using ANOVA, but
doesn't this only test for similarity of the means? We are trying to see
whether the same model can fit two or three months, i.e. whether they have a
similar slope and intercept. The ANOVA would only do one part of this, correct,
with the F-test?

Thanks!
Meredith

Rui Barradas wrote
 
 Hello,
 
 I'm glad it helped.
 
 As for your second question, I don't know, but I'm not very comfortable
 with the way you're doing things.
 Why subtract the coefficients of model 1 from model 2?
 And why the dummy? Why set model 1 to zero?
 
 Isn't it better to use anova's F? After all, it's designed for it, for the
 linear model...
 And if you really want/need the dummy, wouldn't a nested anova do it? (F
 statistic, once again.)
 
 anova(model1, model2)
 
 is simple and, statistically speaking, seems to me much better. (I especially
 don't like the subtraction bit.)
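
To illustrate what anova(model1, model2) does with nested linear models, here is a sketch on simulated data (the numbers and the two-month setup are invented for the example, not taken from the Milwaukee file). The restricted model uses one line for both months; the unrestricted model allows a separate intercept and slope per month, so the F-test compares both at once rather than only the means.

```r
set.seed(42)
n <- 30
month <- factor(rep(c("feb", "mar"), each = n))
lflow <- rnorm(2 * n, mean = 7)
# simulated so that both months share the same intercept and slope
lload <- -2.4 + 1.2 * lflow + rnorm(2 * n, sd = 0.5)

restricted   <- lm(lload ~ lflow)          # one common line
unrestricted <- lm(lload ~ lflow * month)  # month-specific intercept and slope

# Joint F-test of the two extra parameters (intercept shift and slope shift)
cmp <- anova(restricted, unrestricted)
print(cmp)
```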
 
 Rui Barradas
 
 meredith wrote
 
 Rui-
   Thanks this definitely helps, just one quick question. How would you
 code the values of chi-fm and chi-fms to change based on the degrees of
 freedom of each model H(i)?
 
 Meredith
 
 
 Rui Barradas wrote
 
 Hello,
 
 Yes, it does help. Now we can see your data and what you're doing.
 What follows is a suggestion on what you could do, not full solution.
 (You forgot to say what X1 is, but I don't think it's important to
 understand the suggestion.)
 (If I'm wrong, say something.)
 
 
 milwaukeephos <- read.csv("milwaukeephos.csv", header=TRUE,
 stringsAsFactors=FALSE)
 # list of data.frames, one per month
 ls1 <- split(milwaukeephos, milwaukeephos$month)
 
 #- if you want to keep the models, not needed if you don't.
 #  (you probably don't)
 modelH <- vector("list", 12)
 modelHa <- vector("list", 12)
 modelH2 <- vector("list", 12)
 modelH2a <- vector("list", 12)
 #- values to record, these are needed, create them beforehand.
 chi_fm <- numeric(12)
 chi_fms <- numeric(12)
 #
 seq_months <- c(1:12, 1) # wrap months around.
 for(i in 1:12){
 month_this <- seq_months[i]
 month_next <- seq_months[i + 1]
 
 lload <- c(ls1[[month_this]]$load_kg, ls1[[month_next]]$load_kg)
 lflow <- c(ls1[[month_this]]$flow, ls1[[month_next]]$flow)
 modelH[[i]] <- lm(lload ~ lflow)
 # If you don't want to keep the models, use modelH only
 # ( without [[i]] )
 # and do the same with X1
 
 # rest of your code for first test goes here
 chi_fm[i] <- bfm %*% var_fm %*% (bunres_fm - bres_fm)
 
 # and the same for the second test
 chi_fms[i] <- ...etc...
 }
 
 
 Hope this helps,
 
 Rui Barradas
 
 
 meredith wrote
 
 dput:  http://r.789695.n4.nabble.com/file/n4620188/milwaukeephos.csv
 milwaukeephos.csv 
 
 # Feb-march
 modelH_febmarch <- lm(llfeb_march~lffeb_march)
 modelHa_febmarch <- lm(llfeb_march~X1feb_mar+lffeb_march)
 anova(modelHa_febmarch)
 coefficients(modelH_febmarch)
 (Intercept) lffeb_march 
   -2.429890    1.172821 
 coefficients(modelHa_febmarch)
 (Intercept)   X1feb_mar lffeb_march 
  -2.8957776  -0.5272793   1.3016303 
 bres_fm <- matrix(c(-2.429890,0,1.172821),nrow=3)
 bunres_fm <- matrix(c(-2.8957776,-0.5272793,1.3016303),nrow=3)
 bfm <- t(bunres_fm-bres_fm)
 fmvect <- seq(1,1,length=34)
 X1a_febmar <- seq(0,0,length=9) # dummy variable step 1
 X1b_febmar <- seq(1,1,length=25) # dummy variable step 2
 X1feb_mar <- c(X1a_febmar,X1b_febmar) #dummy variable creation
 # Test Stat Equation for Chisq
 fmxx <- cbind(fmvect,X1feb_mar,lffeb_march)
 tfmx <- t(fmxx)
 xcom_fm <- (tfmx %*% fmxx)
 xinv_fm <- ginv(xcom_fm)
 var_fm <- xinv_fm*0.307
 chi_fm <- bfm %*% var_fm %*% (bunres_fm-bres_fm)
 chi_fm # chisq value for recording
 # if less than CV move on to slope modification
 modelH2_febmarch <- lm(llfeb_march~X3feb_march)
 modelH2a_febmarch <- lm(llfeb_march~X3feb_march+X4feb_march)
 anova(modelH2a_febmarch)
 coefficients(modelH2_febmarch) # get coefficients to make beta vectors for test
 (Intercept) X3feb_march 
    5.342130    1.172821 
 coefficients(modelH2a_febmarch)
 (Intercept) X3feb_march X4feb_march 
   5.2936263   1.0353752   0.2407557 
 # Test Stat
 bsres_fm <- matrix(c(5.342130,1.172821,0),nrow=3)
 bsunres_fm <- matrix(c(5.2936263,1.0353752,0.2407557),nrow=3)
 bsfm <- t(bsunres_fm-bsres_fm)
 #X matrix
 fmxs <- cbind(fmvect,X3feb_march,X4feb_march)
 tfmxs <- t(fmxs)
 xcoms_fm <- (tfmxs %*% fmxs)
 xinvs_fm <- ginv(xcoms_fm)
 var_fms <- xinvs_fm*0.341
 chi_fms <- bsfm %*% var_fms %*% (bsunres_fm-bsres_fm)
 chi_fms
 # Record Chisq value
 
 Does this help?
 Here lffeb_march is the combination of Feb and March log flows
 and llfeb_march is the combination of Feb and March log loads
 X3: lffeb_march-mean(feb_march)
 X4: X1*X3
 
 Thanks
 
 Rui Barradas wrote
 
 Hello,
 
 I'm not at all sure if I understand your problem. Does this describe
 it?
 
 
 test first model for months 1 and 2
 if test statistic less than critical value{
   test second model for months 1 and 2
   print results of the first and second tests? just one of them?
 }
 move on to months 2 and 3
 etc, until months 12 and 1

Re: [R] Help needed for efficient way to loop through rows and columns

2012-05-16 Thread David L Carlson

Can you show us what you want the final data.frame to look like? You've
created five variables stored as factors and you seem to be trying to change
those to numeric values? Is that correct? 

Since AB and BA are always set to 1, you could just replace those values
globally rather than mess with the ifelse commands for those values. Only AA
and BB are affected by the value of AorB.

Your apply() function processes the data.frame by row so i is a vector
consisting of all the values in the row. You seem to be coding as if i was a
single integer (as in a for loop).
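
As a sketch of what a vectorized version could look like (assuming the goal is a numeric matrix of scores, which the original post does not state outright): score every genotype as if the row were "A", then mirror the rows where AorB is "B", since the two coding schemes are 2 minus each other. The object names sample_df, geno and coded are mine.

```r
sample_df <- data.frame(
  names = c("S1", "S2", "S3", "S4"),
  X     = c("BB", "AB", "AB", "AA"),
  Y     = c("BB", "BB", "AB", "AA"),
  Z     = c("BB", "BB", "AB", NA),
  AorB  = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

geno <- as.matrix(sample_df[, c("X", "Y", "Z")])

# Scores when AorB == "A": AA = 2, AB = BA = 1, BB = 0 (NA stays NA)
score_a <- c(AA = 2, AB = 1, BA = 1, BB = 0)
coded <- matrix(unname(score_a[as.vector(geno)]),
                nrow = nrow(geno), dimnames = dimnames(geno))

# For "B" rows the scale is reversed: AA = 0, AB = BA = 1, BB = 2
is_b <- sample_df$AorB == "B"
coded[is_b, ] <- 2 - coded[is_b, ]

print(coded)
```

This avoids apply() and nested ifelse() entirely; the lookup vector does the recoding for the whole matrix in one step.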

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Priya Bhatt
 Sent: Wednesday, May 16, 2012 3:08 AM
 To: r-help@r-project.org
 Subject: [R] Help needed for efficient way to loop through rows and
 columns
 
 Dear R-helpers:
 
 I am trying to write a script that iterates through a dataframe that
 looks
 like this:
 
 
 Example dataset called sample:
 
 names <- c("S1", "S2", "S3", "S4")
 X <- c("BB", "AB", "AB", "AA")
 Y <- c("BB", "BB", "AB", "AA")
 Z <- c("BB", "BB", "AB", NA)
 AorB <- c("A", "A", "A", "B")
 
 sample <- data.frame(names, X, Y, Z, AorB)
 
 
 for a given row,
 
 if AorB == "A", then AA = 2, AB = 1, BA = 1, BB = 0
 
 if AorB == "B", then AA = 0, AB = 1, BA = 1, BB = 2
 
 I've been trying to write this using apply and ifelse statements in the
 hope that my code runs quickly, but I'm afraid I've made a big mess.  See
 below:
 
 apply(sample, 1, function(i) {
 
 
   ifelse(sample$AorB[i] == "A",
  (ifelse(sample[i,] == "AA", sample[i,] <- 2 ,
  ifelse(sample[i,] == "AB" || sample[i,] == "BA" ,
 sample[i,] <- 1,
 ifelse(sample[i,] == "BB", sample[i,] <- 0,
 sample[i,] <- NA )) )
   )   , ifelse(sample$AorB[i,] == "B"),
  (ifelse(sample[i,] == "AA", sample[i,] <- 0 ,
  ifelse(sample[i,] == "AB" || sample[i,] == "BA" ,
 sample[i,] <- 1,
 ifelse(sample[i,] == "BB", sample[i,] <- 2,
 sample[i,] <- NA) })
 
 
 Any Advice?
 

Re: [R] correlation among variables in the same subset

2012-05-16 Thread R. Michael Weylandt

? cor

e.g.,

x <- data.frame(rnorm(5), rnorm(5), rnorm(5), rnorm(5), rnorm(5))

cor(x)

Best,
Michael

On Wed, May 16, 2012 at 6:52 AM, Andrea Sica aerdna.s...@gmail.com wrote:
 Dear all,

 I have created a subset from my dataset, which contains 6 variables.
 I need to compute the correlations among all of them, if possible without
 doing it one pair at a time. Is there any command that lets me
 do it directly for all of them at the same time?

 Thank you so much in advance.

 Andrea


Re: [R] tm package: problem of TermDocumentMatrix and minWordLength

2012-05-16 Thread Baoqiang Cao

Try this:

dtm <- DocumentTermMatrix(examplecorpus, control = list(wordLengths=c(1,100)))



On Wed, May 16, 2012 at 6:22 AM, C.H. chainsawti...@gmail.com wrote:
 Dear All,

 The following code illustrate the problem.

 [R code]
 require(tm)
 exampledoc <- c("R is good", "R is really good")
 examplecorpus <- Corpus(VectorSource(exampledoc), encoding = "UTF-8")
 dtm <- DocumentTermMatrix(examplecorpus, control = list(minWordLength = 1))
 as.matrix(dtm)
 [/R code]

 The terms "R" and "is" were not included in the dtm even though the control
 parameter minWordLength was set to 1.

    Terms
 Docs good really
   1    1      0
   2    1      1

 Can you reproduce this problem?

 The following is my sessionInfo

 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: i686-pc-linux-gnu (32-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] tm_0.5-7.1

 loaded via a namespace (and not attached):
 [1] compiler_2.15.0 slam_0.1-23     tools_2.15.0

 Regards,

 CH


Re: [R] How to sum and group data by DATE in data frame

2012-05-16 Thread R. Michael Weylandt

Fascinating... dput() has never given me anything that looks like
that. I would have expected something much more like

z <- structure(c(123.53, 123.78, 124.24, 124.2, 124.07, 123.91, 123.44,
123.0616, 123.06, 123.13, 123.745, 123.96, 123.99, 123.99, 124.3,
124.38, 124.67, 125.19, 124.9, 125.27, 125.5, 125.58, 125.91,
125.8, 125.83, 126.215, 126.25, 126.25, 124.901, 124.43, 124.4654,
124.46, 124.68, 124.86, 124.73, 125.22, 125.48, 125.5601, 125.4091,
125.15, 125.43, 125.481, 125.91, 125.29, 124.79, 124.77, 124.7,
124.37, 124.56, 124.86, 125.3, 125.59, 125.95, 125.73, 126.27,
126.26, 127.33, 126.37, 126.46, 126, 126.06, 126.2662, 126.23,
126.4499, 127.12, 127.48, 127.49, 127.69, 127.88, 127.88, 124.51,
124.42, 124.92, 125.18, 125.23, 124.81, 125.07, 124.61, 123.8869,
123.24, 123.3329, 123.6, 123.19, 123.161), index = structure(c(1320258600,
1320260400, 1320262200, 1320264000, 1320265800, 1320267600, 1320269400,
1320271200, 1320273000, 1320274800, 1320276600, 1320278400, 1320280200,
1320282000, 1320345000, 1320346800, 1320348600, 1320350400, 1320352200,
1320354000, 1320355800, 1320357600, 1320359400, 1320361200, 1320363000,
1320364800, 1320366600, 1320368400, 1320431400, 1320433200, 1320435000,
1320436800, 1320438600, 1320440400, 1320442200, 1320444000, 1320445800,
1320447600, 1320449400, 1320451200, 1320453000, 1320454800, 1320697800,
1320699600, 1320701400, 1320703200, 1320705000, 1320706800, 1320708600,
1320710400, 1320712200, 1320714000, 1320715800, 1320717600, 1320719400,
1320721200, 1320784200, 1320786000, 1320787800, 1320789600, 1320791400,
1320793200, 1320795000, 1320796800, 1320798600, 1320800400, 1320802200,
1320804000, 1320805800, 1320807600, 1320870600, 1320872400, 1320874200,
1320876000, 1320877800, 1320879600, 1320881400, 1320883200, 1320885000,
1320886800, 1320888600, 1320890400, 1320892200, 1320894000), class =
c("POSIXct",
"POSIXt"), tzone = ""), class = "zoo")

which is about 100x more convenient.

With that,

aggregate(z, as.Date(time(z)), sum)

and

aggregate(z, format(time(z), "%m %d"), sum)

give different results (at least in my time zone) so try the latter
(it seems to be what you were probably looking for)
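
To show why the two groupings can disagree, here is a base R sketch using three of the epoch values from the structure above. The time zone "America/New_York" is my assumption (it makes the first value print as 14:30, matching the posted data); the point is only that the calendar day depends on the zone used for the conversion.

```r
# Three timestamps taken from the index above, with their series values
stamps <- as.POSIXct(c(1320258600, 1320282000, 1320345000),
                     origin = "1970-01-01", tz = "America/New_York")
vals <- c(123.53, 123.99, 124.3)

# Calendar day in UTC versus in the (assumed) local time zone
day_utc   <- as.character(as.Date(stamps, tz = "UTC"))
day_local <- format(stamps, "%Y-%m-%d")

# The 21:00 local observation falls on the next day in UTC,
# so the daily sums differ between the two groupings
sum_utc   <- tapply(vals, day_utc, sum)
sum_local <- tapply(vals, day_local, sum)

print(sum_utc)
print(sum_local)
```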

If that doesn't nail it down, I'll need you to answer the questions I
asked in my previous email.

Best,
Michael


Re: [R] Reading Excel Formulas as values

2012-05-16 Thread David L Carlson

I can't replicate your problem. I created a spreadsheet in Excel 2007
consisting of three columns.  Numbers from 1 - 15, rand(), and the sum of
the first two columns. Using all the defaults with read.xlsx() (package:
xlsx), I get the values of each column and using keepFormulas=TRUE, I get
the formulas as factors. I don't get any NA's. I can also place a formula on
the second sheet that accesses data from the first sheet without any
problems. I haven't tried Excel 2010. 

Could your formulas be accessing data from another spreadsheet?

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Mike Smith
 Sent: Tuesday, May 15, 2012 3:11 PM
 To: r-help@r-project.org
 Subject: [R] Reading Excel Formulas as values
 
 When I read excel files using the read.xlsx() command any cells that
 have
 formulas in them come up as NA.
 
 Is there a way to read just the numeric value of the cell without using
 the
 paste value command in Excel?  I need to read in hundreds of Excel
 spreadsheets and compile them into one large super spreadsheet
 automatically.  Hence the reason I cannot reformat each sheet manually.
 

Re: [R] confidence intervals for nls or nls2 model

2012-05-16 Thread Francisco Mora Ardila

Thanks! Now it is clear.

Francisco

On Wed, 16 May 2012 07:32:56 -0400, Gabor Grothendieck wrote
 On Tue, May 15, 2012 at 11:20 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
  On Tue, May 15, 2012 at 8:08 PM, Francisco Mora Ardila
  fm...@oikos.unam.mx wrote:
  Hi all
 
  I have fitted a model using the nls function to these data:
 
  x
   [1]   1   0   0   4   3   5  12  10  12 100 100 100
 
  y
   [1]  1.281055090  1.563609934  0.001570796  2.291579783  0.841891853
   [6]  6.553951324 14.243274230 14.519899320 15.066473610 21.728809880
  [11] 18.553054450 23.722637370
 
  The model fitted is:
 
  modellogis <- nls(y~SSlogis(x,a,b,c))
 
  It runs OK. Then I calculate confidence intervals for the actual data 
  using:
 
  dataci <- predict(as.lm(modellogis), interval = "confidence")
 
  But I don't get smooth curves when plotting it, so I want to get other 
  confidence
  vectors based on a new x vector by defining new data for the predictions:
 
  x0 <- seq(0,15,1)
  dataci <- predict(as.lm(modellogis), newdata=data.frame(x=x0), interval = 
"confidence")
 
  But it does not work: I get the same initial confidence interval.
 
  Any ideas on how to get confidence and prediction intervals using new x 
  data on a
  previous model?
 
 
  as.lm is a linear model between the response variable and the gradient
  of the nonlinear model and as we see below x is not part of that
  linear model so x can't be in newdata when predicting from the tangent
  model.  We can only make predictions at the original x points.   For
  other x's we could use interpolation. See ?approx  (?spline can also
  work in smooth cases but in the example provided the function has a
  kink and that won't work well with splines.)
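
A minimal sketch of the interpolation step, using only base R. The fitted values are the "(offset)" column printed in the model frame in this message, paired with the x values from the original post; the same approx() call would be applied to the lower and upper bound columns returned by predict(). Linear interpolation is coarse between x = 12 and x = 100, so treat the result as approximate.

```r
# x values from the post and fitted values from the "(offset)" column
x   <- c(1, 0, 0, 4, 3, 5, 12, 10, 12, 100, 100, 100)
fit <- c(1.397153, 1.015584, 1.015584, 3.451981, 2.582551, 4.542552,
         15.756031, 12.636107, 15.756031, 21.163223, 21.163223, 21.163223)

# Drop repeated x points so approx() sees one value per x
pts <- unique(data.frame(x = x, fit = fit))

# Interpolate onto the new grid requested in the post
x0 <- seq(0, 15, 1)
fit0 <- approx(pts$x, pts$fit, xout = x0)$y

print(data.frame(x0 = x0, fit0 = fit0))
```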
 
  as.lm(modellogis)$model
               y          a             b             c  (offset)
  1   1.281055090 0.06601796 -4.411829e-01  1.168928e+00  1.397153
  2   1.563609934 0.04798815 -3.268846e-01  9.766080e-01  1.015584
  3   0.001570796 0.04798815 -3.268846e-01  9.766080e-01  1.015584
  4   2.291579783 0.16311227 -9.767241e-01  1.597189e+00  3.451981
  5   0.841891853 0.12203013 -7.665928e-01  1.512752e+00  2.582551
  6   6.553951324 0.21464369 -1.206154e+00  1.564573e+00  4.542552
  7  14.243274230 0.74450055 -1.361047e+00 -1.455630e+00 15.756031
  8  14.519899320 0.59707858 -1.721353e+00 -6.770205e-01 12.636107
  9  15.066473610 0.74450055 -1.361047e+00 -1.455630e+00 15.756031
  10 21.728809880 1. -2.943955e-13 -9.073765e-12 21.163223
  11 18.553054450 1. -2.943955e-13 -9.073765e-12 21.163223
  12 23.722637370 1. -2.943955e-13 -9.073765e-12 21.163223
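
The interpolation idea can be sketched in a few lines. This is only an illustration under stated assumptions: the limits below are made-up numbers (not output from the model in this thread), and x_orig, lwr, upr and band are hypothetical names.

```r
# Hypothetical confidence limits known only at a few original x points
# (made-up numbers, not taken from the fitted model in this thread).
x_orig <- c(0, 5, 10, 100)
lwr    <- c(0, 2, 10, 18)
upr    <- c(2, 8, 16, 24)

# New grid on which a smoother-looking band is wanted
x0 <- seq(0, 15, 1)

# approx() interpolates linearly between the known points
band <- data.frame(
  x   = x0,
  lwr = approx(x_orig, lwr, xout = x0)$y,
  upr = approx(x_orig, upr, xout = x0)$y
)
head(band)
```

Linear interpolation only connects the known limits; it does not add information between the original x points.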
 
 
 I have added a FAQ to the home page since this isn't the first time
 this question has come up:
 
 http://nls2.googlecode.com#FAQs
 
 -- 
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


--
Francisco Mora Ardila
Estudiante de Doctorado
Centro de Investigaciones en Ecosistemas
Universidad Nacional Autónoma de México

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error on easy way for JoSAE Package

2012-05-16 Thread David Winsemius



On May 16, 2012, at 1:33 AM, ana24maria wrote:


Thank you very much.
After using dput and the easy way (result <- eblup.mse.f.wrap(domain.data =
amigo, lme.obj = fit.lme)), I got the following error:

Error in `[.data.frame`(sample.data, , variabs) :
 undefined columns selected


What John was asking you to do was at your console just type:

dput(amigo)

... and then copy the output into an email and send that to the list.  
Your first posting had data that was ambiguous as to content as well  
as mangled by the various email clients and servers that processed it on  
the path to our eyes.
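
A small sketch of why dput() output is useful to readers of the list: it is plain R code that rebuilds the object exactly. The data frame amigo_demo is a made-up stand-in, not the poster's amigo data.

```r
# Made-up stand-in for the poster's data frame
amigo_demo <- data.frame(domain = c("a", "b", "c"),
                         y = c(2.5, 3.1, 4.0),
                         stringsAsFactors = FALSE)

# This is the text one would paste into the email
dput(amigo_demo)

# A reader can turn that text back into the same object
txt      <- paste(capture.output(dput(amigo_demo)), collapse = "\n")
restored <- eval(parse(text = txt))
all.equal(restored, amigo_demo)
```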




What should I do?


You should also read the Posting Guide.


--
View this message in context: 
http://r.789695.n4.nabble.com/Error-on-easy-way-for-JoSAE-Package-tp4625684p4630220.html





--

David Winsemius, MD
West Hartford, CT


Re: [R] finding mean and SD for a log-normal distribution

2012-05-16 Thread Uwe Ligges




On 16.05.2012 12:37, Andras Farkas wrote:

Dear R Expert

allow me to ask a quick question: I have a mean value of 6 and an SD of 3 describing my 
distribution. I would like to convert this distribution into a log-normal 
distribution that would best describe it when resimulated using a log-normal distribution. 
Currently I am using another software to estimate the respective mean and SD on the log 
scale and the results are: mean 1.6667 and SD 0.47071. Then, to best reproduce my original 
distribution in R, I use the following commands:

c <- rlnorm(5000, 1.6667, 0.47071)
d <- exp(c)
mean(c)
sd(c)

and the results for mean and SD are 5.92 and 2.94 (original 6 and 3), 
respectively, which I am reasonably happy with. I would like to become 
independent of the other software I use, but am unable to figure out how to 
generate the values of 1.6667 and 0.47071 using R. Could someone please help me 
with this question?


Just make use of a textbook:

meanlog <- log(6) - 0.5 * log(1 + 9/(6^2))
sdlog <- sqrt(log(1 + 9/(6^2)))

Uwe Ligges
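
The two lines above are the usual moment-matching formulas with mean 6 and SD 3 plugged in. A minimal sketch that wraps them in a function and checks them against the theoretical log-normal mean and SD; lnorm_params is a hypothetical helper name, not a function from any package.

```r
# Moment matching: find meanlog/sdlog so that the log-normal has
# arithmetic mean m and arithmetic standard deviation s.
lnorm_params <- function(m, s) {
  v <- log(1 + s^2 / m^2)          # variance on the log scale
  c(meanlog = log(m) - v / 2, sdlog = sqrt(v))
}

p <- lnorm_params(6, 3)
p

# Back-check with the theoretical moments of the log-normal
mean_back <- exp(p[["meanlog"]] + p[["sdlog"]]^2 / 2)
sd_back   <- mean_back * sqrt(exp(p[["sdlog"]]^2) - 1)
c(mean = mean_back, sd = sd_back)

# Simulated values should come out close to 6 and 3
sim <- rlnorm(5000, p[["meanlog"]], p[["sdlog"]])
c(mean(sim), sd(sim))
```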





thanks,

Andras

Re: [R] code to iterate function apply to matrix

2012-05-16 Thread Uwe Ligges




On 16.05.2012 08:11, umai88 wrote:

I got this code below and i want to repeat the loop for 100 times..


And what is the problem? What are you aiming at?

Uwe Ligges





x <- rnorm(60)
mat1 <- matrix(x,nrow=15,ncol=4)

trim <- numeric(ncol(mat1))
win <- numeric(ncol(mat1))
ssd <- numeric(ncol(mat1))

for(j in 1:ncol(mat1))
{
n=length(mat1[,j])
alpha=0.1
k=floor(alpha*n)+1
r=k-(alpha*n)
i=k+1
m=n-k
y1 <- sort(mat1[,j])
y <- y1[i:m]
x.low=(1-r)*y1[k+1]+r*y1[k]
x.upp=(1-r)*y1[n-k]+r*y1[n-k+1]
trim[j] =1/((1-2*alpha)*n)*(sum(y)+r*(y1[k]+y1[n-k+1]))
win[j]=1/n*(sum(y)+k*(x.low+x.upp))
ssd[j] <- sum((y-win[j])**2)+k*( (y1[k+1]-win[j])**2 + (y1[n-k]-win[j])**2 )
}

trim.mean <- matrix(trim, nrow=1)
win.mean <- matrix(win, nrow=1)
sum.sq.dev <- matrix(ssd, nrow=1)
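
One possible reading of "repeat the loop 100 times" is: simulate a fresh matrix 100 times and collect the three statistics each time. A sketch under that assumption; col_stats and res are hypothetical names, and the formulas are copied from the loop above.

```r
# Trimmed mean, Winsorized mean and sum of squared deviations for every
# column of a matrix, using the formulas from the loop in the question.
col_stats <- function(mat, alpha = 0.1) {
  apply(mat, 2, function(col) {
    n  <- length(col)
    k  <- floor(alpha * n) + 1
    r  <- k - alpha * n
    y1 <- sort(col)
    y  <- y1[(k + 1):(n - k)]
    x.low <- (1 - r) * y1[k + 1] + r * y1[k]
    x.upp <- (1 - r) * y1[n - k] + r * y1[n - k + 1]
    trim <- (sum(y) + r * (y1[k] + y1[n - k + 1])) / ((1 - 2 * alpha) * n)
    win  <- (sum(y) + k * (x.low + x.upp)) / n
    ssd  <- sum((y - win)^2) +
      k * ((y1[k + 1] - win)^2 + (y1[n - k] - win)^2)
    c(trim = trim, win = win, ssd = ssd)
  })
}

# 100 repetitions, each on a freshly simulated 15 x 4 matrix
res <- replicate(100, col_stats(matrix(rnorm(60), nrow = 15, ncol = 4)),
                 simplify = "array")
dim(res)  # statistics x columns x repetitions
```

For example, res["trim", , ] would then hold the trimmed means, one row per column of the matrix and one column per repetition.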

--
View this message in context: 
http://r.789695.n4.nabble.com/code-to-iterate-function-apply-to-matrix-tp4630221.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] How to use the value of rect to determine the location of legend

2012-05-16 Thread Uwe Ligges




On 16.05.2012 02:13, Gundala Viswanath wrote:

Given the attached plot,


Nothing came through.


how can I locate the center text with Mean and SD so that it can be
placed exactly under ---emp.?

The current code I have is this:

L = list(bquote(Em.Mean ==.(new_avg)),bquote(Em.SD==.(new_std)),
bquote(Th.Mean ==.(theor_avg)),
 bquote(Th.SD==.(theor_sd)))


Not reproducible.

Uwe Ligges




legend("topright", c(kids, "emp."), cex=0.7, bty="n", col=c(cm.colors(6), "red"),
 pch=c(rep(19, 6), -5), lty = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0), )

# How can I locate this
legend("topcenter", cex=0.5, bty="n", legend=sapply(L, as.expression))

-G.V.
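
Since the subject line asks about the rect value: legend() invisibly returns the rectangle it used, and that can be fed into a second legend() call. A sketch with made-up labels and an off-screen device; first and second are hypothetical names, and the label text is not from the original plot.

```r
pdf(NULL)                      # off-screen device so the sketch runs anywhere
plot(1:10, type = "n")

# legend() invisibly returns the box it used: rect$left, rect$top, rect$w, rect$h
first <- legend("topright", legend = c("kid 1", "kid 2", "emp."),
                pch = c(19, 19, NA), lty = c(0, 0, 1), cex = 0.7, bty = "n")

# Put the second legend directly underneath: same left edge,
# top edge = bottom edge of the first legend.
second <- legend(x = first$rect$left,
                 y = first$rect$top - first$rect$h,
                 legend = c("Em.Mean = 1.2", "Em.SD = 0.3"),
                 cex = 0.5, bty = "n")
invisible(dev.off())
```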




Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Max Kuhn

More information is needed to be sure, but it is most likely that some
of the resampled rpart models produce the same prediction for the
hold-out samples (likely the result of no viable split being found).

Almost every incarnation of R^2 requires the variance of the
prediction. This particular failure mode would result in a divide by
zero.

Try using your own summary function (see ?trainControl) and put a
print(summary(data$pred)) in there to verify my claim.

Max
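
A rough sketch of such a summary function, runnable without caret. The data frame holdout is a made-up stand-in for one resample in which every hold-out prediction is identical, and debug_summary is a hypothetical name; only the (data, lev, model) argument list and the obs/pred columns follow what ?trainControl describes.

```r
# Hypothetical stand-in for one resample where the tree found no split:
# every hold-out prediction is the same value.
holdout <- data.frame(obs  = c(10.3, 16.4, 18.8, 24.2, 31.7),
                      pred = rep(20.28, 5))

# Summary function with the (data, lev, model) arguments that
# trainControl(summaryFunction = ...) expects; the name is made up.
debug_summary <- function(data, lev = NULL, model = NULL) {
  print(summary(data$pred))            # constant predictions show up here
  c(RMSE     = sqrt(mean((data$obs - data$pred)^2)),
    Rsquared = suppressWarnings(cor(data$obs, data$pred)^2))
}

out <- debug_summary(holdout)
out
# With caret installed it could be plugged in as, for example:
# tc <- trainControl(method = "cv", summaryFunction = debug_summary)
```

With constant predictions the correlation is undefined, so Rsquared comes back as NA, which would be consistent with the "missing values in resampled performance measures" warning.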

 On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote:
 Hi,
 I got the following problem when trying to build an rpart model using
 anything but LOOCV. Originally, I wanted to use k-fold partitioning,
 but every partitioning except LOOCV throws the following warning:

 
 Warning message: In nominalTrainWorkflow(dat = trainData, info =
 trainInfo, method = method, : There were missing values in resampled
 performance measures.
 -

 Below are some simplified test cases which reproduce the warning on my
 system.

 Question: What does this error mean? How can I avoid it?

 System-Information:
 -
 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2
 reshape_0.8.4
 [6] plyr_1.7.1     lattice_0.20-6

 loaded via a namespace (and not attached):
 [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
 [5] tools_2.15.0
 ---


 Simplified Testcase I: Throws warning
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 train(formula, data=trees,  method='rpart')
 ---

 Simplified Testcase II: Every other CV method also throws the warning,
 for example using 'cv':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='cv')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---

 Simplified Testcase III: The only CV method which works is 'LOOCV':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='LOOCV')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---


 Thanks!
 --
 Dominik Bruhn
 mailto: domi...@dbruhn.de








 --

 Max



-- 

Max


Re: [R] Need your help setting $R_check_force_suggests = FALSE on Windows system

2012-05-16 Thread Uwe Ligges




On 15.05.2012 19:23, Zhiqiu Hu wrote:

r-help@r-project.org
Dear friends,

I want to make the following change to the R settings on a Windows 7 desktop.

$R_check_force_suggests = FALSE


You can change it globally in the operating system's defaults for 
environment variables, or for the current session in the Windows command 
shell (cmd) you can simply say


set _R_CHECK_FORCE_SUGGESTS_=FALSE

Note the underscores and the upper case spelling!

Uwe Ligges
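
If the check is started from inside an R session rather than from cmd, the same variable can be set with Sys.setenv(). This is a sketch and an assumption about the workflow, not something stated in the thread; child processes started from the session inherit the value.

```r
# Set the variable for the current R session and its child processes.
# The name needs quoting because it starts with an underscore.
Sys.setenv("_R_CHECK_FORCE_SUGGESTS_" = "FALSE")

# Confirm what a child process (e.g. R CMD check started via system2())
# would see
Sys.getenv("_R_CHECK_FORCE_SUGGESTS_")
```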





Since I have no experience using Unix, I don't know how to make the
suggestions in Writing R Extensions work on Windows. I would
appreciate it if you could help me figure out the equivalent
of the following settings on a Windows system.

***
In addition to the available command line options, R CMD check also
allows customization by setting (Perl) configuration variables in a
configuration file, the location of which can be specified via the
--rcfile option and defaults to $HOME/.R/check.conf provided that the
environment variable HOME is set.

The following configuration variables are currently available.

$R_check_force_suggests
If true, give an error if suggested packages are not available.
Default: true.
***

Installation paths on my desktop
C:\Rtools
C:\Program Files\R\R-2.15.0\bin\x64

Thank you very much.

Noah


[R] kolmogorov-Smirnov critical values

2012-05-16 Thread aramos

Hi!

Does anyone know how to obtain critical values for the K-S statistic using R?

Thanks,
Alex

--
View this message in context: 
http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245.html
Sent from the R help mailing list archive at Nabble.com.


[R] Merging multiple data sets

2012-05-16 Thread Bharat Warule

Hello R user,

I have four data sets in the directory D:/Bharat Warule/Rdata_file:
output_data_prod_1.rda, output_data_prod_2.rda, output_data_prod_3.rda,
output_data_prod_4.rda.
Each data set is large: 343297 rows and close to 50 columns.

For example:

x1 <- data.frame(x11=c(1,2,3,4,5),x112=c(10,10,10,10,10))
x2 <- data.frame(x11=c(1,2,3,4,5),x122=c(20,20,20,20,20))
x3 <- data.frame(x11=c(1,2,3,4,5),x132=c(30,30,30,30,30))
x4 <- data.frame(x11=c(1,2,3,4,5),x142=c(40,40,40,40,40))
x5 <- data.frame(x11=c(1,2,3,4,5),x152=c(50,50,50,50,50))

for(i in 1:5){
name <- paste('x',i,sep='')
name1 <- paste(name,"rda",sep='.')
save(name, file = name1)
}


I want to merge these data sets into one data set, but I don't know where I am
going wrong.

Please help me. Thanks for your help.  

subsetname <- "x1"
file_no <- 4
output_data_prod <- data.frame()

for(n in 1:file_no){
 myfile <- gsub("( )", "", paste(subsetname, "_", n, ".rda"))
 temp_data <- load(file = myfile)
 data_22   <- get(temp_data)

 if(dim(output_data_prod)[1]==0){output_data_prod <- data_22
  }else{
  output_data_prod <- merge(inData1 = output_data_prod,
inData2 = data_22, type = "inner", all=FALSE,
by = c("x11"))}

}


-
Bharat Warule 
Cypress Analytica ,
Pune
--
View this message in context: 
http://r.789695.n4.nabble.com/Merging-multiple-data-sets-tp4630244.html
Sent from the R help mailing list archive at Nabble.com.


[R] Hmisc improveProb() and PredictABEL reclassification () function and continuous NRI

2012-05-16 Thread Svingen, Gard Frodahl

Dear Sirs.

I am working with the R packages Hmisc and PredictABEL to make NRI estimates 
from my Cox models with and without a specific biomarker.
According to Pencina et al. (Statistics in Medicine 2010, DOI: 10.1002/sim.4085), 
a continuous/non-categorical NRI (NRI>0) is to be used when there is no 
obvious reason to categorize risk, such as for the risk of future cardiovascular 
events in patients with established cardiovascular disease.

My question is therefore: Which value(s) are to be used in the calculation of 
continuous NRI from the output in Hmisc or in PredictABEL? Does continuous NRI 
equal total NRI in the output?
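
For orientation only, the category-free NRI can be written out by hand as the sum of an event part and a non-event part. The sketch below uses made-up risks, ignores censoring (so it is not a substitute for a survival-aware estimate from a Cox model), and does not claim to reproduce the output of either package; continuous_nri is a hypothetical name.

```r
# Category-free NRI: any increase in predicted risk counts as "up",
# any decrease as "down".
continuous_nri <- function(p_old, p_new, event) {
  up   <- p_new > p_old
  down <- p_new < p_old
  nri_event    <- mean(up[event == 1])   - mean(down[event == 1])
  nri_nonevent <- mean(down[event == 0]) - mean(up[event == 0])
  c(event = nri_event, nonevent = nri_nonevent,
    total = nri_event + nri_nonevent)
}

# Made-up predicted risks without (p_old) and with (p_new) the biomarker
p_old <- c(0.2, 0.4, 0.6, 0.3, 0.5, 0.1)
p_new <- c(0.3, 0.5, 0.5, 0.2, 0.4, 0.2)
event <- c(1, 1, 1, 0, 0, 0)

nri <- continuous_nri(p_old, p_new, event)
nri
```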



Yours sincerely

Gard Frodahl T. Svingen
PhD student
the University of Bergen
Bergen, Norway



Re: [R] vector w/o arithmetic addition for boxplot

2012-05-16 Thread Uwe Ligges




On 15.05.2012 23:47, rl269 wrote:

Hello,

I am having trouble  asking R to read individual numeric vectors for a box
plot of the residuals of a linear regression.  It is performing arithmetic
addition on the 16 individual variables that I want individual box plots
for.

I have 16 race*treatment variables that were created from cleaned
data.frames for race and treatment independently: t1W, t1B.t4W, t4B,
t4H, t4O.

class(t1W) produces numeric (1000 observations of 1's and 0's)

To create the box plot I am using
boxplot(residuals(IRR)~ treatRace_clean)

where I have tried treatRace_clean as both of the following
treatRace_clean <- as.factor(as.vector(t1W + t1B + t1H + t1O + t2W + t2B +
t2H + t2O + t3W + t3B +
t3H + t3O + t4W + t4B + t4H + t4O))
treatRace_clean <- as.vector(c(t1W, t1B ,t1H ,t1O, t2W , t2B ,t2H ,t2O ,t3W
, t3B,
t3H + t3O , t4W , t4B , t4H , t4O))



Actually, I have no idea what you are really aiming at, reproducible 
code and a precise description would help a lot.


Uwe Ligges


However, I continue to get this error code:
Error: $ operator is invalid for atomic vectors

Thoughts?
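
One guess at what is wanted: a single grouping factor that says which of the 0/1 indicator vectors is "on" for each observation, so that boxplot() draws one box per group. The sketch below assumes exactly one indicator equals 1 per observation and uses three made-up vectors and made-up residuals instead of the sixteen real ones.

```r
# Three hypothetical 0/1 indicator vectors standing in for t1W, t1B, ...
t1W <- c(1, 0, 0, 0, 1, 0)
t1B <- c(0, 1, 0, 1, 0, 0)
t2W <- c(0, 0, 1, 0, 0, 1)
res <- c(-0.4, 0.1, 0.3, -0.2, 0.5, -0.3)  # made-up residuals

ind <- cbind(t1W, t1B, t2W)
stopifnot(all(rowSums(ind) == 1))          # exactly one group per observation

# Name of the column holding the 1 in each row, as a factor
grp <- factor(colnames(ind)[max.col(ind, ties.method = "first")],
              levels = colnames(ind))

bp <- boxplot(res ~ grp, plot = FALSE)     # plot = TRUE would draw the boxes
bp$names
```

Adding the vectors with + sums the indicators, and c() stacks them end to end, so neither gives one label per observation.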

--
View this message in context: 
http://r.789695.n4.nabble.com/vector-w-o-arithmetic-addition-for-boxplot-tp4630190.html
Sent from the R help mailing list archive at Nabble.com.


[R] clusters in zero-inflated negative binomial models

2012-05-16 Thread Lies Durnez

Dear all,

I want to build a model in R based on animal collection data, that look like 
the following

Nr  Village  District  Site  Survey  Species  Count
1   AX       A         F     Dry     B        0
2   AY       A         V     Wet     A        5
3   BX       B         F     Wet     B        1
4   BY       B         V     Dry     B        0

Each data point shows one collection unit in a certain Village, District, Site, 
and Survey for a certain Species. 'Count' is the number of animals collected in 
that collection unit. It is possible that zero animals are collected in that 
unit because of very low densities, but also because of climatic conditions 
(wind, rain, etc.), so we would expect an excess of zeroes. I have tested that 
the data are overdispersed (variance much bigger than mean), so a zero-inflated 
negative binomial model seems the most suitable model in this case. To be sure, 
I will compare the zero-inflated model to the standard negative binomial model 
using the Vuong test. The models will be made for each species separately. For 
these models I can use glm.nb() (package MASS) and zeroinfl() in the package 
pscl, looking something like this (after selection of the subset 
B <- subset(data, Species == "B")): 
NB=glm.nb(formula = Count ~ District+Site+Survey, data = B)
ZINB=zeroinfl(formula = Count ~ District+Site+Survey, dist="negbin", data = B)
vuong(NB,ZINB)
I have tried this and it works very elegantly.
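
The informal checks mentioned above (overdispersion and excess zeroes) can be sketched in base R. The counts below are made up, not the collection data, and comparing against a Poisson with the same mean is only a rough screening step, not a formal test.

```r
# Made-up counts: 30 empty collection units plus 20 non-empty ones
counts <- c(rep(0, 30),
            1, 2, 3, 3, 4, 4, 4, 5, 5, 6,
            2, 3, 4, 5, 6, 7, 3, 4, 4, 5)

# Variance-to-mean ratio: values well above 1 suggest overdispersion
dispersion <- var(counts) / mean(counts)

# Zeroes observed versus zeroes a Poisson with the same mean would give
observed_zero <- sum(counts == 0)
expected_zero <- length(counts) * dpois(0, lambda = mean(counts))

c(dispersion = dispersion,
  observed_zero = observed_zero,
  expected_zero = expected_zero)
```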

However, the animal collections were only done in 4 districts, and in each 
district 3 villages were chosen (a total of 12 villages). This should be 
included in the design. The package survey allows this for the standard 
negative binomial model, but it seems to me that it is not possible for the 
zero-inflated NB. So, my question is two-fold: 
1. Is a zero-inflated NB possible in the survey package? If yes, how? 
2. If no, how can I build a zero-inflated NB model that takes into account the 
clustering of the observations (animal counts) in villages and the clustering 
of the villages in districts. 

Thank you very much for the help.

Re: [R] Help needed for efficient way to loop through rows and columns

2012-05-16 Thread Rui Barradas

Hello,

Your data.frame is composed exclusively of factors, but try this

(I've changed the name to 'sampl', because 'sample' is an R function.)


# logical index vectors
iA <- sampl$AorB == "A"
iB <- sampl$AorB == "B"

new.sampl <- data.frame(
apply(sampl, 2, function(x){
iAA <- x == "AA"
iBB <- x == "BB"
x[ iA & iAA ] <- 2
x[ iA & iBB ] <- 0
#
x[ iB & iAA ] <- 0
x[ iB & iBB ] <- 2
#
x[ x %in% c("AB", "BA") ] <- 1
x}
))

Hope this helps,

Rui Barradas
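
A self-contained variant of the same idea, restricted to the genotype columns and returning numeric codes. This is a sketch, not the code posted above: recode_col, geno_cols and coded are hypothetical names, and the data frame is rebuilt from the question with stringsAsFactors = FALSE.

```r
sampl <- data.frame(names = c("S1", "S2", "S3", "S4"),
                    X = c("BB", "AB", "AB", "AA"),
                    Y = c("BB", "BB", "AB", "AA"),
                    Z = c("BB", "BB", "AB", NA),
                    AorB = c("A", "A", "A", "B"),
                    stringsAsFactors = FALSE)

# Code one genotype column: 2 = homozygous for the row's A/B allele,
# 1 = heterozygous, 0 = homozygous for the other allele, NA otherwise.
recode_col <- function(g, ref) {
  out <- rep(NA_real_, length(g))
  hom_ref   <- !is.na(g) & g == paste0(ref, ref)
  hom_other <- !is.na(g) & g %in% c("AA", "BB") & !hom_ref
  out[hom_ref]   <- 2
  out[hom_other] <- 0
  out[g %in% c("AB", "BA")] <- 1
  out
}

geno_cols <- c("X", "Y", "Z")
coded <- sampl
coded[geno_cols] <- lapply(sampl[geno_cols], recode_col, ref = sampl$AorB)
coded
```

Working column by column keeps everything vectorised, so no explicit loop over rows is needed.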

Priya Bhatt wrote
 
 Dear R-helpers:
 
 I am trying to write a script that iterates through a dataframe that looks
 like this:
 
 
 Example dataset called sample:
 
 names <- c("S1", "S2", "S3", "S4")
 X <- c("BB", "AB", "AB", "AA")
 Y <- c("BB", "BB", "AB", "AA")
 Z <- c("BB", "BB", "AB", NA)
 AorB <- c("A", "A", "A", "B")
 
 sample <- data.frame(names, X, Y, Z, AorB)
 
 
 for a given row,
 
 if AorB == A, then AA == 2, AB = 1, BA = 1, BB = 0
 
 if AorB == B, then AA == 0, AB = 1, BA = 1, BB = 2
 
 I've been trying to write this using apply and ifelse statements in the hope
 that my code runs quickly, but I'm afraid I've made a big mess.  See
 below:
 
 apply(sample, 1, function(i) {
 
 
   ifelse(sample$AorB[i] == "A",
  (ifelse(sample[i,] == "AA", sample[i,] <- 2 ,
  ifelse(sample[i,] == "AB" || sample[i,] == "BA" ,
 sample[i,] <- 1,
 ifelse(sample[i,] == "BB", sample[i,] <- 0,
 sample[i,] <- NA )) )
   )   , ifelse(sample$AorB[i,] == "B"),
  (ifelse(sample[i,] == "AA", sample[i,] <- 0 ,
  ifelse(sample[i,] == "AB" || sample[i,] == "BA" ,
 sample[i,] <- 1,
 ifelse(sample[i,] == "BB", sample[i,] <- 2,
 sample[i,] <- NA) })
 
 
 Any Advice?
 
 


--
View this message in context: 
http://r.789695.n4.nabble.com/Help-needed-for-efficient-way-to-loop-through-rows-and-columns-tp4630226p4630248.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread R. Michael Weylandt

On Wed, May 16, 2012 at 9:52 AM, aramos ara...@fep.up.pt wrote:
 Hi!

 Does anyone know how to obtain critical values for the K-S statistic using R?


Take a look at ?ks.test and the code of ks.test to see how R does it.
OSS is super helpful for these sorts of things.

Michael
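
If an approximate critical value is enough, the usual large-sample formula for the one-sample, two-sided statistic can be coded directly. This is a sketch under that assumption: for small samples exact tables differ, and ks_crit_asymptotic is a hypothetical helper, not part of R.

```r
# Large-sample critical value for the one-sample, two-sided K-S statistic:
# D_crit = sqrt(-0.5 * log(alpha / 2)) / sqrt(n)
ks_crit_asymptotic <- function(n, alpha = 0.05) {
  sqrt(-0.5 * log(alpha / 2)) / sqrt(n)
}

set.seed(42)
x <- rnorm(100)

d_obs  <- unname(ks.test(x, "pnorm")$statistic)
d_crit <- ks_crit_asymptotic(length(x), alpha = 0.05)

# Reject at level alpha when the observed statistic exceeds the critical value
c(statistic = d_obs, critical = d_crit)
```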

 Thanks,
 Alex

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245.html
 Sent from the R help mailing list archive at Nabble.com.


[R] install ggplot2 package

2012-05-16 Thread Yang, Ming

Has anyone tried to install the ggplot2 package recently? I tried to install
it on my new system and had trouble:

 

> install.packages("ggplot2")
Installing package(s) into 'C:/Program Files/R/R-2.14.2/library'
(as 'lib' is unspecified)
also installing the dependency 'scales'

trying URL 'http://cran.case.edu/bin/windows/contrib/2.14/scales_0.2.0.zip'
Warning in install.packages :
  cannot open: HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) : 
  cannot open URL 'http://cran.case.edu/bin/windows/contrib/2.14/scales_0.2.0.zip'
Warning in install.packages :
  download of package 'scales' failed
trying URL 'http://cran.case.edu/bin/windows/contrib/2.14/ggplot2_0.9.0.zip'
Warning in install.packages :
  cannot open: HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) : 
  cannot open URL 'http://cran.case.edu/bin/windows/contrib/2.14/ggplot2_0.9.0.zip'
Warning in install.packages :
  download of package 'ggplot2' failed

 

Thanks

Ming

 

Ming Yang, PhD
Xerox Research Center Webster
800 Phillips Rd (MS:0147-11B); Webster, NY, 14580
Ph: (585) 422-2375 Fx: (585) 231-8404


Re: [R] double buffering in windows() not working

2012-05-16 Thread Uwe Ligges

Fine for me, and I cannot investigate anything since there is not even a 
single piece of reproducible code given.


Uwe Ligges


On 15.05.2012 23:20, Daniel Carr wrote:

I have double-buffered animations that I show in class.
They used to work but now flash.

The default windows() option is buffered = TRUE.
Just in case, I tried using windows( buffered = TRUE)
but this made no difference.

I am not sure when the change occurred.
An older R2.11 version in one class room worked.
R2.14.1, R2.14.2 and R2.15 don't work on my computer.

Some of the animations add to the plot, for example
using points and segments. I thought that might
be triggering the buffer swap, but just drawing
filled circles causes flashing.

I am using XP and the 32 bit version. I think I tried it
with Windows7 and still had a problem.

My RSeek search did not turn up any recent
problem related to buffering.

Thanks in advance for help provided

Dan
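
A minimal reproducible frame loop of the kind asked for in the reply. Wrapping each frame in dev.hold()/dev.flush() (available in grDevices since R 2.14.0) is one thing that may be worth trying; whether it is related to the flashing described here is an assumption. The sketch draws to an off-screen device so it runs anywhere, and draw_frame is a hypothetical name.

```r
pdf(NULL)   # off-screen device; on Windows one would use windows(buffered = TRUE)

# Draw one complete frame; output is held back until the frame is finished
draw_frame <- function(i, n) {
  dev.hold()                    # stop screen updates while the frame is built
  on.exit(dev.flush())          # release the finished frame in one go
  plot(c(0, 1), c(0, 1), type = "n", xlab = "", ylab = "")
  points(i / n, 0.5, pch = 19, cex = 3)
  segments(0, 0.5, i / n, 0.5)
  invisible(i)
}

n_frames <- 5
drawn <- vapply(seq_len(n_frames), draw_frame, integer(1), n = n_frames)
invisible(dev.off())
```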


Re: [R] rpart - predict terminal nodes for new observations

2012-05-16 Thread Uwe Ligges




On 15.05.2012 16:30, tudor wrote:

Dear useRs:

Is there a way I could predict the terminal node associated with a new data
entry in an rpart environment? In the example below, if I had a new data
entry with an AM of 5, I would like to link it to the terminal node 2. My
searches led to http://tolstoy.newcastle.edu.au/R/e4/help/08/07/17702.html
but I do not seem to be able to operationalize Professor Ripley's
suggestions.



Use the predict() function.

Uwe Ligges
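
predict() returns the fitted value or class rather than the node number itself. A workaround that is sometimes used is to predict from a copy of the tree in which each node's fitted value is replaced by its node number. The sketch below does this for a regression tree on the built-in mtcars data; it is an illustration under those assumptions, not the poster's classification tree and not necessarily what was suggested in the linked post. node_tree and new_obs are hypothetical names.

```r
library(rpart)   # recommended package shipped with standard R installations

fit <- rpart(mpg ~ wt + hp, data = mtcars)

# Copy of the tree whose "predicted value" in every node is the node number
node_tree <- fit
node_tree$frame$yval <- as.numeric(row.names(fit$frame))

# Hypothetical new observations
new_obs <- data.frame(wt = c(2.0, 3.5), hp = c(90, 180))
new_nodes <- predict(node_tree, newdata = new_obs, type = "vector")
new_nodes

# Sanity check on the training data: should agree with fit$where
train_nodes <- unname(predict(node_tree, newdata = mtcars, type = "vector"))
all(train_nodes == as.numeric(row.names(fit$frame))[fit$where])
```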



Many thanks.

Tudor


tree.prune

n= 2400

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 2400 779 0 (0.6754167 0.3245833)
  2) AM< 6.5 1428 254 0 (0.8221289 0.1778711) *
  3) AM>=6.5 972 447 1 (0.4598765 0.5401235)
    6) P>=10.39666 390  86 0 (0.7794872 0.2205128) *
    7) P< 10.39666 582 143 1 (0.2457045 0.7542955) *


--
View this message in context: 
http://r.789695.n4.nabble.com/rpart-predict-terminal-nodes-for-new-observations-tp4630104.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Merging multiple data sets

2012-05-16 Thread Uwe Ligges




On 16.05.2012 15:51, Bharat Warule wrote:

Hello R user,

I have four data sets in dir D:/Bharat Warule/Rdata_file which are
output_data_prod_1.rda, output_data_prod_2.rda, output_data_prod_3.rda,
output_data_prod_4.rda.
Each data set is huge size like number of rows 343297 and columns are near
to 50.

For example:

x1 <- data.frame(x11=c(1,2,3,4,5),x112=c(10,10,10,10,10))
x2 <- data.frame(x11=c(1,2,3,4,5),x122=c(20,20,20,20,20))
x3 <- data.frame(x11=c(1,2,3,4,5),x132=c(30,30,30,30,30))
x4 <- data.frame(x11=c(1,2,3,4,5),x142=c(40,40,40,40,40))
x5 <- data.frame(x11=c(1,2,3,4,5),x152=c(50,50,50,50,50))

for(i in 1:5){
name <- paste('x',i,sep='')
name1 <- paste(name,"rda",sep='.')
save(name, file = name1)


To fix this part, use:

save(list = name, file = name1)



}


I want to merge these data sets into one data set, but I don't know where I am
going wrong.

Please help me. Thanks for your help.

subsetname <- "x1"
file_no <- 4
output_data_prod <- data.frame()

for(n in 1:file_no){
  myfile <- gsub("( )", "", paste(subsetname, "_", n, ".rda"))


To match the above:

 myfile <- gsub("( )", "", paste(subsetname, n, ".rda", sep=""))


  temp_data <- load(file = myfile)
  data_22 <- get(temp_data)

  if(dim(output_data_prod)[1]==0){output_data_prod <- data_22
   }else{
   output_data_prod <- merge(inData1 = output_data_prod,


Nonsense, the arguments of merge are called x and y rather than inData1 
and inData2.


Uwe Ligges


inData2 = data_22, type = "inner", all=FALSE,
by = c("x11"))}

}
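
Putting the corrections together, the whole round trip might look like the sketch below. It writes to a temporary directory and uses three small data frames instead of the real files; load_one, pieces and merged are hypothetical names, and merge() is called with its actual arguments (x, y, by, all).

```r
out_dir <- tempdir()

x1 <- data.frame(x11 = 1:5, x112 = rep(10, 5))
x2 <- data.frame(x11 = 1:5, x122 = rep(20, 5))
x3 <- data.frame(x11 = 1:5, x132 = rep(30, 5))

# save(list = ...) stores the object called "x1", not the string "x1"
for (nm in c("x1", "x2", "x3")) {
  save(list = nm, file = file.path(out_dir, paste0(nm, ".rda")))
}

# Load one .rda file into a private environment and return its first object
load_one <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)
  get(obj_names[1], envir = env)
}

files  <- file.path(out_dir, paste0("x", 1:3, ".rda"))
pieces <- lapply(files, load_one)

# Inner join of all pieces on the shared key column
merged <- Reduce(function(x, y) merge(x, y, by = "x11", all = FALSE), pieces)
merged
```

With data sets of several hundred thousand rows, repeated merge() calls can be slow; if the key column is identical and in the same order in every file, cbind() on the non-key columns may be enough.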


-
Bharat Warule
Cypress Analytica ,
Pune
--
View this message in context: 
http://r.789695.n4.nabble.com/Merging-multiple-data-sets-tp4630244.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread Uwe Ligges


On 16.05.2012 15:52, aramos wrote:

Hi!

Does anyone know how to obtain critical values for the K-S statistic using R?


 ks.test(.)$statistic

Uwe ligges



Thanks,
Alex

--
View this message in context: 
http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] install ggplot2 package

2012-05-16 Thread R. Michael Weylandt

It looks like there might be a mirror problem -- use
chooseCRANmirror() to select a different mirror.

Best,
Michael
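
Besides the interactive chooseCRANmirror(), the mirror can be set in code. A sketch; the mirror URL below is an assumption (any working CRAN mirror will do), and the install call is not run here because it needs network access.

```r
# Point this session at a different CRAN mirror (URL is an assumption)
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")

# Then retry the installation; dependencies such as 'scales' are
# fetched from the same mirror
if (FALSE) {
  install.packages("ggplot2")
}
```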

On Wed, May 16, 2012 at 10:21 AM, Yang, Ming ming.y...@xerox.com wrote:
 Has anyone tried to install the ggplot2 package recently? I tried to install
 it on my new system and had trouble:

 > install.packages("ggplot2")
 Installing package(s) into 'C:/Program Files/R/R-2.14.2/library'
 (as 'lib' is unspecified)
 also installing the dependency 'scales'

 trying URL 'http://cran.case.edu/bin/windows/contrib/2.14/scales_0.2.0.zip'
 Warning in install.packages :
   cannot open: HTTP status was '404 Not Found'
 Error in download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL 'http://cran.case.edu/bin/windows/contrib/2.14/scales_0.2.0.zip'
 Warning in install.packages :
   download of package 'scales' failed
 trying URL 'http://cran.case.edu/bin/windows/contrib/2.14/ggplot2_0.9.0.zip'
 Warning in install.packages :
   cannot open: HTTP status was '404 Not Found'
 Error in download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL 'http://cran.case.edu/bin/windows/contrib/2.14/ggplot2_0.9.0.zip'
 Warning in install.packages :
   download of package 'ggplot2' failed

 Thanks

 Ming

 Ming Yang, PhD
 Xerox Research Center Webster
 800 Phillips Rd (MS:0147-11B); Webster, NY, 14580
 Ph: (585) 422-2375     Fx: (585) 231-8404

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Re: [R] install ggplot2 package

2012-05-16 Thread Uwe Ligges

Looks like your mirror was in an inconsistent state. It seems to have been
fixed by a completed rsync in the meantime ...


Uwe ligges







Re: [R] install ggplot2 package

2012-05-16 Thread Ista Zahn

Hi Yang,

Did you try a different CRAN mirror?

Best,
Ista


[R] simple data.frame question

2012-05-16 Thread Troels Ring

Dear friends - I hope you will forgive me another simple question, 
illustrated by


ID <- c(1,1,1,2,2,3,3,3)
PERIOD <- c(1,2,3,2,3,1,2,3)
X <- runif(8,0,10)

FF <- data.frame(ID=ID,PERIOD=PERIOD,X=X)

I need a fourth row to be inserted with X set to NA, so that ID and PERIOD become
1,1,1,2,2,2,3,3,3 and 1,2,3,1,2,3,1,2,3 respectively.

How do I use the pattern in ID and PERIOD to find the missing X and put NA there?

Best wishes

Troels Ring,
Aalborg, Denmark


Re: [R] finding mean and SD for a log-normal distribution

2012-05-16 Thread David Winsemius



On May 16, 2012, at 6:37 AM, Andras Farkas wrote:


Dear R Expert

allow me to ask a quick question: I have a mean value of 6 and an SD
of 3 describing my distribution. I would like to convert this
distribution into a log-normal distribution that would best describe
it when resimulated using a log-normal distribution. Currently I am
using another software to estimate the respective mean and SD on the
log scale, and the results are: mean 1.6667 and SD 0.47071. Then, to best
reproduce my original distribution in R, I use the following commands:


c <- rlnorm(5000,1.6667,0.47071)
d <- exp(c)
mean(c)
sd(c)


I get a better match to those values with:
distrib <- rlnorm(50,1.682,0.47071)

(Bad practice to use 'c' as an object name.)



and the results for mean and SD are 5.92 and 2.94 (original 6 and
3), respectively, which I am reasonably happy with. I would like to
become independent of the other software I use, but am unable to
figure out how to generate the values of 1.6667 and 0.47071 using R.
Could someone please help me with this question?


You need to review your resources on statistical distributions. The  
Wikipedia article has the needed transformations for parameters  
between the log and untransformed scales under the section entitled  
Arithmetic moments.


So that was the basis for this test:

# mu for LN
> log(6) - 0.5*log(1+9/6^2)
[1] 1.680188
# sigma for LN
> sqrt( log( 1 +9/6^2))
[1] 0.4723807
> c <- rlnorm(50,1.680188,0.4723807)
> d <- exp(c)
# Expected value
> mean(c)
[1] 5.99303
# SD
> sd(c)
[1] 2.996532

So my half-assed approximation was in better agreement with theory
than your other software. On the other hand, you haven't really given
us much background for this estimation process, so it's not possible to
offer a solid value judgment. R has packages that do distribution
fitting: MASS has fitdistr, and there is a fitdistrplus package
and others, I believe. There's a monograph out about R's facilities, but
at the moment I cannot put my hands on my copy. There is a
Distributions Task View:

http://cran.r-project.org/web/views/Distributions.html
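
The moment-matching step used in the test above can be wrapped in a small helper. This is only a sketch: the function name lnorm_params is made up for illustration, and it assumes the target mean m and standard deviation s are on the arithmetic (untransformed) scale.

```r
# Convert an arithmetic mean m and standard deviation s into the
# meanlog/sdlog parameters of a log-normal distribution (moment matching).
lnorm_params <- function(m, s) {
  sdlog <- sqrt(log(1 + s^2 / m^2))
  meanlog <- log(m) - sdlog^2 / 2
  c(meanlog = meanlog, sdlog = sdlog)
}

p <- lnorm_params(6, 3)
p

# Resimulate and compare with the target mean and SD.
set.seed(1)
sim <- rlnorm(1e5, meanlog = p[["meanlog"]], sdlog = p[["sdlog"]])
c(mean = mean(sim), sd = sd(sim))
```

For m = 6 and s = 3 the helper reproduces the values 1.680188 and 0.4723807 computed by hand above.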

--

David Winsemius, MD
West Hartford, CT


[R] replacing with NA

2012-05-16 Thread Mintewab Bezabih

Dear R users, 

I was wondering how I can replace the values of one vector based on the values
of another vector in the same row.

For example, how can I replace the value of x below with NA when the value of z
in the same row is NA?
x <- 1:20
z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16, 17, NA, 12, 10)


Many thanks
Mintewab


From: Mintewab Bezabih
Sent: 15 May 2012 15:53
To: r-help@r-project.org
Cc: r-help@r-project.org
Subject: missing observations

Dear R users,

I have missing observations in my data that I remove before my analysis. I am able
to run my code all right, but I want the non-missing values to be correctly
identified, and therefore want to carry my id vector along into my results. Since
the vector of ids has no role in the analysis, I don't know how to include it.



Here is my reproducible example; id is the vector I want to add to the
analysis somehow so that my missing values are identified. I cannot use the
na.action function, and that is why I have to drop my missing observations
beforehand.


library(fields)
x <- 1:20
y <- runif(20)
z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16,
17, NA, 12, 10)
id <- 1:20

mydataset <- data.frame(x, y, z)
temperature <- mydataset[complete.cases(mydataset),]

x <- temperature[, c(1)]
y <- temperature[, c(2)]
z <- temperature[, c(3)]

tpsfit <- Tps(cbind(x, y), z, scale.type="unscaled")




Many thanks as always.
Regards,
Mintewab


Re: [R] replacing with NA

2012-05-16 Thread R. Michael Weylandt

x[is.na(z)] <- NA

This can hide a nasty bug if x and z have different lengths,
though -- just a heads-up.

Michael
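
A slightly fuller sketch of the same idea, with the length check made explicit. The second half touches on the earlier "missing observations" question by carrying an id column through complete.cases(); the names dat and kept are only illustrative.

```r
x <- 1:20
z <- c(11, 15, 17, 2, 18, 6, 7, NA, 12, 10, 21, 25, 27, 12, 28, 16, 17, NA, 12, 10)

# Guard against silent misalignment before using z to index x.
stopifnot(length(x) == length(z))
x[is.na(z)] <- NA
which(is.na(x))

# Carrying an id column through complete.cases() shows which rows were kept.
dat <- data.frame(id = 1:20, y = runif(20), z = z)
kept <- dat[complete.cases(dat), ]
setdiff(dat$id, kept$id)
```

Because id travels with the rows, the ids of the dropped observations can be recovered after the analysis.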


Re: [R] confidence intervals for nls or nls2 model

2012-05-16 Thread Walmes Zeviani

If you want a confidence band evaluated at new x values, that can be done. I have
a post with the steps to do this. It's written in Portuguese, but the R code is useful.

http://ridiculas.wordpress.com/2011/05/19/bandas-de-confianca-para-modelo-de-regressao-nao-linear/

Bests.
Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==


[R] how disable the Error massage in read.table() no lines available in input

2012-05-16 Thread gianni lavaredo

Dear Researchers,

I am looking for a way to suppress the error message in read.table(), similar to
warn = TRUE in readLines(), when the input is empty:

Error in read.table(con, header = F, sep = " ", nrow = n) :
  no lines available in input

thanks for all suggestions
Gianni
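
One possible approach, sketched here under assumptions: wrap read.table() in tryCatch() and return NULL when the input is empty. The wrapper name read_table_or_null is hypothetical, and matching on the English error text is an assumption that would break under a translated R session.

```r
# Return NULL instead of stopping when read.table() finds no lines to read.
# Any other error is passed on unchanged.
read_table_or_null <- function(file, ...) {
  tryCatch(
    read.table(file, ...),
    error = function(e) {
      if (grepl("no lines available in input", conditionMessage(e), fixed = TRUE)) {
        NULL
      } else {
        stop(e)
      }
    }
  )
}

# Demonstration on an empty temporary file.
tmp <- tempfile()
file.create(tmp)
res <- read_table_or_null(tmp, header = FALSE, sep = " ")
is.null(res)
unlink(tmp)
```

The caller can then test the result with is.null() and skip empty inputs.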


Re: [R] simple data.frame question

2012-05-16 Thread David Winsemius



On May 16, 2012, at 11:56 AM, Troels Ring wrote:

Dear friends - I hope you will forgive me another simple question,  
illustrated by


ID <- c(1,1,1,2,2,3,3,3)
PERIOD <- c(1,2,3,2,3,1,2,3)
X <- runif(8,0,10))


Extraneous paren removed:



FF <- data.frame(ID=ID,PERIOD=PERIOD,X=X)

I need to the fourth value of X as NA, and ID and PERIOD is updated  
to 1,1,1,2,2,2,3,3,3 and 1,2,3,1,2,3,1,2,3 respectively.
How do I use the pattern in ID and PERIOD to find the lacking X and  
put NA?


> ffnew=merge(x=expand.grid(1:3,1:3),
+ y=FF, by =1:2, all.x=TRUE)
> ffnew
  Var1 Var2         X
1    1    1 6.6294571
2    1    2 0.5749111
3    1    3 8.7895630
4    2    1        NA
5    2    2 5.7213062
6    2    3 6.1030507
7    3    1 8.9182841
8    3    2 4.2823937
9    3    3 8.8249263
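
A variant of the same merge that keeps the original column names, as a sketch. The name full_grid is illustrative, and merging by name instead of by position is a design choice of this sketch, not part of the answer above.

```r
ID <- c(1, 1, 1, 2, 2, 3, 3, 3)
PERIOD <- c(1, 2, 3, 2, 3, 1, 2, 3)
X <- runif(8, 0, 10)
FF <- data.frame(ID = ID, PERIOD = PERIOD, X = X)

# Full ID x PERIOD grid, using the same column names as FF.
full_grid <- expand.grid(ID = 1:3, PERIOD = 1:3)

# Left join: grid rows with no match in FF get X = NA.
ffnew <- merge(full_grid, FF, by = c("ID", "PERIOD"), all.x = TRUE)
ffnew <- ffnew[order(ffnew$ID, ffnew$PERIOD), ]
ffnew
```

Merging by name avoids relying on the column order of the two data frames.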


Best wishes

Troels Ring,
Aalborg, Denmark



David Winsemius, MD
West Hartford, CT


Re: [R] simple data.frame question

2012-05-16 Thread Troels Ring


Thanks a lot - beautiful
Troels


[R] Finding words that are within +/- X words of KRAS using tm package or other means

2012-05-16 Thread Paul Miller

Hello All,

This will probably be easy for some but isn't for me. Currently am working on a 
text mining exercise. Want to be able to predict whether cancer patients got 
KRAS testing, and, if so, whether the test yielded a result of wild 
type/negative or mutant/positive. I've begun with a bag-of-words approach 
that looks at the count of specific terms in the medical records and then uses 
some of those as predictors. 

This works great for predicting whether or not patients got tested. It's not so 
good though when it comes to predicting the outcome of testing. Trouble is that 
patients can have a reference to KRAS testing and also have a lot of references 
to, say, positive where that term has nothing to do with the result of their 
KRAS testing. 

So I'd like to be able to identify the number of instances in a patient's 
medical record where relevant terms like "wild type," "negative," "mutant," or 
"positive" come either shortly before or shortly after "KRAS." It would be 
great if there is a way to do this within the tm package. I've found that very 
helpful for preparing my data thus far.

If not though, I have a data frame that contains patient number in one column 
and the patient's complete text medical record in another. So some sort of 
regular expression likely would work just fine. 

Here are some examples of the sort of thing I'm looking to count:

Received KRAS testing results on xx/xx/. Test results indicate the 
presence of a mutation.

Tumor is KRAS negative

KRAS (mutated) 

Tumor is positive for KRAS mutation 

And here's an example of something I want to ignore.

Will conduct KRAS testing prior to initiation of therapy. ... (Several lines 
of material) ... Bilirubin positive.

A couple of things stand out here. The first is that I need to be able to pick 
up on variations of the relevant terms. So, for example, that means being able 
to identify that either mutant or mutated came in close proximity to 
KRAS. 

The other thing is that while increasing the number of words to look forward 
and backward will identify more valid cases, it will also tend to identify more 
invalid ones as well. For example, looking as many as 12 words after KRAS will 
lead to correct identification of:

Received KRAS testing results on xx/xx/. Test results indicate the 
presence of a mutation.

but also incorrect identification of:

Will conduct KRAS testing prior to initiation of therapy. Note that patient 
was positive for Lynch mutation.

Thinking I will need to keep the window short in order to obtain the best 
results. Would be nice if I could easily increase or decrease the number of 
words to look forward and backward though. Would also be good if I could, say, 
select a relatively small number of words to look in one direction and a larger 
number in the other.

Having gotten to the end of this description it occurs to me this is actually 
harder than I thought.

If one of you gurus could help me out, that would be greatly appreciated.

Thanks,

Paul
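
One way the word-window idea could be sketched in base R, without tm. Everything here is an assumption for illustration: the function name count_near_kras, the stem list in terms, and the default window sizes. The function counts term matches within `before` words before and `after` words after each occurrence of the target word; overlapping windows are counted separately.

```r
# Count words starting with any of `terms` that fall within a window of
# `before` words before and `after` words after each occurrence of `target`.
count_near_kras <- function(text, terms, before = 3, after = 5, target = "kras") {
  words <- tolower(unlist(strsplit(text, "[^[:alnum:]]+")))
  words <- words[nzchar(words)]
  hits <- which(words == target)
  if (length(hits) == 0) return(0L)
  term_pat <- paste0("^(", paste(terms, collapse = "|"), ")")
  total <- 0L
  for (h in hits) {
    window <- seq(max(1, h - before), min(length(words), h + after))
    idx <- setdiff(window, h)
    total <- total + sum(grepl(term_pat, words[idx]))
  }
  total
}

# Stems rather than full words, so "mutated" and "mutation" are both caught.
terms <- c("mutat", "mutant", "negative", "positive", "wild")

n_hit <- count_near_kras("Tumor is KRAS negative", terms)
n_miss <- count_near_kras(
  "Will conduct KRAS testing prior to initiation of therapy. Bilirubin positive.",
  terms, after = 3
)
c(n_hit, n_miss)
```

With a data frame of records, the function could be applied to the text column with sapply(), and the before and after arguments tuned separately.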


[R] transfer R objects back to console/command line

2012-05-16 Thread Jannis


Dear R community,


is there any way to invoke R in batch mode, do some calculations and get 
the values of some R variables back into the (bash) shell? I only 
managed to get some output saved into a text file with:


 R --slave --args 2 2 < test.R > test2.R

test.R contains:

a <- as.numeric(commandArgs()[4])
b <- as.numeric(commandArgs()[5])

c=a*b


Is there any way to access the contents of c in the command line after 
running R?



Cheers
Jannis


Re: [R] transfer R objects back to console/command line

2012-05-16 Thread R. Michael Weylandt

Take a look at this SO question:
http://stackoverflow.com/questions/10575005/output-a-boolean-from-an-rscript-into-a-bash-variable

None of the solutions are Boolean specific so you should be good
with them (the key is printing and capturing)

Michael
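
A minimal sketch of the print-and-capture idea, assuming the script is saved as test.R and run with Rscript. The helper name multiply_args is made up for illustration; the bash lines appear only as comments.

```r
# test.R (file name assumed): multiply the numeric command-line arguments
# and print the result so that the calling shell can capture it.
multiply_args <- function(args) {
  prod(as.numeric(args))
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  cat(multiply_args(args), "\n")
}

# From bash, the printed value can be captured with command substitution:
#   result=$(Rscript test.R 2 2)
#   echo "$result"
```

Using trailingOnly = TRUE avoids depending on the position of the arguments within the full command line.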


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread Petr Savicky

On Wed, May 16, 2012 at 06:52:48AM -0700, aramos wrote:
 Hi!
 
 Any one knows how to obtain critical values for the k-s statistic, using R?

Hi.

I do not know, whether there is a function for this. However, the following
randomized approach allows to extract a table of statistic/p.value pairs
from ks.test() for fixed sample sizes.

  n1 <- 30
  n2 <- 50
  d <- 1
  res <- matrix(nrow=d, ncol=2)
  for (i in seq.int(length=d)) {
      x1 <- runif(n1) + runif(1)
      x2 <- runif(n2) + runif(1)
      out <- ks.test(x1, x2)
      res[i, 1] <- out$statistic
      res[i, 2] <- out$p.value
  }
  tab <- unique(res[order(res[, 1]), ])
  colnames(tab) <- c("statistic", "p.val")

If you are mainly interested in the range of the p-values for relatively
close distributions, then replace

  x1 <- runif(n1) + runif(1)
  x2 <- runif(n2) + runif(1)

by

  x1 <- runif(n1)
  x2 <- runif(n2)

Part of the obtained table is

        statistic        p.val
  
  [39,] 0.3000 5.642910e-02
  [40,] 0.3067 4.815638e-02
  [41,] 0.3133 4.091424e-02
  [42,] 0.3200 3.466530e-02
  [43,] 0.3267 2.925540e-02
  [44,] 0. 2.458672e-02
  [45,] 0.3400 2.060188e-02
  [46,] 0.3467 1.719140e-02
  [47,] 0.3533 1.428992e-02
  [48,] 0.3600 1.183727e-02
  [49,] 0.3667 9.767969e-03

Hope this helps.

Petr Savicky.
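
If an approximate critical value is enough, the usual large-sample formula for the two-sample statistic can be coded directly. This is a sketch of that asymptotic approximation, not something extracted from ks.test(); the function name ks_crit_two_sample is made up for illustration.

```r
# Approximate critical value of the two-sample K-S statistic D at level alpha,
# using the asymptotic formula c(alpha) * sqrt((n1 + n2) / (n1 * n2)),
# where c(alpha) = sqrt(-log(alpha / 2) / 2).
ks_crit_two_sample <- function(n1, n2, alpha = 0.05) {
  sqrt(-log(alpha / 2) / 2) * sqrt((n1 + n2) / (n1 * n2))
}

crit <- ks_crit_two_sample(30, 50)
crit
```

For n1 = 30 and n2 = 50 this gives roughly 0.31, close to where the p-values in the table above drop below 0.05.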


Re: [R] Scraping a web page.

2012-05-16 Thread Keith Weintraub

Thanks Gabor,
  Nifty regexp. I never used strapplyc before and I am sure this will become a 
nice addition to my toolkit.

KW

Message: 5
Date: Tue, 15 May 2012 07:55:33 -0400
From: Gabor Grothendieck ggrothendi...@gmail.com
To: Keith Weintraub kw1...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Scraping a web page.
Message-ID:
CAP01uR=zdxHocxpsZdpT+4Kx2=L2vr9jnr=i=_Qhs39O=qo...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
On Tue, May 15, 2012 at 7:06 AM, Keith Weintraub kw1...@gmail.com wrote:

 Thanks,
 That was very helpful.

 I am using readLines and grep. If grep isn't powerful enough I might end up 
 using the XML package but I hope that won't be necessary.

This only uses readLines and strapplyc (from gsubfn).  It scrapes the
relevant strings from your post on Nabble, and by modifying URL and pat
you can likely get it to work with whatever the format of your
original files is:
library(gsubfn)
URL <- "http://r.789695.n4.nabble.com/Scraping-a-web-page-tp4630005.html"
L <- readLines(URL)
pat <- '<br/>&quot;/en/Ships.*-(\\d{7}).html&quot;'
strapplyc(L, pat, simplify = c)
The result from the last line is:
[1] "8605507" "8122830"
-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

--


[R] Optimization problem

2012-05-16 Thread Pacin Al

Hi,

I'm dealing with an optimization problem. I'm using 'optim' to maximize the
output of a function, given some restrictions on the input. I would like to
know if there is a way to impose some restrictions on 'intermediate
variables' of the function. An example..

fx = function (x)
{
s - 0
for (i in 1:3)
{
s - x[i]^3 + s
}
s
}

optim(rep(4,3), fx, method="L-BFGS-B", lower=rep(-10,nlin), upper=rep(10,nlin))

It would return '-10' for all variables. I want, however, a solution
satisfying mean(x) > 7.
Please don't analyse this specific example, but rather the logic of satisfying a
criterion on the mean of the input (with thousands of variables). My real
problem involves price elasticity, and I want to find the price increase for
each individual that would give me the maximum total profit margin while
respecting a minimum retention of clients.

Thank you very much,
John Mayer 
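
One option in base R for a linear restriction such as a bound on mean(x) is constrOptim(), which handles constraints of the form ui %*% x - ci >= 0. The sketch below is not from the thread. It reads the restriction as mean(x) >= 7 (the comparison operator is lost in the archived text, so the direction is an assumption), minimizes as optim() does by default, and supplies an analytic gradient gr that the post does not contain.

```r
# Objective from the post: sum of cubes of the three elements.
fx <- function(x) sum(x^3)
gr <- function(x) 3 * x^2          # analytic gradient of fx

n <- 3
# constrOptim() enforces ui %*% x - ci >= 0:
#   row 1     : mean(x) >= 7
#   rows 2..4 : x >= -10
#   rows 5..7 : x <= 10  (written as -x >= -10)
ui <- rbind(rep(1 / n, n), diag(n), -diag(n))
ci <- c(7, rep(-10, n), rep(-10, n))

# The starting point must lie strictly inside the feasible region.
res <- constrOptim(theta = rep(8, n), f = fx, grad = gr, ui = ui, ci = ci)
res$par
mean(res$par)
```

To maximize instead, the objective and its gradient can be negated. Whether this scales to thousands of variables would need checking; a dedicated solver may be a better fit there.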


Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Dominik Bruhn

Thanks Max for your answer.

First, I do not understand your post. Why is it a problem if two of
predictions match? From the formula for calculating R^2 I can see that
there will be a DivByZero iff the total sum of squares is 0. This is
only true if the predictions of all the predicted points from the
test-set are equal to the mean of the test-set. Why should this happen?

Anyway, I wrote the following code to check what you tried to tell:

--
library(caret)
data(trees)
formula=Volume~Girth+Height

customSummary - function (data, lev = NULL, model = NULL) {
print(summary(data$pred))
return(defaultSummary(data, lev, model))
}

tc=trainControl(method='cv', summaryFunction=customSummary)
train(formula, data=trees,  method='rpart', trControl=tc)
--

This outputs:
---
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.45   18.45   18.45   30.12   35.95   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.69   22.69   22.69   32.94   38.06   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37
[cut many values like this]
Warning: In nominalTrainWorkflow(dat = trainData, info = trainInfo,
method = method,  :
  There were missing values in resampled performance measures.
-

As I didn't understand your post, I don't know if this confirms your
assumption.
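
A possible reading of the output above: a fold in which every prediction is identical (such as the 30.37 fold) is enough to produce a missing R^2 if R^2 is computed as the squared correlation between observed and predicted values. That caret's default summary works this way is an assumption here and should be checked against ?postResample. The numbers below are made up for illustration.

```r
obs  <- c(10.3, 18.8, 24.9, 31.7, 42.6)   # hypothetical hold-out observations
pred <- rep(30.37, length(obs))           # every hold-out case gets the same prediction

# Correlation-based R^2: the predictions have zero variance, so cor() is NA.
rsq_cor <- suppressWarnings(cor(pred, obs))^2

# "Traditional" R^2 = 1 - SSres / SStot is still defined here, because
# SStot depends only on the observed values.
rsq_trad <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

c(correlation_based = rsq_cor, traditional = rsq_trad)
```

Under that reading, the warning depends on the variance of the predictions and not on the total sum of squares of the observations.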

Thanks anyway,
Dominik


On 16/05/12 17:30, Max Kuhn wrote:
 More information is needed to be sure, but it is most likely that some
 of the resampled rpart models produce the same prediction for the
 hold-out samples (likely the result of no viable split being found).
 
 Almost every incarnation of R^2 requires the variance of the
 prediction. This particular failure mode would result in a divide by
 zero.
 
 Try using you own summary function (see ?trainControl) and put a
 print(summary(data$pred)) in there to verify my claim.
 
 Max
 
 On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote:
 Hi,
 I got the following problem when trying to build an rpart model and using
 everything but LOOCV. Originally, I wanted to use k-fold partitioning,
 but every partitioning except LOOCV throws the following warning:

 
 Warning message: In nominalTrainWorkflow(dat = trainData, info =
 trainInfo, method = method, : There were missing values in resampled
 performance measures.
 -

 Below are some simplified testcases which reproduce the warning on my
 system.

 Question: What does this error mean? How can I avoid it?

 System-Information:
 -
 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_GB.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2
 reshape_0.8.4
 [6] plyr_1.7.1 lattice_0.20-6

 loaded via a namespace (and not attached):
 [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
 [5] tools_2.15.0
 ---


 Simplified Testcase I: Throws warning
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 train(formula, data=trees,  method='rpart')
 ---

 Simplified Testcase II: Every other CV-method also throws the warning,
 for example using 'cv':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='cv')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---

 Simplified Testcase III: The only CV-method which is working is 'LOOCV':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='LOOCV')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---


 Thanks!
 --
 Dominik Bruhn
 mailto: domi...@dbruhn.de








 --

 Max
 
 
 


-- 
Dominik Bruhn
mailto: domi...@dbruhn.de




Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Dominik Bruhn

Sorry for the follow-up, but I dug deeper into the problem.

My text on the R^2 was wrong: in my opinion, and at least according to Wikipedia,
R^2 yields a division by zero iff SStot (the total sum of squares) is
zero. SStot is the sum of the squared differences between the observed
(not the predicted) values and the mean of the observed values. As this
value does not depend on the predicted/modelled values, the
occurrence of a DivByZero cannot depend on the model but only on the
data itself. In short, to get SStot=0 (and therefore a DivByZero), you
would need a training-dataset where every value equals the mean of the
training-set, i.e. a constant dataset. My input and also my
training set are far from being constant, so where is the error?

Thanks again!
Dominik


On 16/05/12 17:30, Max Kuhn wrote:
 More information is needed to be sure, but it is most likely that some
 of the resampled rpart models produce the same prediction for the
 hold-out samples (likely the result of no viable split being found).
 
 Almost every incarnation of R^2 requires the variance of the
 prediction. This particular failure mode would result in a divide by
 zero.
 
 Try using you own summary function (see ?trainControl) and put a
 print(summary(data$pred)) in there to verify my claim.
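
 To illustrate the failure mode described above, here is a minimal sketch.
 It assumes (this is not confirmed anywhere in the thread) that the resampled
 R^2 is computed as the squared correlation between observed and predicted
 values; the data values are made up.

```r
# Hypothetical hold-out data: the observed values vary, the predictions do not
obs  <- c(10.3, 16.4, 21.0, 33.8, 58.3)
pred <- rep(mean(obs), length(obs))   # e.g. a tree with no split predicts one value

# R^2 as squared correlation: sd(pred) is zero, so the result is NA
r2_cor <- suppressWarnings(cor(obs, pred)^2)

# R^2 as 1 - SSres/SStot: only SStot (from the observed data) is in the denominator
ss_tot <- sum((obs - mean(obs))^2)
ss_res <- sum((obs - pred)^2)
r2_ss  <- 1 - ss_res / ss_tot

print(c(correlation_based = r2_cor, sum_of_squares_based = r2_ss))
```

 Under that assumption, a constant prediction vector is enough to produce a
 missing value, even though the observed data are far from constant.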
 
 Max
 
 On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote:
 Hy,
 I got the following problem when trying to build a rpart model and using
 everything but LOOCV. Originally, I wanted to used k-fold partitioning,
 but every partitioning except LOOCV throws the following warning:

 
 Warning message: In nominalTrainWorkflow(dat = trainData, info =
 trainInfo, method = method, : There were missing values in resampled
 performance measures.
 -

 Below are some simplified testcases which repoduce the warning on my
 system.

 Question: What does this error mean? How can I avoid it?

 System-Information:
 -
 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_GB.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2
 reshape_0.8.4
 [6] plyr_1.7.1 lattice_0.20-6

 loaded via a namespace (and not attached):
 [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
 [5] tools_2.15.0
 ---


 Simlified Testcase I: Throws warning
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 train(formula, data=trees,  method='rpart')
 ---

 Simlified Testcase II: Every other CV-method also throws the warning,
 for example using 'cv':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='cv')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---

 Simlified Testcase III: The only CV-method which is working is 'LOOCV':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='LOOCV')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---


 Thanks!
 --
 Dominik Bruhn
 mailto: domi...@dbruhn.de




 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --

 Max
 
 
 


-- 
Dominik Bruhn
mailto: domi...@dbruhn.de




[R] Job posting - Statistical Consultant - Univ. of Texas at Austin

2012-05-16 Thread Mahometa, Michael J

All,

Just to get the word out: We are looking for a new Statistical Consultant at
the Division of Statistics and Scientific Computation here at the University
of Texas at Austin. Please pass along to any colleagues who might be
interested...

http://ssc.utexas.edu/people/employment

Thanks, Michael


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread aramos

I think that command will give me the observed value of the statistic, not
quantiles of the K-S distribution!

--
View this message in context: 
http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245p4630275.html
Sent from the R help mailing list archive at Nabble.com.


[R] fitting t copula with fixed dof

2012-05-16 Thread ergen



I need to fit a t copula with a fixed number of degrees of freedom, say 4. I
do not want to estimate the dof jointly with the correlation matrix.
Instead, I want to fix the dof at 4 and estimate only the correlation
matrix in the optimization routine. Is anyone aware of such an estimation
method in R?


The packages and functions that I know of can't do this estimation. I  
searched online but couldn't find anything. I will appreciate any  
help/comments.


Best Regards
Ibrahim Ergen


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread aramos

Thanks, I've already done that!!
What is OSS?


--
View this message in context: 
http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245p4630276.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Help needed for efficient way to loop through rows and columns

2012-05-16 Thread Priya Bhatt

Yes, here it is.  I actually treat them all as strings, using
options(stringsAsFactors=F) at the top of my code.

This is what the initial dataframe looks like.  Please note this is a toy
dataset:

names  X   Y   Z   AorB
S1     BB  BB  BB  A
S2     AA  BB  BB  A
S3     AB  AB  AA  B
S4     AA  AA  NA  B


And the code to create this initial dataframe is:

names <- c("S1", "S2", "S3", "S4")
X <- c("BB", "AA", "AB", "AA")
Y <- c("BB", "BB", "AB", "AA")
Z <- c("BB", "BB", "AA", NA)
AorB <- c("A", "A", "B", "B")

sample <- data.frame(names, X, Y, Z, AorB)


The final data.frame should look like:

names  X  Y  Z   AorB
S1     0  0  0   A
S2     2  0  0   A
S3     1  1  0   B
S4     0  0  NA  B

You're right! I should be able to globally change all ABs and BAs to
1s. Thanks:)  I'm not exactly sure how to change AA and BB depending on
AorB for each row though.  Thoughts?

Thanks for your help thus far, David.

Best, Priya


On Wed, May 16, 2012 at 6:53 AM, David L Carlson dcarl...@tamu.edu wrote:

 Can you show us what you want the final data.frame to look like? You've
 created five variables stored as factors and you seem to be trying to
 change
 those to numeric values? Is that correct?

 Since AB and BA are always set to 1, you could just replace those values
 globally rather than mess with the ifelse commands for those values. Only
 AA
 and BB are affected by the value of AorB.

 Your apply() function processes the data.frame by row so i is a vector
 consisting of all the values in the row. You seem to be coding as if i was
 a
 single integer (as in a for loop).

 --
 David L Carlson
 Associate Professor of Anthropology
 Texas AM University
 College Station, TX 77843-4352


  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Priya Bhatt
  Sent: Wednesday, May 16, 2012 3:08 AM
  To: r-help@r-project.org
  Subject: [R] Help needed for efficient way to loop through rows and
  columns
 
  Dear R-helpers:
 
  I am trying to write a script that iterates through a dataframe that
  looks
  like this:
 
 
  Example dataset called sample:
 
  names <- c("S1", "S2", "S3", "S4")
  X <- c("BB", "AB", "AB", "AA")
  Y <- c("BB", "BB", "AB", "AA")
  Z <- c("BB", "BB", "AB", NA)
  AorB <- c("A", "A", "A", "B")
 
  sample <- data.frame(names, X, Y, Z, AorB)
 
 
  for a given row,
 
  if AorB == A, then AA == 2, AB = 1, BA = 1, BB = 0
 
  if AorB == B, then AA == 0, AB = 1, BA = 1, BB = 2
 
  I've been trying  to write this using apply and ifelse statements in
  hopes
  that my code runs quickly, but I'm afraid I've make a big mess.  See
  below:
 
  apply(sample, 1, function(i) {
 
 
ifelse(sample$AorB[i] == A,
   (ifelse(sample[i,] == AA, sample[i,] - 2 ,
   ifelse(sample[i,] == AB || sample[i,] == BA ,
  sample[i,] - 1,
  ifelse(sample[i,] == BB, sample[i,] - 0,
  sample[i,] - NA )) )
)   , ifelse(sample$AorB[i,] == B),
   (ifelse(sample[i,] == AA, sample[i,] - 0 ,
   ifelse(sample[i,] == AB || sample[i,] == BA ,
  sample[i,] - 1,
  ifelse(sample[i,] == BB, sample[i,] - 2,
  sample[i,] - NA) })
 
 
  Any Advice?
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-
  guide.html
  and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread R. Michael Weylandt

Open source software (what you're driving)

Michael

On Wed, May 16, 2012 at 12:27 PM, aramos ara...@fep.up.pt wrote:
 Thanks, I've already done that!!
 What is OSS?


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245p4630276.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Re: [R] simple data.frame question

2012-05-16 Thread arun

Hi Troels,
Not sure this is what you want.


> X <- runif(9, 0, 10)
> FF1 <- data.frame(ID = c(1,2,3)[rep(c(1,1,1,2,2,2,3,3,3))],
+                   PERIOD = c(1,2,3)[rep(c(1,2,3), times = 3)], X = X)
> FF1$X[4] <- NA
> FF1
  ID PERIOD  X
1  1  1 8.27119347
2  1  2 9.64698097
3  1  3 2.74132386
4  2  1 NA
5  2  2 4.29322683
6  2  3 5.09269667
7  3  1 4.07936332
8  3  2 7.41808455
9  3  3 0.01558664
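
A more general sketch, assuming the goal is to keep the existing X values and
fill in any missing ID/PERIOD combination with NA. The names full and FF2 and
the set.seed() call are illustrative assumptions, not part of the original
question.

```r
ID     <- c(1, 1, 1, 2, 2, 3, 3, 3)
PERIOD <- c(1, 2, 3, 2, 3, 1, 2, 3)
set.seed(1)                       # only so the illustration is repeatable
X      <- runif(8, 0, 10)
FF     <- data.frame(ID = ID, PERIOD = PERIOD, X = X)

# All ID/PERIOD combinations that should exist
full <- expand.grid(ID = sort(unique(FF$ID)),
                    PERIOD = sort(unique(FF$PERIOD)))

# Left join: combinations absent from FF get X = NA
FF2 <- merge(full, FF, by = c("ID", "PERIOD"), all.x = TRUE)
FF2 <- FF2[order(FF2$ID, FF2$PERIOD), ]
rownames(FF2) <- NULL
print(FF2)
```

The fourth row of FF2 is then ID 2, PERIOD 1 with X set to NA, and the eight
original X values are left unchanged.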


A.K.





- Original Message -
From: Troels Ring tr...@gvdnet.dk
To: r-help@r-project.org
Cc: 
Sent: Wednesday, May 16, 2012 11:56 AM
Subject: [R] simple data.frame question

Dear friends - I hope you will forgive me another simple question, illustrated 
by

ID <- c(1,1,1,2,2,3,3,3)
PERIOD <- c(1,2,3,2,3,1,2,3)
X <- runif(8,0,10)

FF <- data.frame(ID=ID,PERIOD=PERIOD,X=X)

I need the fourth value of X to be NA, with ID and PERIOD updated to
1,1,1,2,2,2,3,3,3 and 1,2,3,1,2,3,1,2,3 respectively.
How do I use the pattern in ID and PERIOD to find the missing X and put NA there?

Best wishes

Troels Ring,
Aalborg, Denmark


[R] TukeyHSD plot error

2012-05-16 Thread Bret Jagger

Hi, I am seeking help with an error when running the example from R
Documentation for TukeyHSD.  The error occurs with any example I run, from
any text book or website.  thank you...

> plot(TukeyHSD(fm1, "tension"))
Error in plot(confint(as.glht(x)), ylim = c(0.5, n.contrasts + 0.5), ...) :
  error in evaluating the argument 'x' in selecting a method for function
'plot': Error in UseMethod("vcov") :
  no applicable method for 'vcov' applied to an object of class "NULL"


> ?TukeyHSD
> require(graphics)

> summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
            Df Sum Sq Mean Sq F value  Pr(>F)
wool         1    451   450.7   3.339 0.07361 .
tension      2   2034  1017.1   7.537 0.00138 **
Residuals   50   6748   135.0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> TukeyHSD(fm1, "tension", ordered = TRUE)
  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

$tension
     diff        lwr      upr     p adj
M-H  4.72 -4.6311985 14.07564 0.4474210
L-H 14.72  5.3688015 24.07564 0.0011218
L-M 10.00  0.6465793 19.35342 0.0336262

> plot(TukeyHSD(fm1, "tension"))
Error in plot(confint(as.glht(x)), ylim = c(0.5, n.contrasts + 0.5), ...) :
  error in evaluating the argument 'x' in selecting a method for function
'plot': Error in UseMethod("vcov") :
  no applicable method for 'vcov' applied to an object of class "NULL"

 sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

attached base packages:
 [1] grid  tcltk splines   stats graphics  grDevices datasets
 utils methods   base

other attached packages:
 [1] RcmdrPlugin.HH_1.1-30 HH_2.2-30 latticeExtra_0.6-19
RColorBrewer_1.0-5
 [5] leaps_2.9 multcomp_1.2-12   mvtnorm_0.9-9992
 NADA_1.5-4
 [9] ggplot2_0.9.0 Rcmdr_1.8-3   car_2.0-12
 nnet_7.3-1
[13] DAAG_1.12 survival_2.36-12  randomForest_4.6-6
 rpart_3.1-52
[17] RODBC_1.3-5   tree_1.0-29   spatstat_1.25-5
mgcv_1.7-13
[21] sciplot_1.0-9 spdep_0.5-45  coda_0.14-6
deldir_0.0-16
[25] maptools_0.8-14   foreign_0.8-49nlme_3.1-103
 MASS_7.3-17
[29] boot_1.3-4sp_0.9-98 odesolve_0.9-9
 mcmc_0.8
[33] lme4_0.999375-42  Matrix_1.0-6  lattice_0.20-6
 chron_2.3-42
[37] akima_0.5-7   rcom_2.2-3.1.1rscproxy_1.3-1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1 dichromat_1.2-4  digest_0.5.2 memoise_0.1
 munsell_0.3  plyr_1.7.1
 [7] proto_0.3-9.2reshape2_1.2.1   scales_0.2.0 stats4_2.15.0
 stringr_0.6  tools_2.15.0

[[alternative HTML version deleted]]


[R] error code trying to extract second column from coeftest output

2012-05-16 Thread rl269

I want to use the standard error values in the summary that is produced using
coeftest, but I am getting an error code- any ideas?

> library(lmtest)
> coeftest(lmodT_WBHO)

t test of coefficients:

    Estimate Std. Error t value  Pr(>|t|)
t1W  5.94819    0.17072 34.8410 < 2.2e-16 ***
t2W  6.56216    0.17438 37.6322 < 2.2e-16 ***
t3W  6.08252    0.16525 36.8082 < 2.2e-16 ***
t4W  6.18041    0.17028 36.2949 < 2.2e-16 ***
t1B  5.5        0.50566 10.8768 < 2.2e-16 ***
t2B  5.65000    0.53034 10.6535 < 2.2e-16 ***
t3B  4.52381    0.51756  8.7406 < 2.2e-16 ***
t4B  4.38095    0.51756  8.4646 < 2.2e-16 ***
t1H  5.05000    0.53034  9.5221 < 2.2e-16 ***
t2H  4.8        0.55903  8.5465 < 2.2e-16 ***
t3H  5.52632    0.54412 10.1564 < 2.2e-16 ***
t4H  4.71429    0.63388  7.4372 2.236e-13 ***
t1O  5.17647    0.57524  8.9988 < 2.2e-16 ***
t2O  5.81818    0.50566 11.5060 < 2.2e-16 ***
t3O  6.5        0.63388 10.2543 < 2.2e-16 ***
t4O  5.71429    0.63388  9.0147 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> se1 <- coeftest(lmodT_WBHO)$coef[,2]
Error in coeftest(lmodT_WBHO)$coef : 
  $ operator is invalid for atomic vectors
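
The error message suggests that the object being indexed is a plain numeric
matrix rather than a list, in which case `$` cannot be used and the column has
to be selected with `[`. A minimal sketch of that idea using only base R: the
lm() fit on the built-in trees data stands in for the model above, and the
assumption (not verified here) is that the coeftest() result can be indexed
the same way.

```r
# Stand-in model on a built-in data set (not the poster's lmodT_WBHO)
fit <- lm(Volume ~ Girth + Height, data = trees)

# Coefficient table as a numeric matrix: one row per coefficient
ct <- coef(summary(fit))

# '$' fails on a matrix, which reproduces the kind of error shown above
bad <- try(ct$coef, silent = TRUE)

# Select the standard errors by column name (or by position, ct[, 2])
se <- ct[, "Std. Error"]
print(se)
```

If the assumption holds, coeftest(lmodT_WBHO)[, 2] would be the corresponding
call for the model in this post.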
 


--
View this message in context: 
http://r.789695.n4.nabble.com/error-code-trying-to-extract-second-column-from-coeftest-output-tp4630298.html
Sent from the R help mailing list archive at Nabble.com.


[R] Getting reliable financial ratios

2012-05-16 Thread Keith Weintraub

Check out this site:
http://www.gummy-stuff.org/Yahoo-data.htm

It shows how to download a .csv file with the data you might want.

Here is an example URL:
 http://finance.yahoo.com/d/quotes.csv?s=XOM+BBDb.TO+JNJ+MSFT&f=snd1l1yrr2


The r2 in the above URL means P/E ratio.

You should be able to automate this in R pretty easily. An endeavor I leave to 
the reader.
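
For what it is worth, a rough sketch of the automation. The helper name
build_quote_url is made up for illustration, and the sketch assumes that the
symbol list and the field string are joined with "&f=" and that the service
still answers at this address, neither of which is checked here. The download
itself is left commented out because it needs network access.

```r
# Hypothetical helper: assemble the CSV download URL from symbols and field codes
build_quote_url <- function(symbols, fields = "snd1l1yrr2") {
  paste0("http://finance.yahoo.com/d/quotes.csv?s=",
         paste(symbols, collapse = "+"),
         "&f=", fields)
}

u <- build_quote_url(c("XOM", "BBDb.TO", "JNJ", "MSFT"))
print(u)

# Not run: reading the result requires network access
# quotes <- read.csv(u, header = FALSE, stringsAsFactors = FALSE)
```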

Good luck,
KW


[[alternative HTML version deleted]]


Re: [R] kolmogorov-Smirnov critical values

2012-05-16 Thread David Winsemius



On May 16, 2012, at 12:27 PM, aramos wrote:


Thanks, I've already done that!!


But the illustration for how you get the statistics is in the code.

Describe what you want: number of samples, two versus single sided,  
two sample versus comparing to theory, which table columns should be  
used. Then someone can probably help.






--
View this message in context: 
http://r.789695.n4.nabble.com/kolmogorov-Smirnov-critical-values-tp4630245p4630276.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT


[R] Scraping a web page.

2012-05-16 Thread Keith Weintraub

Duncan,
   Thanks for the advice.

It turns out that the web pages are pretty well behaved.

I ended up using
readHTMLTable
str_select
grep
gsub
readLines

When I have time I am going to convert my code to use the html parser and the 
more robust getNodeSet method that you mention below.

Thanks for your detailed reply,
KW


Message: 139
Date: Tue, 15 May 2012 21:02:05 -0700
From: Duncan Temple Lang dun...@wald.ucdavis.edu
To: r-help@r-project.org
Subject: Re: [R] Scraping a web page.
Message-ID: 4fb326bd.9080...@wald.ucdavis.edu
Content-Type: text/plain; charset=ISO-8859-1


Hi Keith

Of course, it doesn't necessarily matter how you get the job done
if it actually works correctly.  But for a general approach,
it is useful to use general tools and can lead to more correct,
more robust, and more maintainable code.

Since htmlParse() in the XML package can both retrieve and parse the HTML
document,
 doc = htmlParse(the.url)

is much more succinct than using curlPerform().
However, if you want to use RCurl, just use

   txt = getURLContent(the.url)

and that replaces

 h = basicTextGatherer()
 curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
 h$value()


If you have parsed the HTML document, you can find the <a> nodes that have an
href attribute that starts with /en/Ships via

 hrefs = unlist(getNodeSet(doc, "//a[starts-with(@href, '/en/Ships')]/@href"))


The result is a character vector and you can extract the relevant substrings 
with
substring() or gsub() or any wrapper of those functions.
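
As a small sketch of that last step, assuming hrefs already holds strings of
the form shown in the original question (the two values below are copied from
it), the 7-digit number before .html can be pulled out with a single regular
expression:

```r
# Example hrefs taken from the original question
hrefs <- c("/en/Ships/A-8605507.html", "/en/Ships/Aalborg-8122830.html")

# Keep only the 7 digits that sit between the last "-" and ".html"
ids <- sub("^.*-([0-9]{7})\\.html$", "\\1", hrefs)
print(ids)
```

Strings that do not match the pattern are returned unchanged by sub(), so it
may be worth filtering with grepl() first.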

There are many benefits of parsing the HTML, including not falling foul of
"as far as I can tell the a tag is always on its own line" turning out not
to be true.

   D.



On 5/15/12 4:06 AM, Keith Weintraub wrote:
 Thanks,
  That was very helpful.
 
 I am using readLines and grep. If grep isn't powerful enough I might end up 
 using the XML package but I hope that won't be necessary.
 
 Thanks again,
 KW
 
 --
 
 On May 14, 2012, at 7:18 PM, J Toll wrote:
 
 On Mon, May 14, 2012 at 4:17 PM, Keith Weintraub kw1...@gmail.com wrote:
 Folks,
 I want to scrape a series of web-page sources for strings like the 
 following:
 
 /en/Ships/A-8605507.html
 /en/Ships/Aalborg-8122830.html
 
 which appear in an href inside an a tag inside a div tag inside a table.
 
 In fact all I want is the (exactly) 7-digit number before .html.
 
 The good news is that as far as I can tell the the a tag is always on 
 it's own line so some kind of line-by-line grep should suffice once I 
 figure out the following:
 
 What is the best package/command to use to get the source of a web page. I 
 tried using something like:
 if(url.exists("http://www.omegahat.org/RCurl")) {
 h = basicTextGatherer()
 curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
  # Now read the text that was cumulated during the query response.
 h$value()
 }
 
 which works except that I get one long streamed html doc without the line 
 breaks.
 
 You could use:
 
 h <- readLines("http://www.omegahat.org/RCurl")
 
 -- or --
 
 download.file(url = "http://www.omegahat.org/RCurl", destfile = "tmp.html")
 h = scan("tmp.html", what = "", sep = "\n")
 
 and then use grep or the XML package for processing.
 
 HTH
 
 James
 
 
   [[alternative HTML version deleted]]




--


[[alternative HTML version deleted]]


Re: [R] finding mean and SD for a log-normal distribution

2012-05-16 Thread peter dalgaard


On May 16, 2012, at 12:37 , Andras Farkas wrote:

 Dear R Expert
  
 allow me to ask a quick question: I have a mean value of 6 and a SD of 3 
 describing my distribution. I would like to convert this distribution into 
 a log normal distribution that would best describe it when resimulated using 
 a log normal distribution. Currently I am using another software to estimate 
 the respective mean and SD on the log scale and the results are: 1.6667 and 
 SD 0.47071. Then, to best reproduce my original distribution in R, I use the 
 following commands:
  
 c <- rlnorm(5000,1.6667,0.47071)
 d <- exp(c)
 mean(c)
 sd(c)
  
 and the results for mean and SD are 5.92 and 2.94 (original 6 and 3), 
 respectively, which I am reasonably happy with. I would like to become 
 independent of the other software I use, but am unable to figure out how to 
 generate the values of 1.6667 and 0.47071 using R. Could someone please help 
 me with this question?

Perhaps this was what you were looking for:

> d <- log(c)
> mean(d)
[1] 1.675003
> sd(d)
[1] 0.4656469

Taking exp() of a log-normal rarely makes much sense. More commonly, you take 
log() to get a normal distribution.
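
If the aim is instead to get the log-scale parameters directly from the target
mean and SD, one option is moment matching. The sketch below is an addition,
not part of the reply above: it uses the standard log-normal identities
mean = exp(meanlog + sdlog^2/2) and (SD/mean)^2 = exp(sdlog^2) - 1, and the
variable names are arbitrary. It gives roughly 1.68 and 0.47, close to but not
identical with the values from the other software, which may use a different
method.

```r
m <- 6   # target mean on the original scale
s <- 3   # target SD on the original scale

# Moment matching for the log-normal distribution
sdlog   <- sqrt(log(1 + (s / m)^2))
meanlog <- log(m) - sdlog^2 / 2
print(c(meanlog = meanlog, sdlog = sdlog))

# Check: the implied mean and SD on the original scale
implied_mean <- exp(meanlog + sdlog^2 / 2)
implied_sd   <- sqrt((exp(sdlog^2) - 1) * exp(2 * meanlog + sdlog^2))
print(c(mean = implied_mean, sd = implied_sd))

# Simulated values would then come from rlnorm(n, meanlog, sdlog)
```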

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


[R] trouble with ifelse statement

2012-05-16 Thread Melissa Rosenkranz

Hello,

I apologize in advance for not providing sample data, I'm a very new to R
and can't easily generate appropriate sample data quickly. I'm hoping
someone can offer advice without it.

This code below works and does what I want it to do, which is for a given
row in my dataframe, where the variable "peak.cort" = "max", it makes the
value of another variable "max.cort" match the value of a third
variable "cortisol" for that row.

*
index <- raw.saliva.data$peak.cort == 'max'
raw.saliva.data$max.cort[index] <-
(raw.saliva.data$cortisol[index])
*

Now, I want to execute this function only if the value of a fourth
variable, "sample", is >1 and <5. I tried to add an ifelse statement to the
code above so that it looks like this:

*
index <- raw.saliva.data$peak.cort == 'max'
raw.saliva.data$max.cort[index] <- ifelse(sample>1 & sample<5,
raw.saliva.data$cortisol[index], NA)
*

and I get this error: Error in sample > 1 :
  comparison (6) is possible only for atomic and list types

I can't figure out how to fix this problem. Any advice is appreciated.

Thank you.
-- 
*Melissa*

[[alternative HTML version deleted]]


[R] survival survfit with newdata

2012-05-16 Thread Damjan Krstajic


Dear all,

I am confused with the behaviour of survfit with newdata option.

I am using the latest version, R 2.15.0. In the simple example below I am 
building a coxph model on 90 patients and trying to predict for 10 patients. 
Unfortunately, the survival curve at the end is for 90 patients. Could somebody 
familiar with the survival package please confirm whether this behaviour is as 
expected, because I cannot find a way of using 'newdata' with really new data? 
Thanks in advance. DK

> x <- matrix(rnorm(100*20), 100, 20)
> time <- runif(100, min=0, max=7)
> status <- sample(c(0,1), 100, replace = TRUE)
> trainX <- x[11:100,]
> trainTime <- time[11:100]
> trainStatus <- status[11:100]
> testX <- x[1:10,]
> coxph.model <- coxph(Surv(trainTime,trainStatus) ~ trainX)
> sfit <- survfit(coxph.model, newdata=data.frame(testX))
> dim(sfit$surv)
[1] 90 90
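
One possible explanation, offered as a sketch rather than a confirmed answer:
the model term is the matrix trainX, while data.frame(testX) has columns named
X1 to X20, so the new data probably cannot be matched to the model and the
training matrix is used again. The variant below assumes the survival package
is installed and fits the same kind of model from a data frame, so that the
training and new data share column names. The names dat, train, test and fit
are illustrative.

```r
library(survival)

set.seed(1)
x   <- matrix(rnorm(100 * 20), 100, 20)
dat <- data.frame(time   = runif(100, min = 0, max = 7),
                  status = sample(c(0, 1), 100, replace = TRUE),
                  x)                      # covariate columns are named X1 ... X20

train <- dat[11:100, ]
test  <- dat[1:10, ]

# Fit on named columns so that newdata can be matched by name
fit  <- coxph(Surv(time, status) ~ ., data = train)
sfit <- survfit(fit, newdata = test)

# One survival curve (column) per row of the new data
print(dim(sfit$surv))
```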


  
[[alternative HTML version deleted]]


Re: [R] clusters in zero-inflated negative binomial models

2012-05-16 Thread Ben Bolker

Lies Durnez ldurnez at itg.be writes:

 I want to build a model in R based on animal collection data, that look like
the following
 
 Nr  Village  District  Site  Survey  Species  Count
 1   AX       A         F     Dry     B        0
 2   AY       A         V     Wet     A        5
 3   BX       B         F     Wet     B        1
 4   BY       B         V     Dry     B        0

 
 Each data point shows one collection unit in a certain Village,
 District, Site, and Survey for a certain Species. 'Count' is the
 number of animals collected in that collection unit. It is possible
 that zero animals are collected in that unit because of very low
 densities, but also because of climatic conditions (wind, rain,
 etc), so we would expect an excess in zeroes. I have tested that the
 data are overdispersed (variance much bigger than mean), so a
 zero-inflated negative binomial model seems the most suitable model
 in this case.

 [snip snip snip]

 However, the animal collections were only done in 4 districts, and
 in each district 3 villages were chosen (a total of 12
 villages). This should be included in the design. The package survey
 allows this for the standard negative binomial model, but it seems
 to me that it is not possible for the zero-inflated NB. So, my
 question is two-fold: 1. Is a zero-inflated NB possible in the
 survey package. If yes, how?  2. If no, how can I build a
 zero-inflated NB model that takes into account the clustering of the
 observations (animal counts) in villages and the clustering of the
 villages in districts.

  Treating villages and districts as random effects (clusters)
basically puts you in the domain of generalized linear mixed models.
You can use the glmmADMB package to fit zero-inflated, mixed negative
binomial models.  You can also use the MCMCglmm package to fit
lognormal-Poisson models, which are another form of overdispersed
count data (it depends how strongly you require that the actual model
be NB as opposed to just a reasonable model for overdispersed count
data).

4 districts is not very many for estimating an among-district variance 
(which is basically what you are doing when you fit a clustered/
mixed model), so I might suggest using district as a fixed effect,
and then using district:village (i.e. the interaction between district
and village, or village alone if the villages are uniquely labeled) as the
random effect.

  http://glmm.wikidot.com/faq may be useful.

  I would suggest that you send follow-ups to the
r-sig-mixed-models at r-project.org mailing list.


Re: [R] Optimization problem

2012-05-16 Thread Greg Snow

There are a couple of options.

First if you want the mean to equal 7, then that means the sum must
equal 21 and therefore you can let optim only play with 2 of the
variables, then set the 3rd to be 21-s1-s2.

If you want the mean to be greater than 7 then just put in a test, if
the mean is less than 7 then return -Inf or another really small
number, if the mean is large enough then go on to compute the function
that you want to maximize.

Also note that you don't need the loop, it can be replaced with sum(s^3).
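
A minimal sketch of the first option for the toy function, under the
assumption that the constraint is exactly mean(x) == 7 with three variables.
The names target_sum and fx_constrained and the starting values c(8, 8) are
illustrative choices, not from the original post.

```r
fx <- function(x) sum(x^3)          # the loop replaced by a vectorized sum

# mean(x) == 7 for three variables means sum(x) == 21, so optim() only
# searches over the first two values and the third one is derived.
target_sum <- 21
fx_constrained <- function(p) fx(c(p, target_sum - sum(p)))

res <- optim(c(8, 8), fx_constrained, method = "L-BFGS-B",
             lower = rep(-10, 2), upper = rep(10, 2),
             control = list(fnscale = -1))   # fnscale = -1 makes optim maximize

x_opt <- c(res$par, target_sum - sum(res$par))
# The derived third value is not covered by the box constraints,
# so it should be checked against the bounds afterwards.
print(x_opt)
print(mean(x_opt))
```

The same reparameterization carries over to many variables: optimize over all
but one of them and compute the last from the required sum.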

On Wed, May 16, 2012 at 10:44 AM, Pacin Al jok...@gmail.com wrote:
 Hi,

 I'm dealing with an optimization problem. I'm using 'optim' to maximize the
 output of a function, given some restrictions on the input. I would like to
 know if there is a way to impose some restrictions on 'intermediate
 variables' of the function. An example..

 fx = function (x)
 {
 s <- 0
 for (i in 1:3)
 {
 s <- x[i]^3 + s
 }
 s
 }

 optim(rep(4,3), method="L-BFGS-B", lower=rep(-10,nlin), upper=rep(10,nlin))

 It would return '-10' for all variables. I want, however, a solution
 satisfying mean(x)>7.
 Please, don't analyse this specific example, but the logic of satisfying a
 criterium for the mean of the input (with thousands of variables). My real
 problem involves price elasticity and I want to find the price increase for
 each individual that would give me maximum total profit margin, but
 respecting a minimum retention of clients.

 Thank you very much,
 John Mayer

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Optimization-problem-tp4630278.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


[R] variable spatial correlation

2012-05-16 Thread m p

Hello,
I used correlogram from the spatial package to determine the correlation scale
for my data, but just looking by eye it seems that the correlation
scale varies over the domain.
Can someone suggest what would be the best way to handle that problem?
Thanks,
Mark

[[alternative HTML version deleted]]


Re: [R] trouble with ifelse statement

2012-05-16 Thread R. Michael Weylandt

It seems like your problem is that R can't find your variable sample
and is instead finding its own sample() function, which can't be
compared to an integer; that is what produces the error.

It seems likely that sample is part of raw.saliva.data? If that's the
case, change sample to raw.saliva.data$sample.

However, I'm thinking that this isn't going to do quite what you
really want because sample is long (corresponding to all the rows, not
just the subset by index) -- you could probably do something like this
instead:

raw.saliva.data <- within(raw.saliva.data, max.cort[index] <- ifelse(
(sample > 1 & sample < 5)[index], cortisol[index], NA))

Since you're telling R to look within() raw.saliva.data, the lookup should
work for you.

Note that we have to restrict both the (sample) and (cortisol) parts
of ifelse() to just the rows in index for this to work (else things get
lined up wrong).

Note finally that you also have to reassign back to raw.saliva.data
for this to have an effect.
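
The name clash and one possible fix can be seen on a small made-up data frame. This is only a sketch: saliva and its values are hypothetical stand-ins for raw.saliva.data, and the logical index keep is an assumed name.

```r
# Hypothetical stand-in for raw.saliva.data
saliva <- data.frame(
  sample    = c(1, 2, 3, 4, 5, 3),
  peak.cort = c("max", "max", "min", "max", "max", "max"),
  cortisol  = c(10.1, 12.5, 8.3, 15.0, 9.9, 11.2),
  stringsAsFactors = FALSE
)

# With no variable called 'sample' in scope, R finds the function sample(),
# and comparing a function to a number is an error
msg <- tryCatch(sample > 1, error = function(e) conditionMessage(e))

# Referring to the column explicitly avoids the clash; all three
# conditions are combined into one row index
keep <- saliva$peak.cort == "max" & saliva$sample > 1 & saliva$sample < 5
saliva$max.cort <- NA_real_
saliva$max.cort[keep] <- saliva$cortisol[keep]
saliva
```

Building a single logical index over all rows sidesteps the alignment issue, because both sides of the assignment are subset by the same vector.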

Hope this helps,
Michael


On Wed, May 16, 2012 at 5:01 PM, Melissa Rosenkranz
melissarosenkr...@gmail.com wrote:
 Hello,

 I apologize in advance for not providing sample data. I'm very new to R
 and can't easily generate appropriate sample data quickly. I'm hoping
 someone can offer advice without it.

 The code below works and does what I want it to do: for a given
 row in my data frame where the variable peak.cort == "max", it sets the
 value of another variable, max.cort, to the value of a third
 variable, cortisol, for that row.

 index <- raw.saliva.data$peak.cort == 'max'
 raw.saliva.data$max.cort[index] <- raw.saliva.data$cortisol[index]

 Now, I want to execute this only if the value of a fourth
 variable, sample, is >1 and <5. I tried to add an ifelse statement to the
 code above so that it looks like this:

 index <- raw.saliva.data$peak.cort == 'max'
 raw.saliva.data$max.cort[index] <- ifelse(sample>1 && sample<5,
 raw.saliva.data$cortisol[index], NA)

 and I get this error: Error in sample > 1 :
   comparison (6) is possible only for atomic and list types

 I can't figure out how to fix this problem. Any advice is appreciated.

 Thank you.
 --
 *Melissa*


Re: [R] error code trying to extract second column from coeftest output

2012-05-16 Thread Jeff Newmiller

I recommend that you troubleshoot your own problem using the str function... 
for example, str( coeftest(lmodT_WBHO)). The error message is not a code... 
it is perfectly readable English, and it is telling you that the result of 
calling coeftest is not a list with parts that can be pulled out using the $ 
operator.
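
As a sketch of what str() reveals: a coefficient table of this kind is a numeric matrix, so columns are selected with matrix indexing rather than $. The example below uses the built-in cars data and base R's coef(summary()) so that it runs without extra packages; the lmtest line is guarded and assumes that package is installed.

```r
fit <- lm(dist ~ speed, data = cars)

# The coefficient table is a matrix with named rows and columns
ct <- coef(summary(fit))
str(ct)

# Select the standard errors by column name (or by position, ct[, 2])
se <- ct[, "Std. Error"]
se

# A coeftest() result is likewise a matrix and is indexed the same way
if (requireNamespace("lmtest", quietly = TRUE)) {
  se_ct <- lmtest::coeftest(fit)[, "Std. Error"]
}
```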
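---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:jdnew...@dcn.davis.ca.us          Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------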
Sent from my phone. Please excuse my brevity.



rl269 rl...@acad.umass.edu wrote:

I want to use the standard error values in the summary that is produced
using
coeftest, but I am getting an error code- any ideas?

> library(lmtest)
> coeftest(lmodT_WBHO)

t test of coefficients:

    Estimate Std. Error t value  Pr(>|t|)
t1W  5.94819    0.17072 34.8410 < 2.2e-16 ***
t2W  6.56216    0.17438 37.6322 < 2.2e-16 ***
t3W  6.08252    0.16525 36.8082 < 2.2e-16 ***
t4W  6.18041    0.17028 36.2949 < 2.2e-16 ***
t1B  5.50000    0.50566 10.8768 < 2.2e-16 ***
t2B  5.65000    0.53034 10.6535 < 2.2e-16 ***
t3B  4.52381    0.51756  8.7406 < 2.2e-16 ***
t4B  4.38095    0.51756  8.4646 < 2.2e-16 ***
t1H  5.05000    0.53034  9.5221 < 2.2e-16 ***
t2H  4.77778    0.55903  8.5465 < 2.2e-16 ***
t3H  5.52632    0.54412 10.1564 < 2.2e-16 ***
t4H  4.71429    0.63388  7.4372 2.236e-13 ***
t1O  5.17647    0.57524  8.9988 < 2.2e-16 ***
t2O  5.81818    0.50566 11.5060 < 2.2e-16 ***
t3O  6.50000    0.63388 10.2543 < 2.2e-16 ***
t4O  5.71429    0.63388  9.0147 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> se1 <- coeftest(lmodT_WBHO)$coef[,2]
Error in coeftest(lmodT_WBHO)$coef :
  $ operator is invalid for atomic vectors
 


--
View this message in context:
http://r.789695.n4.nabble.com/error-code-trying-to-extract-second-column-from-coeftest-output-tp4630298.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] TukeyHSD plot error

2012-05-16 Thread R. Michael Weylandt

Hmmm, I can't reproduce, but I'm not really sure why that would
happen... is there any way you can test this in a --vanilla R session?
(That's the UNIX-y way to start a totally clean session; not sure
exactly how to achieve that on Windows)

Does this happen if you just run

example(TukeyHSD)

directly or only when you copy and paste the commands yourself?
Hopefully we'll be able to track this down, but my initial guess is
that it's some nasty combination of all the packages you have up.
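
If the cause is indeed an attached package supplying its own plot method for TukeyHSD objects (a guess, not something confirmed in this thread), the following sketch shows how one might check where such methods are visible and call the one from stats directly. The null PDF device is used only so the code runs without opening a window.

```r
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
hsd <- TukeyHSD(fm1, "tension", ordered = TRUE)

# List the places where a plot.TukeyHSD method is visible; more than the
# stats entries would hint at masking by another package
getAnywhere("plot.TukeyHSD")$where

# Call the stats method explicitly, bypassing S3 dispatch
pdf(NULL)
stats:::plot.TukeyHSD(hsd)
dev.off()
```

Starting R from a shell with `R --vanilla` and repeating the example is the complementary test for whether a startup file or saved workspace is involved.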

Michael

On Wed, May 16, 2012 at 1:16 PM, Bret Jagger cantleavethi...@gmail.com wrote:
 Hi, I am seeking help with an error when running the example from R
 Documentation for TukeyHSD.  The error occurs with any example I run, from
 any text book or website.  thank you...

 > plot(TukeyHSD(fm1, "tension"))
 Error in plot(confint(as.glht(x)), ylim = c(0.5, n.contrasts + 0.5), ...) :
  error in evaluating the argument 'x' in selecting a method for function
 'plot': Error in UseMethod("vcov") :
  no applicable method for 'vcov' applied to an object of class "NULL"


 > ?TukeyHSD
 > require(graphics)

 > summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
             Df Sum Sq Mean Sq F value  Pr(>F)
 wool         1    451   450.7   3.339 0.07361 .
 tension      2   2034  1017.1   7.537 0.00138 **
 Residuals   50   6748   135.0
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 > TukeyHSD(fm1, "tension", ordered = TRUE)
   Tukey multiple comparisons of means
     95% family-wise confidence level
     factor levels have been ordered

 Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

 $tension
          diff        lwr      upr     p adj
 M-H  4.722222 -4.6311985 14.07564 0.4474210
 L-H 14.722222  5.3688015 24.07564 0.0011218
 L-M 10.000000  0.6465793 19.35342 0.0336262

 > plot(TukeyHSD(fm1, "tension"))
 Error in plot(confint(as.glht(x)), ylim = c(0.5, n.contrasts + 0.5), ...) :
  error in evaluating the argument 'x' in selecting a method for function
 'plot': Error in UseMethod("vcov") :
  no applicable method for 'vcov' applied to an object of class "NULL"

 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: i386-pc-mingw32/i386 (32-bit)

 attached base packages:
  [1] grid      tcltk     splines   stats     graphics  grDevices datasets
  utils     methods   base

 other attached packages:
  [1] RcmdrPlugin.HH_1.1-30 HH_2.2-30             latticeExtra_0.6-19
 RColorBrewer_1.0-5
  [5] leaps_2.9             multcomp_1.2-12       mvtnorm_0.9-9992
  NADA_1.5-4
  [9] ggplot2_0.9.0         Rcmdr_1.8-3           car_2.0-12
  nnet_7.3-1
 [13] DAAG_1.12             survival_2.36-12      randomForest_4.6-6
  rpart_3.1-52
 [17] RODBC_1.3-5           tree_1.0-29           spatstat_1.25-5
 mgcv_1.7-13
 [21] sciplot_1.0-9         spdep_0.5-45          coda_0.14-6
 deldir_0.0-16
 [25] maptools_0.8-14       foreign_0.8-49        nlme_3.1-103
  MASS_7.3-17
 [29] boot_1.3-4            sp_0.9-98             odesolve_0.9-9
  mcmc_0.8
 [33] lme4_0.999375-42      Matrix_1.0-6          lattice_0.20-6
  chron_2.3-42
 [37] akima_0.5-7           rcom_2.2-3.1.1        rscproxy_1.3-1

 loaded via a namespace (and not attached):
  [1] colorspace_1.1-1 dichromat_1.2-4  digest_0.5.2     memoise_0.1
  munsell_0.3      plyr_1.7.1
  [7] proto_0.3-9.2    reshape2_1.2.1   scales_0.2.0     stats4_2.15.0
  stringr_0.6      tools_2.15.0

and provide commented, minimal, self-contained, reproducible code.

[R] Unable to install package

2012-05-16 Thread Rismyname

Hi,
I get the following error while installing a package. Can someone please
help?

install.packages("memisc")
Warning in install.packages :
  argument 'lib' is missing: using 'C:/Users/ravi/Documents/R/R-2.15.0'
Warning in install.packages :
  downloaded length 8255 != reported length 200
Error in install.packages : Line starting '<!DOCTYPE html PUBLI ...' is
malformed!

thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/Unable-to-install-package-tp4630320.html
Sent from the R help mailing list archive at Nabble.com.


[R] triangular matrices input/output

2012-05-16 Thread casperyc

Hi,

Is there any package that deals with triangular matrices?

Say ways of inputting an upper (lower) triangular matrix?

Or convert a vector of length 6 to an upper (lower) triangular matrix (by
row/column)?

Thanks!

-
##
PhD candidate in Statistics
Big R Fan
Big LEGO Fan
Big sTaTs Fan
##

--
View this message in context: 
http://r.789695.n4.nabble.com/triangular-matrices-input-output-tp4630310.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] trouble with ifelse statement

2012-05-16 Thread Rui Barradas


Hello,

'sample' is a really bad name for a variable, it's already taken, it's an R
function.

sample>1 && sample<5  # '&&' is not vectorized, it's '&' you want.

# Without 'ifelse'
raw.saliva.data$max.cort[index] <- raw.saliva.data$cortisol[index & sample > 1 & sample < 5]

Negate this last conjunction if you want to set the other 'max.cort' to NA,

!(index & sample > 1 & sample < 5)

And, finally, this is untested. Give a small dataset example, including
'sample' (after calling it something else).
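
The difference between the two operators is easy to see on a short vector. A minimal sketch with made-up values (x, inside and picked are illustrative names): '&' compares element by element, which is what ifelse() needs, whereas '&&' is meant for a single TRUE/FALSE, as in an if() condition.

```r
x <- c(0, 2, 3, 6)

# Elementwise: one TRUE/FALSE per element of x
inside <- x > 1 & x < 5
inside

# ifelse() uses the whole logical vector, keeping x where it is TRUE
picked <- ifelse(inside, x, NA)
picked
```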

Hope this helps,

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/trouble-with-ifelse-statement-tp4630309p4630316.html
Sent from the R help mailing list archive at Nabble.com.


[R] Updating Neural Networks

2012-05-16 Thread Josh Browning

Hi useRs,



I apologize if I've missed some documentation somewhere, but I can't seem
to find anything related to this question. For an ensemble/data-mining
problem, I'm trying to train a neural network on my data set and have it
output predictions (or coefficients) after varying numbers of epochs
(preferably using nnet, as that package seems the most user-friendly to me,
but I'm open to other packages too).  Is there a way to do this, or would I
need to rerun nnet for every different epoch value I wish to consider?



Thanks so much for your help!


Josh
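
A possible approach, offered as a sketch rather than a documented recipe for this use: nnet() accepts starting weights through its Wts argument and stops after maxit iterations, so a fit can be continued in stages by feeding the previous fit's wts back in and recording predictions after each stage. Note that nnet() counts optimizer iterations, which may not match what other packages call epochs; the iris data, size = 3, the stage length of 10 and the names fit and preds are placeholders.

```r
library(nnet)
set.seed(1)

# Stage 1: train for a limited number of iterations
fit <- nnet(Species ~ ., data = iris, size = 3, maxit = 10, trace = FALSE)
preds <- list(predict(fit, iris, type = "class"))

# Stages 2 to 5: restart from the previous weights and train a bit longer,
# saving the predictions available at each checkpoint
for (k in 2:5) {
  fit <- nnet(Species ~ ., data = iris, size = 3, maxit = 10,
              Wts = fit$wts, trace = FALSE)
  preds[[k]] <- predict(fit, iris, type = "class")
}

length(preds)  # one set of predictions per checkpoint
```

Because the optimizer's internal state is reset at each restart, staged training need not reproduce a single long run exactly.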


Re: [R] triangular matrices input/output

2012-05-16 Thread R. Michael Weylandt

The Matrix package provides good support for many special sorts of
matrices, but here it looks like you probably don't need that
additional machinery for such a small case:

makeUpper <- function(vec, diag = FALSE){
    # Solve n*(n+1)/2 == length(vec) for n and check it is a whole number
    n <- (-1 + sqrt(1 + 8*length(vec)))/2
    stopifnot(isTRUE(all.equal(n, as.integer(n))))

    # Without the diagonal the matrix needs one more row and column
    if(!diag) n <- n + 1

    mat <- matrix(0, ncol = n, nrow = n)
    mat[upper.tri(mat, diag)] <- vec   # filled column by column
    mat
}

I think this does what you want and it's not too hard to generalize to
lower triangular.

E.g.,

v <- 1:6
makeUpper(v)
makeUpper(v, diag = TRUE)

It's not super well tested though so caveat lector.

Michael


Re: [R] Lattice: Add abline to Single Value qqmath() Plot

2012-05-16 Thread Rich Shepard


On Tue, 15 May 2012, ilai wrote:


Apologies in advance if I misinterpret  R console insists that I retype
... but actually makes more sense (than sourcing a script) to use the
group argument (see the last example in ?qqmath) as in 4 groups in each of
30 panels, or allow.multiple=T, outer=T if you really want separate panels
for each transformation. Also can use layout=c(1,1,120) if you need each
in a separate page - or say c(3,2,20) for 20 pages of 6 panels each, etc.
Regarding your script, there is a syntax error:


ilai,

  Grouping doesn't do what's needed, but the split() function does. Thanks
for pointing me in that direction.

Rich

--
Richard B. Shepard, Ph.D.  |   Integrity - Credibility - Innovation
Applied Ecosystem Services, Inc.   |Helping Ensure Our Clients' Futures
http://www.appl-ecosys.com Voice: 503-667-4517  Fax: 503-667-8863


Re: [R] triangular matrices input/output

2012-05-16 Thread R. Michael Weylandt michael.weyla...@gmail.com

Do leave the posts for anyone else who might google the same question. (I don't 
think you really could delete the post anyways, perhaps only on one mirror)

You could probably use some combination of rev() and t() to fill by row, but I 
haven't thought through the geometry all the way yet. 
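
One way the transpose idea might work, as a sketch: R fills a triangle column by column, and filling the lower triangle by column and then transposing is the same as filling the upper triangle by row. The vector d is taken from the quoted message below; the name by_row is an assumption.

```r
d <- c(3, 6, 2, 1, 4, 5)

m <- matrix(0, 3, 3)
# Column-major fill of the lower triangle, diagonal included
m[lower.tri(m, diag = TRUE)] <- d
# Transposing turns that into a row-wise fill of the upper triangle
by_row <- t(m)
by_row
```

Here the first row of by_row holds 3, 6, 2, the second row 1, 4 and the last row 5, reading each row from the diagonal rightwards.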

Michael 

On May 16, 2012, at 8:13 PM, YUProf caspe...@hotmail.co.uk wrote:

 Hi Michael,
 
 I have figured out a 'super' easy way myself and already deleted the post.
 
 It can be done using: (no package necessary)
 
 d=c(3,6,2,1,4,5)
 x=matrix(,3,3)
 
 # by column, 
 x[!lower.tri(x)]=d
 
 I am still trying very hard to think of a way to fit it by row as I sometimes 
 have to!
 
 THANKS!
 
 Chen
 
 ==
 Mr Chen YU
 PhD candidate in Statistics
 School of Mathematics, Statistics and Actuarial Science, University of Kent
 
 D7/D Woolf College, The Pavilion, Giles Lane, Canterbury, Kent CT2 7BQ
 Mobile: +44(0)7725003559
 ==
 
and provide commented, minimal, self-contained, reproducible code.

[R] subreddit for R related stuff

2012-05-16 Thread Robert M. Flight

Thought there might be some Redditors lurking on these mailing lists. I
created a sub-reddit for R (and by extension Bioconductor) discussions,
links, etc.

http://www.reddit.com/r/Rsoftware/

This will be the first and only shameless plug.

-Robert

Robert M. Flight, Ph.D.
University of Louisville Bioinformatics Laboratory
University of Louisville
Louisville, KY

PH 502-852-1809 (HSC)
PH 502-852-0467 (Belknap)
EM robert.fli...@louisville.edu
EM rfligh...@gmail.com
robertmflight.blogspot.com
bioinformatics.louisville.edu/lab
github.com/rmflight/general/wiki

"The most exciting phrase to hear in science, the one that heralds new
discoveries, is not 'Eureka!' (I found it!) but 'That's funny ...'" - Isaac
Asimov


[R] using XML package to read RSS

2012-05-16 Thread J Toll

Hi,

I'm trying to use the XML package to read an RSS feed.  To get
started, I was trying to use this post as an example:

http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/

I can replicate the beginning section of the post, but when I try to
use another RSS feed I have an issue.  The RSS feed I would like to
use is:

 > URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"

 > library(XML)
 > doc <- xmlTreeParse(URL)

 > src <- xpathApply(xmlRoot(doc), "//entry")

I get an empty list rather than a list of each of the entry nodes:

 > src
list()
attr(,"class")
[1] "XMLNodeSet"

I'm not sure how to fix this.  Any suggestions?  Do I need to provide
a namespace, or is the RSS malformed?

Thanks,


James


Re: [R] using XML package to read RSS

2012-05-16 Thread Duncan Temple Lang

Hi James.

 Yes, you need to identify the namespace in the query, e.g.

  getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes.

(getNodeSet() is more convenient to use when you don't specify a function
to apply to the nodes. Also, you don't need xmlRoot(doc), as it works on the
entire document with the query //)

 BTW, you want to use xmlParse() and not xmlTreeParse().
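
The namespace mapping can be tried offline on a tiny hand-written Atom document. This is a sketch that assumes the XML package is installed; the feed text is made up for illustration, and the prefix x is arbitrary as long as it maps to the Atom namespace URI.

```r
library(XML)

# A made-up Atom feed with a default namespace, standing in for the real one
atom <- '<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>demo</title>
  <entry><title>first</title></entry>
  <entry><title>second</title></entry>
</feed>'

doc <- xmlParse(atom, asText = TRUE)

# Map a prefix of our choosing to the feed's default namespace
ns <- c(x = "http://www.w3.org/2005/Atom")
entries <- getNodeSet(doc, "//x:entry", ns)
length(entries)

# Pull a value out of each entry with the same prefix
titles <- xpathSApply(doc, "//x:entry/x:title", xmlValue, namespaces = ns)
titles
```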

   D.



Re: [R] caret: Error when using rpart and CV != LOOCV

2012-05-16 Thread Max Kuhn

Dominik,

See this line:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37

The variance of the predictions is zero. caret uses the formula for
R^2 by calculating the correlation between the observed data and the
predictions which uses sd(pred) which is zero. I believe that the same
would occur with other formulas for R^2.
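
The effect can be reproduced outside caret with a few lines of base R. In this sketch obs is a made-up vector of hold-out values and pred is a constant prediction, as would come from a tree with no split (30.375 is used as the constant because it is exactly representable).

```r
obs  <- c(10.3, 16.4, 18.8, 24.2, 31.7)   # made-up hold-out values
pred <- rep(30.375, length(obs))          # constant prediction

sd(pred)   # zero: the predictions do not vary

# cor() divides by sd(pred), so the correlation (and hence R^2) is NA;
# R also issues a warning, suppressed here
r <- suppressWarnings(cor(obs, pred))
r^2
```

A fold with an NA for R^2 is what caret reports as missing values in the resampled performance measures.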

Max

On Wed, May 16, 2012 at 11:54 AM, Dominik Bruhn domi...@dbruhn.de wrote:
 Thanks Max for your answer.

 First, I do not understand your post. Why is it a problem if two of
 predictions match? From the formula for calculating R^2 I can see that
 there will be a DivByZero iff the total sum of squares is 0. This is
 only true if the predictions of all the predicted points from the
 test-set are equal to the mean of the test-set. Why should this happen?

 Anyway, I wrote the following code to check what you tried to tell:

 --
 library(caret)
 data(trees)
 formula=Volume~Girth+Height

 customSummary - function (data, lev = NULL, model = NULL) {
    print(summary(data$pred))
    return(defaultSummary(data, lev, model))
 }

 tc=trainControl(method='cv', summaryFunction=customSummary)
 train(formula, data=trees,  method='rpart', trControl=tc)
 --

 This outputs:
 ---
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.45   18.45   18.45   30.12   35.95   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.69   22.69   22.69   32.94   38.06   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37
 [cut many values like this]
 Warning: In nominalTrainWorkflow(dat = trainData, info = trainInfo,
 method = method,  :
  There were missing values in resampled performance measures.
 -

 As I didn't understand your post, I don't know if this confirms your
 assumption.

 Thanks anyway,
 Dominik


 On 16/05/12 17:30, Max Kuhn wrote:
 More information is needed to be sure, but it is most likely that some
 of the resampled rpart models produce the same prediction for the
 hold-out samples (likely the result of no viable split being found).

 Almost every incarnation of R^2 requires the variance of the
 prediction. This particular failure mode would result in a divide by
 zero.

 Try using your own summary function (see ?trainControl) and put a
 print(summary(data$pred)) in there to verify my claim.

 Max

 On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn domi...@dbruhn.de wrote:
 Hi,
 I ran into the following problem when trying to build an rpart model using
 anything but LOOCV. Originally, I wanted to use k-fold partitioning,
 but every partitioning except LOOCV throws the following warning:

 
 Warning message: In nominalTrainWorkflow(dat = trainData, info =
 trainInfo, method = method, : There were missing values in resampled
 performance measures.
 -

 Below are some simplified testcases which reproduce the warning on my
 system.

 Question: What does this error mean? How can I avoid it?

 System-Information:
 -
 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2
 reshape_0.8.4
 [6] plyr_1.7.1     lattice_0.20-6

 loaded via a namespace (and not attached):
 [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0     iterators_1.0.6
 [5] tools_2.15.0
 ---


 Simplified Testcase I: Throws warning
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 train(formula, data=trees,  method='rpart')
 ---

 Simplified Testcase II: Every other CV-method also throws the warning,
 for example using 'cv':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='cv')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---

 Simplified Testcase III: The only CV-method which is working is 'LOOCV':
 ---
 library(caret)
 data(trees)
 formula=Volume~Girth+Height
 tc=trainControl(method='LOOCV')
 train(formula, data=trees,  method='rpart', trControl=tc)
 ---


 Thanks!
 --
 Dominik Bruhn
 mailto:

Re: [R] using XML package to read RSS

2012-05-16 Thread J Toll


Brilliant!  Thank you so much.  I never would have figured out
specifying the namespace like that.  I had tried:

src <- xpathApply(xmlRoot(doc), "//entry", namespaces =
"http://www.w3.org/2005/Atom")

but that wasn't working.

Thanks again,


James
