date:20130607

I R-helpers



#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that separates the firms in two categories: 1 if the firm 
have an auditor; 0 if not
#and another variable the represents the firm dimension (total assets in 
thousand of euros)

#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exactly the
#same, it could vary in an interval of +/- 10%, for example)



#My reproducible example

firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)
year2-rep(2001:2003,5)
industry2-rep(30,15)
dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)-c(firm,year,industry,dummy,dimension)

firm3-sort(rep(16:20,4),decreasing=F)
year3-rep(2001:2004,5)
industry3-rep(40,20)
dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)-c(firm,year,industry,dummy,dimension)

final1-rbind(data1,data2)
final2-rbind(final1,data3)
final2
final3-final2[order(final2$year,final2$industry,final2$dimension),]
final3



Thank you very much,
Cecília Carmo

Universidade de Aveiro - Portugal

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help on loops

2013-06-07 Thread Laz


Dear Berend,

For reproducibility,
Rspatswap() is a function which originally returns a single value. For 
example Rspatswap(...)  and you get 0.8


So, run Rspatswap() 20 times and store all the values.
Then from these 20 values, calculate the calculate average, 
sd,se,min,max to get something similar to:


average   sd   se   min  max
trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241

If we repeat the function 10 times, then I expect 10 rows with 5 columns but it 
does not work ?

I hope I am now clearer...







On 6/7/2013 5:24 AM, Berend Hasselman wrote:

On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote:


Dear R users,

I am stuck here: My first function returns a vector of 5 values.
In my second function, I want to repeat this, a number of times, say 10
so that I have 10 rows and five columns but I keep on getting errors.

See the code and results below:

optm -function(perm, verbose = FALSE)
{
   trace-c()
   for (k in 1:perm){
trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1]
 perm[k]-k
 mat-cbind(trace, perm = seq(perm))
   }
   if (verbose){
 cat(***starting matrix\n)
 print(mat)
   }
   # iterate till done
   while(nrow(mat)  1){
 high - diff(mat[, 'trace'])  0
 if (!any(high)) break  # done
 # find which one to delete
 delete - which.max(high) + 1L
 mat - mat[-delete, ]
 newmat-apply(mat,2,mean)[1]
 sdm-sd(mat[,1])
 sem-sdm/sqrt(nrow(mat))
 maxv-mat[1,1]
 minv-mat[nrow(mat),1]
   }
   stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv)
   stats
}


test-optm(perm=20)
test

 average   sd   se   min  max
trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241


itn-function(it){
siml-matrix(NA,ncol=5,nrow=length(it))
   for(g in 1:it){
siml[g]-optm(perm=20)
   }
siml-cbind(siml=siml)
siml
}


ans-itn(5)

Warning messages:
1: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
2: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
3: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
4: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
5: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length

ans

   [,1]  [,2]  [,3]  [,4]  [,5]
[1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469



1. Not reproducible code. Where does function Rspatswap come from?

2. You have several errors in function itn:
 Argument it is a scalar: length(it) is 1. You need to do siml - 
matrix(NA,ncol=5,nrow=it)
Next in the g-loop you want to fill row g so do: siml[g,] - …..

Finally why are you doing siml - cbind(siml=siml)?
Seems superfluous to me. Delete the line.

Berend




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Conditional coloring of area between curves

2013-06-07 Thread Roland Pape

Hi Duncan,

that's exactly what I was looking for! The series have common times, so that's 
no problem. Thanks a lot!

Roland

Von: Duncan Murdoch [murdoch.dun...@gmail.com]
Gesendet: Donnerstag, 6. Juni 2013 18:45
An: Roland Pape
Cc: r-help@r-project.org
Betreff: Re: [R] Conditional coloring of area between curves

On 06/06/2013 10:41 AM, Roland Pape wrote:
 Dear list,

 I have two time series of temperatures from different sites plotted in the 
 same diagram and would like to color the area between the curves differently, 
 dependent on whether site 1 is cooler than site 2 (colored let's say in blue) 
 or warmer (red). While polygone() works fine to color the enclosed area in 
 general, I'm struggling with this conditional coloring. Any help is greatly 
 appreciated!

Suppose the series are named site1 and site2, and they have common
times in a variable times.  Then the following should do what you want:

smaller - pmin(site1, site2)
plot(x, site1, ylim = range(c(site1, site2)), type=n)
polygon(c(x, rev(x)), c(smaller, rev(site1)), col=red)
polygon(c(x, rev(x)), c(smaller, rev(site2)), col=blue)

If the times for the two series are different it's a little harder;
first you need to give them common times, and that will depend on how
you decide to evaluate the values between observations. Probably linear
interpolation (using approx()) is fine, but it's up to you.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Distance-based non-parametric ANOVA

2013-06-07 Thread Jose Iparraguirre

Hi Mikhail,

After a cursory online search using simply  A new method for non-parametric 
multivariate analysis + CRAN + R, I got the following three packages which 
include this paper in their references: vegan, mefa, and TraMineR. I don't know 
whether they deal with the procedure you are after, but it would be worth 
having a look.
Regards,
José

Prof. José Iparraguirre
Chief Economist
Age UK

Age UK
Tavis House, 1- 6 Tavistock Square
London, WC1H 9NB

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Mikhail Umorin
Sent: 06 June 2013 21:35
To: r-help@r-project.org
Subject: [R] Distance-based non-parametric ANOVA

Hello, 

I am comparing treatments by comparing within group to between group distances 
like described in 

MJ Anderson. 2001. A new method for non-parametric multivariate analysis of 
variance. 
Austral Ecology 26: 32 -- 46.

The idea is to find the ratio of within group sum of distance^2 to between 
group sum of distance^2. Then find the distribution of the statistic by 
permuting the sample.

Has anyone done this before in R code? Are there any packages (I did checked on 
CRAN - none). Code?

I want to make sure I won't be inventing a bicycle before I write my own 
package.

Comments? Suggestions? (I have never written an R package before)

Thank you for your time, 

Mikhail.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Age UK Improving later life

www.ageuk.org.uk


 
---
Age UK is a registered charity and company limited by guarantee, (registered 
charity number 1128267, registered company number 6825798). 
Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA.

For the purposes of promoting Age UK Insurance, Age UK is an Appointed 
Representative of Age UK Enterprises Limited, Age UK is an Introducer 
Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth 
Access for the purposes of introducing potential annuity and health 
cash plans customers respectively.  Age UK Enterprises Limited, JLT Benefit 
Solutions Limited and Simplyhealth Access are all authorised and 
regulated by the Financial Services Authority. 
--

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are 
addressed. If you receive a message in error, please advise the sender and 
delete immediately.

Except where this email is sent in the usual course of our business, any 
opinions expressed in this email are those of the author and do not 
necessarily reflect the opinions of Age UK or its subsidiaries and associated 
companies. Age UK monitors all e-mail transmissions passing 
through its network and may block or modify mails which are deemed to be 
unsuitable.

Age Concern England (charity number 261794) and Help the Aged (charity number 
272786) and their trading and other associated companies merged 
on 1st April 2009.  Together they have formed the Age UK Group, dedicated to 
improving the lives of people in later life.  The three national 
Age Concerns in Scotland, Northern Ireland and Wales have also merged with Help 
the Aged in these nations to form three registered charities: 
Age Scotland, Age NI, Age Cymru.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help on loops

2013-06-07 Thread Laz

Hi,

I am almost getting there, but still have errors. Thanks for your help. 
I have tried improving but I get the following errors below:

itn-function(it){
+siml-matrix(NA,ncol=5,nrow=it)
+for(g in 1:it){
+siml[g]-optm(perm=20)[g]
+}
+siml
+}
itn(3)
  [,1] [,2] [,3] [,4] [,5]
[1,] 0.8873775898   NA   NA   NA   NA
[2,] 0.0015584824   NA   NA   NA   NA
[3,] 0.0001414317   NA   NA   NA   NA


itn-function(it){
+siml-matrix(NA,ncol=5,nrow=it)
+for(g in 1:it){
+siml[g]-optm(perm=20)
+}
+siml
+}
itn(3)
   [,1] [,2] [,3] [,4] [,5]
[1,] 0.8880941   NA   NA   NA   NA
[2,] 0.8869727   NA   NA   NA   NA
[3,] 0.8877045   NA   NA   NA   NA
Warning messages:
1: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
2: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
3: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length


I expect something close to

 average   sd  se   min   max
 0.8881969 0.0008215379 0.000410769 0.8873842 0.8890167

0.884659 0.0004215379 0.000410769 0.2342 0.676307

0.8885839 0.0001215379 0.0002112 0.82752992 0.8836337




Thanks fpr you help.


On 6/7/2013 5:24 AM, Berend Hasselman wrote:
 On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote:

 Dear R users,

 I am stuck here: My first function returns a vector of 5 values.
 In my second function, I want to repeat this, a number of times, say 10
 so that I have 10 rows and five columns but I keep on getting errors.

 See the code and results below:

 optm -function(perm, verbose = FALSE)
 {
trace-c()
for (k in 1:perm){
 trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1]
  perm[k]-k
  mat-cbind(trace, perm = seq(perm))
}
if (verbose){
  cat(***starting matrix\n)
  print(mat)
}
# iterate till done
while(nrow(mat)  1){
  high - diff(mat[, 'trace'])  0
  if (!any(high)) break  # done
  # find which one to delete
  delete - which.max(high) + 1L
  mat - mat[-delete, ]
  newmat-apply(mat,2,mean)[1]
  sdm-sd(mat[,1])
  sem-sdm/sqrt(nrow(mat))
  maxv-mat[1,1]
  minv-mat[nrow(mat),1]
}
stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv)
stats
 }

 test-optm(perm=20)
 test
  average   sd   se   min  max
 trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241


 itn-function(it){
 siml-matrix(NA,ncol=5,nrow=length(it))
for(g in 1:it){
 siml[g]-optm(perm=20)
}
 siml-cbind(siml=siml)
 siml
 }

 ans-itn(5)
 Warning messages:
 1: In siml[g] - optm(perm = 20) :
number of items to replace is not a multiple of replacement length
 2: In siml[g] - optm(perm = 20) :
number of items to replace is not a multiple of replacement length
 3: In siml[g] - optm(perm = 20) :
number of items to replace is not a multiple of replacement length
 4: In siml[g] - optm(perm = 20) :
number of items to replace is not a multiple of replacement length
 5: In siml[g] - optm(perm = 20) :
number of items to replace is not a multiple of replacement length
 ans
[,1]  [,2]  [,3]  [,4]  [,5]
 [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469


 1. Not reproducible code. Where does function Rspatswap come from?

 2. You have several errors in function itn:
  Argument it is a scalar: length(it) is 1. You need to do siml - 
 matrix(NA,ncol=5,nrow=it)
 Next in the g-loop you want to fill row g so do: siml[g,] - ..

 Finally why are you doing siml - cbind(siml=siml)?
 Seems superfluous to me. Delete the line.

 Berend




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Distance-based non-parametric ANOVA

2013-06-07 Thread Pascal Oettli

Hello,

You also can use the package sos

findFn('non-parametric multivariate
 
ANOVA')

Regards,
Pascal


2013/6/7 Mikhail Umorin mike...@gmail.com

 Hello,

 I am comparing treatments by comparing within group to between group
 distances like
 described in

 MJ Anderson. 2001. A new method for non-parametric multivariate analysis
 of variance.
 Austral Ecology 26: 32 -- 46.

 The idea is to find the ratio of within group sum of distance^2 to between
 group sum of
 distance^2. Then find the distribution of the statistic by permuting the
 sample.

 Has anyone done this before in R code? Are there any packages (I did
 checked on
 CRAN - none). Code?

 I want to make sure I won't be inventing a bicycle before I write my own
 package.

 Comments? Suggestions? (I have never written an R package before)

 Thank you for your time,

 Mikhail.

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help on loops

2013-06-07 Thread Berend Hasselman


On 07-06-2013, at 11:57, Laz lmra...@ufl.edu wrote:

 Dear Berend,
 
 I have made some changes but I still get errors:
 
 For reproducibility, 
 Rspatswap() is a function which originally returns a single value. For 
 example Rspatswap(...)  and you get 0.8 
 
 So, run Rspatswap() 20 times and store all the values. 
 Then from these 20 values, calculate the calculate average, sd,se,min,max to 
 get something similar to: 
 
 average   sd   se   min  max 
 trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 
 
 If we repeat the function 10 times, then I expect 10 rows with 5 columns but 
 it does not work ? 
 

Replacing Rspatswap(…)[1] with runif(1) (you could/should have done that to 
provide reproducible code) and the modifications I suggested you make in your 
function itn() like this

itn-function(it){
siml-matrix(NA,ncol=5,nrow=it)
  for(g in 1:it){
   siml[g,]-optm(perm=20)
  }
siml
}

running itn(10) I get a matrix with 10 rows and 5 columns.

So what do you mean by it does not work?
You sometimes get an error message from apply due to a bad dim() like this

Error in apply(mat, 2, mean) : dim(X) must have a positive length

When the result of mat[-delete,] is a matrix with a single row the result is 
simplified to a vector. See ?[ in particular the drop argument.
So this should work in function optm()

mat - mat[-delete,, drop=FALSE] 

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R help on loops

2013-06-07 Thread Berend Hasselman



Please, please do not use html formatted mail.
It is clearly requested by the mailing list
The code of your last mail is a mess and when replying it becomes even more of 
a mess.

I told you to do 

siml[g,] - optm(perm=20)

See the comma after g.

and not what you are now doing with siml[g]-optm(perm=20)[g]

I'm giving up.

Berend


On 07-06-2013, at 12:15, Laz lmra...@ufl.edu wrote:

 Hi,
 
 I am almost getting there, but still have errors. Thanks for your help. I 
 have tried improving but I get the following errors below:
 
 
 
 
  
 itn-function(it){
 
 + 
 siml-matrix(NA,ncol=5,nrow=it)
 
 + 
 for(g in 1:it){
 
 + 
 siml[g]-optm(perm=20)[g]
 
 + 
 }
 
 + 
 siml
 
 + 
 }
 
  
 itn(3)
 
  [,1] [,2] [,3] [,4] [,5]
 [1,] 0.8873775898   NA   NA   NA   NA
 [2,] 0.0015584824   NA   NA   NA   NA
 [3,] 0.0001414317   NA   NA   NA   NA
 
 
 
  
 itn-function(it){
 
 + 
 siml-matrix(NA,ncol=5,nrow=it)
 
 + 
 for(g in 1:it){
 
 + 
 siml[g]-optm(perm=20)
 
 + 
 }
 
 + 
 siml
 
 + 
 }
 
  
 itn(3)
 
   [,1] [,2] [,3] [,4] [,5]
 [1,] 0.8880941   NA   NA   NA   NA
 [2,] 0.8869727   NA   NA   NA   NA
 [3,] 0.8877045   NA   NA   NA   NA
 
 Warning messages:
 
 1: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 2: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 3: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 
 I expect something close to
 
 
 average   sd  se   min   max
 0.8881969 0.0008215379 0.000410769 0.8873842 0.8890167
 
   0.884659 0.0004215379 0.000410769 0.2342 0.676307
 
   0.8885839 0.0001215379 0.0002112 0.82752992 0.8836337
 
 
 
 
   
 Thanks fpr you help.
 
 
 On 6/7/2013 5:24 AM, Berend Hasselman wrote:
 On 07-06-2013, at 10:59, Laz lmra...@ufl.edu
  wrote:
 
 
 Dear R users,
 
 I am stuck here: My first function returns a vector of 5 values.
 In my second function, I want to repeat this, a number of times, say 10 
 so that I have 10 rows and five columns but I keep on getting errors.
 
 See the code and results below:
 
 optm -function(perm, verbose = FALSE)
 {
   trace-c()
   for (k in 1:perm){
 trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1]
 perm[k]-k
 mat-cbind(trace, perm = seq(perm))
   }
   if (verbose){
 cat(***starting matrix\n)
 print(mat)
   }
   # iterate till done
   while(nrow(mat)  1){
 high - diff(mat[, 'trace'])  0
 if (!any(high)) break  # done
 # find which one to delete
 delete - which.max(high) + 1L
 mat - mat[-delete, ]
 newmat-apply(mat,2,mean)[1]
 sdm-sd(mat[,1])
 sem-sdm/sqrt(nrow(mat))
 maxv-mat[1,1]
 minv-mat[nrow(mat),1]
   }
   stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv)
   stats
 }
 
 
 test-optm(perm=20)
 test
 
 average   sd   se   min  max
 trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241
 
 
 itn-function(it){
 siml-matrix(NA,ncol=5,nrow=length(it))
   for(g in 1:it){
siml[g]-optm(perm=20)
   }
 siml-cbind(siml=siml)
 siml
 }
 
 
 ans-itn(5)
 
 Warning messages:
 1: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 2: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 3: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 4: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 5: In siml[g] - optm(perm = 20) :
   number of items to replace is not a multiple of replacement length
 
 ans
 
   [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469
 
 
 
 1. Not reproducible code. Where does function Rspatswap come from?
 
 2. You have several errors in function itn:
 Argument it is a scalar: length(it) is 1. You need to do siml - 
 matrix(NA,ncol=5,nrow=it)
 Next in the g-loop you want to fill row g so do: siml[g,] - …..
 
 Finally why are you doing siml - cbind(siml=siml)?
 Seems superfluous to me. Delete the line.
 
 Berend
 
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SPlus script

2013-06-07 Thread Duncan Murdoch


On 13-06-06 6:22 PM, Rolf Turner wrote:

On 07/06/13 03:19, Scott Raynaud wrote:

I actually had tried placing arguments in the call but it didn't work.   
However, I did
not think about writing it to a variable and printing.  That seems to have done 
the
trick.  Funny, I don't remember having to do that before, but that's not 
surprising.



If I remember correctly --- haven't used Splus for decades --- this is a
difference
between Splus and R.

In R the output of a function is returned *invisibly* if that function
is called
from within another function.  And source() is one such other function.


Actually this depends on the caller.  source() does return its results 
invisibly, but many other functions don't.


source() is unusual in another way that came up recently (on R-devel, I 
think):  it calls withVisible() on the code that it evaluates, which 
means that instead of a simple value it will return a list containing 
the value and an indicator about whether it should be displayed. It 
returns this list invisibly, leaving it up to whoever called source() to 
decide whether to display the value or not.


Duncan Murdoch



So if you have a script, say melvin with the single line:

  sin(42)

and in R you execute

  source(melvin)

you will see no output.  If in another script, say clyde you have the
single line

  print(sin(42))

and in R you execute

  source(clyde)

you will see

  [1] -0.9165215

In Splus, IIRC, the print() call is unnecessary.  I.e. you would get the
same
result by sourcing melvin and clyde.

Current Splus users may correct me if I am wrong about this.

  cheers,

  Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Add prediction interval forest plot (package meta)

2013-06-07 Thread Michael Dewey


At 19:34 06/06/2013, Bernd Weiss wrote:

Dear all,

I am struggling to add a prediction interval to a forest plot that was
created with forest.meta(), package meta.

I checked the source of forest.meta() and realized that it is heavily
relying on grid. I am lacking any experience with grid graphics. So, I
am having difficulties to find out where the random effects estimate is
actually plotted in order to add prediction intervals.


I appreciate this is not a direct answer to your question but the 
equivalent function in metafor (available from CRAN) does this and it 
might be easier to use that.




Any help is much appreciated!

Bernd


R version 3.0.1 Patched (2013-05-28 r62825)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=CLC_TIME=German_Germany.1252

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods
  base

other attached packages:
[1] meta_2.3-0

loaded via a namespace (and not attached):
[1] tools_3.0.1


Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] arima time series in R

2013-06-07 Thread Aakanksha Dahiya01


Hi

Could just anyone explain me the coefficients in the output of arima model

timeseriesarima - arima(series, order=c(1,1,2))
 timeseriesarima
Series: series
ARIMA(1,1,2)

Coefficients:
 ar1  ma1 ma2
  0.9744  -1.7695  0.7873
s.e.  0.0310   0.0481  0.0426

sigma^2 estimated as 337.4:  log likelihood=-1096.03
AIC=2200.07   AICc=2200.23   BIC=2214.2


 CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message. Further, you are 
not
to copy, disclose, or distribute this e-mail or its contents to any other 
person and
any such actions are unlawful. This e-mail may contain viruses. Infosys has 
taken
every reasonable precaution to minimize this risk, but is not liable for any 
damage
you may sustain as a result of any virus in this e-mail. You should carry out 
your
own virus checks before opening the e-mail or attachment. Infosys reserves the
right to monitor and review the content of all messages sent to or from this 
e-mail
address. Messages sent to or from this e-mail address may be stored on the
Infosys e-mail system.
***INFOSYS End of Disclaimer INFOSYS***

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Add prediction interval forest plot (package meta)

2013-06-07 Thread Bernd Weiß


Am 07.06.2013 13:17, schrieb Michael Dewey:

At 19:34 06/06/2013, Bernd Weiss wrote:

Dear all,

I am struggling to add a prediction interval to a forest plot that was
created with forest.meta(), package meta.

I checked the source of forest.meta() and realized that it is heavily
relying on grid. I am lacking any experience with grid graphics. So, I
am having difficulties to find out where the random effects estimate is
actually plotted in order to add prediction intervals.


I appreciate this is not a direct answer to your question but the
equivalent function in metafor (available from CRAN) does this and it
might be easier to use that.


Thanks, Michael!

Yes, I vaguely remember that the metafor package can calculate the PIs 
but most of our analyses rely on the meta package. On the other hand, 
this might be a good reason to finally switch to metafor.


You probably refer to the function addpoly() which can be used for that. 
I will check it out.


Again, thanks for your help,

Bernd

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] arima time series in R

2013-06-07 Thread Prof Brian Ripley


On 07/06/2013 12:21, Aakanksha Dahiya01 wrote:


Hi

Could just anyone explain me the coefficients in the output of arima model


The person who wrote the help page already did, but that is hardly 'just 
anyone'.



timeseriesarima - arima(series, order=c(1,1,2))

timeseriesarima

Series: series
ARIMA(1,1,2)

Coefficients:
  ar1  ma1 ma2
   0.9744  -1.7695  0.7873
s.e.  0.0310   0.0481  0.0426

sigma^2 estimated as 337.4:  log likelihood=-1096.03
AIC=2200.07   AICc=2200.23   BIC=2214.2


That is not from arima in package stats, so you need to follow the 
posting guide to tell us whose wrapper it is and hence which help page 
to read.  (Possibly package TSA.)




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


That does mean you.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] subset with non logical rules

HI,
Try:
?split()

source(http://www.openintro.org/stat/data/cdc.R;)
 str(cdc)
#'data.frame':    2 obs. of  9 variables:
# $ genhlth : Factor w/ 5 levels excellent,very good,..: 3 3 3 3 2 2 2 2 3 
3 ...
# $ exerany : num  0 0 1 1 0 1 1 0 0 1 ...
# $ hlthplan: num  1 1 1 1 1 1 1 1 1 1 ...
# $ smoke100: num  0 1 1 0 0 0 0 0 1 0 ...
# $ height  : num  70 64 60 66 61 64 71 67 65 70 ...
# $ weight  : int  175 125 105 132 150 114 194 170 150 180 ...
# $ wtdesire: int  175 115 105 124 130 114 185 160 130 170 ...
# $ age : int  77 33 49 42 55 55 31 45 27 44 ...
# $ gender  : Factor w/ 2 levels m,f: 1 2 2 2 2 2 1 1 2 1 ...
cdc$genhlth- as.character(cdc$genhlth)
cdclst1- split(cdc,cdc$genhlth)
lapply(cdclst1,head,2)
#$excellent
# genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#11 excellent   1    1    1 69    186  175  46  m
#13 excellent   1    0    1 66    185  220  21  m
#
#$fair
#   genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#12    fair   1    1    1 69    168  148  62  m
#15    fair   1    0    0 69    170  170  23  m
#
#$good
#  genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#1    good   0    1    0 70    175  175  77  m
#2    good   0    1    1 64    125  115  33  f
#
#$poor
#   genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#53    poor   1    1    1 62    140  130  64  f
#79    poor   1    1    0 63    142  120  52  f

#$`very good`
#    genhlth exerany hlthplan smoke100 height weight wtdesire age gender
#5 very good   0    1    0 61    150  130  55  f
#6 very good   1    1    0 64    114  114  55  f


sapply(cdclst1,nrow)
#excellent  fair  good  poor very good 
# 4657  2019  5675   677  6972 

cdcGood-cdclst1[[good]]
  str(cdcGood)
#'data.frame':    5675 obs. of  9 variables:
# $ genhlth : chr  good good good good ...
# $ exerany : num  0 0 1 1 0 1 1 0 1 1 ...
# $ hlthplan: num  1 1 1 1 1 1 1 0 1 1 ...
# $ smoke100: num  0 1 1 0 1 0 1 1 1 1 ...
# $ height  : num  70 64 60 66 65 70 73 67 75 65 ...
# $ weight  : int  175 125 105 132 150 180 185 156 200 160 ...
# $ wtdesire: int  175 115 105 124 130 170 175 150 190 140 ...
# $ age : int  77 33 49 42 27 44 79 47 43 54 ...
# $ gender  : Factor w/ 2 levels m,f: 1 2 2 2 2 1 1 1 1 2 ...
 

A.K.


Hi I am trying to figure out how to subset a bunch of data. As an example I am 
using the cdc data from openintro.org. 

In the first column with the name genhlth there are various 
options that the persons could respond. For exmaple good very good 
and poor. Now what i would like to do is to seperate the data so that 
everyone who answered good are stored in one variable and everyone who 
answered poor are in another variable. 

Now I know i could just do subset(cdc, cdc$genhlth == poor) to
 get the poor, but would really like for a code that would seperate data
 into each group, regardless of what the text or the number of groups 
are. 

Can anyone give me a hint?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Evaluation of function names...

2013-06-07 Thread Bert Gunter

Folks:

Feel free to provide me a link or reference instead of an answer.

Preamble:

f - function() function(x)rnorm(x)
g - f()
g(3)
## [1] -0.4448492 -0.2379978 -0.4537394

## But
rnorm - function()1  ## nasty nasty
g(3)
## Error in rnorm(x) : unused argument(s) (x)

## of course f - function()function(x)stats:::rnorm(x)
## would fix this.

Question 1:
Suppose I defined f() as at top in a package namespace and exported it.
Would a user see this same error defining g() and with rnorm redefined
as above? (Assume stats is attached as usual).  I presume so, but ...

Question 2:
If the answer to Q1 is yes, (how) can this be avoided without using fully
qualified function names?

Again, a quick reference to relevant docs would suffice.

Thanks.

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Get count by day for particular coulmn

HI,
Please dput() the example data.


Could anyone give a help ASAP ?

You have been posting for long time and your many posts show this ASAP. I 
know that you got comments to that and also advised to dput() the data.  
Formatting your data took some time.  Would it be better to follow the posting 
guidelines and then use ASAP?

dat1- read.table(text=
DATETIME,COL_A,COL_B,COL_C,COL_D
1/1/2007 0:01,0,3,0,0
1/1/2007 0:02,0,0,3,0
1/1/2007 0:03,0,3,0,0
1/2/2007 0,0,3,0,0
1/2/2007 0:01,0,3,4,0
1/2/2007 0:02,0,3,0,0
1/3/2007 0,0,0,0,0
1/3/2007 0:01,0,0,4,0
1/3/2007 0:02,0,3,0,0
,sep=,,header=TRUE,stringsAsFactors=FALSE)
dat1$DATETIME[!grepl(\\:,dat1$DATETIME)]-paste0(dat1$DATETIME[!grepl(\\:,dat1$DATETIME)],:00)
res-aggregate(.~gsub(\\s+.*,,dat1$DATETIME),data=dat1[,-1],function(x) 
length(x[x==3]))
 colnames(res)[1]- colnames(dat1)[1]
colnames(res)[-1]-paste0(COUNT(,colnames(res)[-1],=3))
 res
#  DATETIME COUNT(COL_A=3) COUNT(COL_B=3) COUNT(COL_C=3) COUNT(COL_D=3)
#1 1/1/2007  0  2  1  0
#2 1/2/2007  0  3  0  0
#3 1/3/2007  0  1  0  0
A.K.

   



here i have a dataframe 

for eg:- 
DATETIMECOL_A   COL_B   COL_C   COL_D 
1/1/2007 0:01   0   3   0   0 
1/1/2007 0:02   0   0   3   0 
1/1/2007 0:03   0   3   0   0 
...       .             ...             ...             
  
1/2/20070   0   3   0 
1/2/2007 0:01   0   3   4   0 
1/2/2007 0:02   0   3   0   0 
...       .             ...             ...             
  
1/3/20070   0   0   0 
1/3/2007 0:01   0   0   4   0 
1/3/2007 0:02   0   3   0   0 
...       .             ...             ...             
  

My requirement what is, i have to get the count for each day  where COL_B = 3 

For eg:- here i need to get like 
DATETIME           COUNT(COL_B=3) 
             
1/1/2007               2 
1/2/2007               3 
1/3/2007               1 

= 
= 

and this way i tried to get, 
MyDF[MyDF[DATETIME]==1/2/2007]  --- here this only select the row where 
DATETIME - column coming as 
1/2/2007 - date and not selecting other rows where same date is 
coming (eg:- 1/1/2007 0:01). And here i need to get the complete records
 for that particular day, when i give date without giving timestamp. 

- Could anyone give a help ASAP ? 

- Thanks 
Antony. 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matched samples, dataframe, panel data

Again my problem, better explained.



#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)



#My reproducible example

firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)
year2-rep(2001:2003,5)
industry2-rep(30,15)
dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)-c(firm,year,industry,dummy,dimension)

firm3-sort(rep(16:20,4),decreasing=F)
year3-rep(2001:2004,5)
industry3-rep(40,20)
dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)-c(firm,year,industry,dummy,dimension)

final1-rbind(data1,data2)
final2-rbind(final1,data3)
final2
final3-final2[order(final2$year,final2$industry,final2$dimension),]
final3





#So my data is final3 is like this:

   firm year industry dummy dimension
266 2000   20 0   781
1 1 2000   20 0  2120
215 2000   20 1  2189
368 2000   20 1  2765
164 2000   20 0  3178
317 2000   20 1  3245
113 2000   20 1  4532
6 2 2000   20 0  4890
419 2000   20 0  5438
46   10 2000   20 0  7690
2 1 2001   20 0   345
378 2001   20 1   389
327 2001   20 0  1234
174 2001   20 0  2678
7 2 2001   20 1  2789
225 2001   20 1  4289
47   10 2001   20 0  6022
123 2001   20 1  6593
276 2001   20 0 35489
429 2001   20 1 37824
60   14 2001   30 1  2341
54   12 2001   30 0  2345
57   13 2001   30 1  3245
51   11 2001   30 0 12456
63   15 2001   30 1 12900
78   19 2001   40 1   389
74   18 2001   40 1  1288
82   20 2001   40 0  6438
70   17 2001   40 1  6754
66   16 2001   40 0 23456
439 2002   20 023
337 2002   20 125
3 1 2002   20 1  2341
286 2002   20 0  2345
8 2 2002   20 1  3412
48   10 2002   20 1  3678
184 2002   20 0  
235 2002   20 0  8543
133 2002   20 0 12900
388 2002   20 1 23456
64   15 2002   30 0   123
52   11 2002   30 0   781
58   13 2002   30 1  2120
61   14 2002   30 1  5678
55   12 2002   30 0  5754
67   16 2002   40 0  1181
75   18 2002   40 1  1200
71   17 2002   40 0  8976
79   19 2002   40 0 23456
83   20 2002   40 1 24824
143 2003   20 0   123
245 2003   20 0   637
194 2003   20 1   647
347 2003   20 0  1200
398 2003   20 1  2367
449 2003   20 0  2897
4 1 2003   20 1  5678
296 2003   20 0  5754
49   10 2003   20 1  9431
9 2 2003   20 0  9500
59   13 2003   30 1   345
65   15 2003   30 1  2345
56   12 2003   30 0  8976
62   14 2003   30 1 10900
53   11 2003   30 0 32489
84   20 2003   40 023
76   18 2003   40 1  2345
80   19 2003   40 0  2367
72   17 2003   40 1  3245
68   16 2003   40 1 32489
153 2004   20 0  2345
357 2004   20 1  2345
50   10 2004   20 1  2890
459 2004   20 0  3456
408 2004   20 0  3892
102 2004   20 1  8765
306

Re: [R] Add prediction interval forest plot (package meta)

2013-06-07 Thread Michael Dewey


At 12:39 07/06/2013, Bernd Weiß wrote:

Am 07.06.2013 13:17, schrieb Michael Dewey:

At 19:34 06/06/2013, Bernd Weiss wrote:

Dear all,

I am struggling to add a prediction interval to a forest plot that was
created with forest.meta(), package meta.

I checked the source of forest.meta() and realized that it is heavily
relying on grid. I am lacking any experience with grid graphics. So, I
am having difficulties to find out where the random effects estimate is
actually plotted in order to add prediction intervals.


I appreciate this is not a direct answer to your question but the
equivalent function in metafor (available from CRAN) does this and it
might be easier to use that.


Thanks, Michael!

Yes, I vaguely remember that the metafor package 
can calculate the PIs but most of our analyses 
rely on the meta package. On the other hand, 
this might be a good reason to finally switch to metafor.


You probably refer to the function addpoly() 
which can be used for that. I will check it out.


I was thinking of the addcred parameter to 
forest.rma.uni in fact, although you could roll your own.




Again, thanks for your help,

Bernd


Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Problem with ODBC connection

2013-06-07 Thread Christofer Bogaso

Hello again,

I am having problem with ODBC connection using the RODBC package.

I am basically trying to read the attached Excel-2003 file using RODBC
package. Here is my code:

 head(sqlFetch(odbcConnectExcel(d:/1.xls), ), 30); odbcCloseAll()
   Criteria  s  d fd  ffd1
f1fd2f2 fd3 f3 F12 F13 F14 F15 F16 F17
F18 F19 F20
1 a NA NA NA NA 0.
0.27755576 -0.00040332321NA  NA NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA
2 s NA  0 NA NA 0.
0.  0.000NA  NA NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA
3 d NA  0 NA NA 0.01734723
0.06938894  0.2775558  5.00  NA NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA
4 f NA NA NA NA NA
NA NA -4.25  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
5 f NA  0 NA NA 0.
0.  0.000 -1.53  NA NA  NA
 NA  NA  NA  NA  NA  NA  NA  NA
6 f NA NA NA NA NA
NA  0.000  0.00  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
7 f NA NA NA NA NA
NA  0.000NA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
8 f NA  0 NA NA NA
NA NANA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
9 f NA  0 NA NA NA
NA NANA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
10f NA NA NA NA NA
NA NANA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
11f NA NA NA NA NA
NA NANA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
12f NA NA NA NA NA
NA NANA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA
13f NA NA NA NA NA
NA NANA  NA NA  NA  NA  NA  NA  NA  NA
 NA  NA  NA

Here you see the data in second column could not read at all.

Can somebody point me if I did something wrong?

Thanks and regards,
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Clogit R and Stata

2013-06-07 Thread Richard Beckett

Sorry to once again write a message but I'm once again stumped and am having no 
luck finding a solution anywhere else.


This question requires some finesse in both R and STATA so hopefully I will be 
able to get an answer here. I am much more adept in R and am trying to 
replicate the results of a STATA file in R. Hopefully this is a proper forum 
for such questions. 


This is the code for the clogit in STATA
clogit sftpcons sftptv2a3 sftptv2a4 sftptv2a5 sftptv2a2 sftptv2a6 logim maccat 
disp4cat if sample==1  glb_ind==Y, group(stratida)
and I tried to replicate it using
clogit1-clogit(sftpcons~sftptv2a3+sftptv2a4+sftptv2a5+sftptv2a2+sftptv2a6+logim+maccat+disp4cat+strata(stratida),
 dframe, sample==1 | glb_ind==Y)
but got different results
What did I do wrong here? I interpreted the STATA clogit as run this logit as 
long as the sample is 1 and glb_ind=Y What should I be doing instead?
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Bionconductor help

2013-06-07 Thread Payal Urs

Hi

I am trying to do some data analysis using R and Bioconductor.  I have a
function to read my data called ReadAffyData and a function to plot the
data called preqc.  I want to know if there is any way I can extract
information (read: Data) from the ReadAffyData function into preqc in order
to produce plots? I have not completely understood the inheritance in R and
I am getting errors similar to the one below. I am new to R and could do
with any help in this aspect!

ReadAffyData - function( filename ) {

require(affy)
require(annotate)

Cov - read.table( filename, sep=\t, header=1, quote=,
comment= )
if( ! all( c( Filename, Label, Repl, Trt ) %in% colnames( Cov )
) )
{
stop( Missing mandatory column )
}

Cov - Cov[order(Cov$Trt),]
i - table(Cov$Trt)
Cov$Repl - unlist( lapply( i, function( j ) 1:j ) )
Cov$Label - paste( as.character( Cov$Trt ), Cov$Repl, sep=: )
rownames(Cov) - as.character(Cov$Label)

tmp1 - colnames( Cov )
tmp2 - rep( , length( Cov ) )

for( i in 1:length( tmp2 ) )
{
tmp2[i] - paste( sort( unique( as.character( Cov[,i] ) ) ),
collapse=/ )
}

labelDescription - data.frame( labelDescription=tmp2 )

rownames( labelDescription ) - tmp1
tmp - new( AnnotatedDataFrame, data=Cov,
varMetadata=labelDescription)

Data - ReadAffy( sampleNames=as.character( Cov$Label
),phenoData=tmp, verbose=TRUE )
}

preQC - function(name){
ReadAffyData(name)
plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density,
main=Raw(PM))
}

preQC(cov.txt)
1 reading C:/CEL/GSM311471.CEL ...instantiating an AffyBatch (intensity a
506944x24 matrix)...done.
Reading in : C:/CEL/GSM311471.CEL
Reading in : C:/CEL/GSM311472.CEL
Reading in : C:/CEL/GSM311473.CEL
Reading in : C:/CEL/GSM311474.CEL
Reading in : C:/CEL/GSM311475.CEL
.
.
.
Error in pm(Data) :
  error in evaluating the argument 'object' in selecting a method for
function 'pm': Error: object 'Data' not found

sessionInf()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
 LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C   LC_TIME=English_India.1252

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
base

other attached packages:
[1] annotate_1.38.0  AnnotationDbi_1.22.6 affy_1.38.1
 Biobase_2.20.0   BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] affyio_1.28.0 BiocInstaller_1.10.1  DBI_0.2-7
IRanges_1.18.1preprocessCore_1.22.0
 [6] RSQLite_0.11.4stats4_3.0.1  tools_3.0.1
XML_3.96-1.1  xtable_1.7-1
[11] zlibbioc_1.6.0



Thanking you,
Ipsitha
Graduate Student
London

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] cannot load pbdMPI package after compilation

2013-06-07 Thread Antoine Migeon

Hello,

I try to install pbdMPI.
Compilation successful, but load fails with segfault.

Is anyone can help me?

R version 3.0.0
pbdMPI version 0.1-6
Intel compiler version 13.1.1
OpenMPI version 1.6.4-1
CPU Intel x86_64

# R CMD INSTALL pbdMPI_0.1-6.tar.gz
..

checking for gcc... icc -std=gnu99
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether icc -std=gnu99 accepts -g... yes
checking for icc -std=gnu99 option to accept ISO C89... none needed
checking for mpirun... mpirun
checking for mpiexec... mpiexec
checking for orterun... orterun
checking for sed... /bin/sed
checking for mpicc... mpicc
checking for ompi_info... ompi_info
checking for mpich2version... F
found sed, mpicc, and ompi_info ...
 TMP_INC_DIRS = /opt/openmpi/1.6.4-1/intel-13.1.1/include
checking /opt/openmpi/1.6.4-1/intel-13.1.1/include ...
found /opt/openmpi/1.6.4-1/intel-13.1.1/include/mpi.h ...
 TMP_LIB_DIRS = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64
checking /opt/openmpi/1.6.4-1/intel-13.1.1/lib64 ...
found /opt/openmpi/1.6.4-1/intel-13.1.1/lib64/libmpi.so ...
found mpi.h and libmpi.so ...
 TMP_INC = /opt/openmpi/1.6.4-1/intel-13.1.1/include
 TMP_LIB = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64
checking for openpty in -lutil... yes
checking for main in -lpthread... yes

*** Results of pbdMPI package configure *

 TMP_INC = /opt/openmpi/1.6.4-1/intel-13.1.1/include
 TMP_LIB = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64
 MPI_ROOT =
 MPITYPE = OPENMPI
 MPI_INCLUDE_PATH = /opt/openmpi/1.6.4-1/intel-13.1.1/include
 MPI_LIBPATH = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64
 MPI_LIBS =  -lutil -lpthread
 MPI_DEFS = -DMPI2
 MPI_INCL2 =
 PKG_CPPFLAGS = -I/opt/openmpi/1.6.4-1/intel-13.1.1/include  -DMPI2
-DOPENMPI
 PKG_LIBS = -L/opt/openmpi/1.6.4-1/intel-13.1.1/lib64 -lmpi  -lutil
-lpthread
*
..
icc -std=gnu99 -I/usr/local/R/3.0.0/intel13/lib64/R/include -DNDEBUG
-I/opt/openmpi/1.6.4-1/intel-13.1.1/include  -DMPI2 -DOPENMPI -O3
-fp-model precise -pc 64 -axAVX-fpic  -O3 -fp-model precise  -pc 64
-axAVX  -c comm_errors.c -o comm_errors.o
icc -std=gnu99 -I/usr/local/R/3.0.0/intel13/lib64/R/include -DNDEBUG
-I/opt/openmpi/1.6.4-1/intel-13.1.1/include  -DMPI2 -DOPENMPI -O3
-fp-model precise -pc 64 -axAVX-fpic  -O3 -fp-model precise  -pc 64
-axAVX  -c comm_sort_double.c -o comm_sort_double.o
.
..

** testing if installed package can be loaded
sh: line 1:  2905 Segmentation fault 
'/usr/local/R/3.0.0/intel13/lib64/R/bin/R' --no-save --slave 21 
/tmp/RtmpGkncGK/file1e541c57190
ERROR: loading failed

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: .Call(spmd_initialize, PACKAGE = pbdMPI)
 2: fun(libname, pkgname)
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch(fun(libname, pkgname), error = identity)
 7: runHook(.onLoad, env, package.lib, package)
 8: loadNamespace(package, c(which.lib.loc, lib.loc))
 9: doTryCatch(return(expr), name, parentenv, handler)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch(expr, error = function(e) {call - conditionCall(e)   
if (!is.null(call)) {if (identical(call[[1L]],
quote(doTryCatch))) call - sys.call(-4L)dcall -
deparse(call)[1L]prefix - paste(Error in, dcall, : )   
LONG - 75Lmsg - conditionMessage(e)sm - strsplit(msg,
\n)[[1L]]w - 14L + nchar(dcall, type = w) + nchar(sm[1L],
type = w)if (is.na(w)) w - 14L + nchar(dcall,
type = b) + nchar(sm[1L], type = b)if (w 
LONG) prefix - paste0(prefix, \n  )}else prefix
- Error : msg - paste0(prefix, conditionMessage(e), \n)   
.Internal(seterrmessage(msg[1L]))if (!silent 
identical(getOption(show.error.messages), TRUE)) {   
cat(msg, file = stderr()).Internal(printDeferredWarnings())   
}invisible(structure(msg, class = try-error, condition = e))})
13: try({ns - loadNamespace(package, c(which.lib.loc, lib.loc))   
env - attachNamespace(ns, pos = pos, deps)})
14: library(pkg_name, lib.loc = lib, character.only = TRUE,
logical.return = TRUE)
15: withCallingHandlers(expr, packageStartupMessage = function(c)
invokeRestart(muffleMessage))
16: suppressPackageStartupMessages(library(pkg_name, lib.loc = lib,
character.only = TRUE, logical.return = TRUE))
17: doTryCatch(return(expr), name, parentenv, handler)
18: tryCatchOne(expr, names, parentenv, handlers[[1L]])
19: tryCatchList(expr, classes, parentenv, handlers)
20: tryCatch(expr, error =

Re: [R] generating a bar chart with two axis for co-linear variable

2013-06-07 Thread Sudha Krishnan

Arun, 

Perfect, this is what I was looking for. 

Thanks,
Sudha Krishnan

-Original Message-
From: arun [mailto:smartpink...@yahoo.com] 
Sent: Friday, June 07, 2013 2:36 AM
To: Sudha Krishnan
Cc: R help
Subject: Re: [R] generating a bar chart with two axis for co-linear variable

HI,
Not sure if this is what you wanted.

pdf(BarplotsNew.pdf)
library(plotrix)
lst2-lapply(seq_len(ncol(dat1)),function(i){
                    Ctdat- table(dat1[,i])
                    Ctdat1-(Ctdat/sum(Ctdat))*100
                    dat2-data.frame(Ctdat,Ctdat1,stringsAsFactors=FALSE)[,-3]
                    colnames(dat2)[3]-Rel.Freq
                    dat2[,1]- as.numeric(as.character(dat2[,1]))
                    with(dat2,twoord.plot(Var1,Rel.Freq,Freq,
                        lylim=c(0,100),rylim=c(0,1000),
                        ylab=Relative Frequency,
                        rylab=Frequency,main=paste(Bar 
plot:,colnames(dat1)[i],sep= ),
                        
type=c(bar,l),lcol=2,rcol=4,xtickpos=Var1,xticklab=Var1))
                        })
 dev.off()

A.K.



- Original Message -
From: arun smartpink...@yahoo.com
To: Sudha Krishnan sudha.krish...@marlabs.com
Cc: R help r-help@r-project.org
Sent: Thursday, June 6, 2013 2:44 PM
Subject: Re: [R]  generating a bar chart with two axis for co-linear variable

HI,

May be this helps:
dat1- read.table(sampledata.txt,header=TRUE,sep=,,stringsAsFactors=FALSE)
pdf(Barplots.pdf)
 lst1-lapply(seq_len(ncol(dat1)),function(i) {Ctdat- 
table(dat1[,i]);Ctdat1-(Ctdat/sum(Ctdat))*100;barplot(Ctdat1,ylim=c(0,100),xlab=colnames(dat1)[i],ylab=Relative
 Frequency,main=paste(Barplot:,colnames(dat1)[i],sep= ))})
dev.off()
A.K.




- Original Message -
From: Sudha Krishnan sudha.krish...@marlabs.com
To: r-help@r-project.org r-help@r-project.org
Cc: 
Sent: Thursday, June 6, 2013 2:37 AM
Subject: [R]  generating a bar chart with two axis for co-linear variable



Hello Dimitris,



I was goggling for some help on Sensitivity vs 1-specificity and saw your link.



I hope you can be of help to me in one of the issue that I am facing in 
generating combo chart(bar chart and plot). I am a novice and have some 
difficulty in getting this logic correct.





I am give a dataset (I am attaching a sample dataset).



I am using a barplot() and passing values for percentage frequency and the 
corresponding variables. I am struck here, what my function does is only 
calculate the frequency for the listed variables and not the frequency 
percentage. Is there a method or a script with which I can pass the frequency 
percent and the related values as category columns for x axis?



I will attach the graphs that I have generated so that you can suggest the 
better way.



Sampledata - Sampledata.txt

What my function does to calculate the frequency with category names in X axis 
- 1.png

My requirement is to generate percentage frequency of the variable in y1 and 
not the frequency itself. 2.png (where x categories are missing)





Thanks,

Sudha Krishnan



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] subset with non logical rules

2013-06-07 Thread sisseck.net

aah perfect, exactly what i was looking for!

Thank you

On Fri, Jun 7, 2013 at 3:24 PM, arun kirshna [via R]
ml-node+s789695n4668923...@n4.nabble.com wrote:
 HI,
 Try:
 ?split()

 source(http://www.openintro.org/stat/data/cdc.R;)
  str(cdc)
 #'data.frame':2 obs. of  9 variables:
 # $ genhlth : Factor w/ 5 levels excellent,very good,..: 3 3 3 3 2 2 2 2
 3 3 ...
 # $ exerany : num  0 0 1 1 0 1 1 0 0 1 ...
 # $ hlthplan: num  1 1 1 1 1 1 1 1 1 1 ...
 # $ smoke100: num  0 1 1 0 0 0 0 0 1 0 ...
 # $ height  : num  70 64 60 66 61 64 71 67 65 70 ...
 # $ weight  : int  175 125 105 132 150 114 194 170 150 180 ...
 # $ wtdesire: int  175 115 105 124 130 114 185 160 130 170 ...
 # $ age : int  77 33 49 42 55 55 31 45 27 44 ...
 # $ gender  : Factor w/ 2 levels m,f: 1 2 2 2 2 2 1 1 2 1 ...
 cdc$genhlth- as.character(cdc$genhlth)
 cdclst1- split(cdc,cdc$genhlth)
 lapply(cdclst1,head,2)
 #$excellent
 # genhlth exerany hlthplan smoke100 height weight wtdesire age gender
 #11 excellent   111 69186  175  46  m
 #13 excellent   101 66185  220  21  m
 #
 #$fair
 #   genhlth exerany hlthplan smoke100 height weight wtdesire age gender
 #12fair   111 69168  148  62  m
 #15fair   100 69170  170  23  m
 #
 #$good
 #  genhlth exerany hlthplan smoke100 height weight wtdesire age gender
 #1good   010 70175  175  77  m
 #2good   011 64125  115  33  f
 #
 #$poor
 #   genhlth exerany hlthplan smoke100 height weight wtdesire age gender
 #53poor   111 62140  130  64  f
 #79poor   110 63142  120  52  f

 #$`very good`
 #genhlth exerany hlthplan smoke100 height weight wtdesire age gender
 #5 very good   010 61150  130  55  f
 #6 very good   110 64114  114  55  f


 sapply(cdclst1,nrow)
 #excellent  fair  good  poor very good
 # 4657  2019  5675   677  6972

 cdcGood-cdclst1[[good]]
   str(cdcGood)
 #'data.frame':5675 obs. of  9 variables:
 # $ genhlth : chr  good good good good ...
 # $ exerany : num  0 0 1 1 0 1 1 0 1 1 ...
 # $ hlthplan: num  1 1 1 1 1 1 1 0 1 1 ...
 # $ smoke100: num  0 1 1 0 1 0 1 1 1 1 ...
 # $ height  : num  70 64 60 66 65 70 73 67 75 65 ...
 # $ weight  : int  175 125 105 132 150 180 185 156 200 160 ...
 # $ wtdesire: int  175 115 105 124 130 170 175 150 190 140 ...
 # $ age : int  77 33 49 42 27 44 79 47 43 54 ...
 # $ gender  : Factor w/ 2 levels m,f: 1 2 2 2 2 1 1 1 1 2 ...


 A.K.


Hi I am trying to figure out how to subset a bunch of data. As an example I
 am using the cdc data from openintro.org.

In the first column with the name genhlth there are various
 options that the persons could respond. For exmaple good very good
 and poor. Now what i would like to do is to seperate the data so that
 everyone who answered good are stored in one variable and everyone who
 answered poor are in another variable.

Now I know i could just do subset(cdc, cdc$genhlth == poor) to
  get the poor, but would really like for a code that would seperate data
  into each group, regardless of what the text or the number of groups
 are.

Can anyone give me a hint?

 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 
 If you reply to this email, your message will be added to the discussion
 below:
 http://r.789695.n4.nabble.com/subset-with-non-logical-rules-tp4668906p4668923.html
 To unsubscribe from subset with non logical rules, click here.
 NAML




--
View this message in context: 
http://r.789695.n4.nabble.com/subset-with-non-logical-rules-tp4668906p4668924.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Evaluation of function names...

2013-06-07 Thread Prof Brian Ripley


On 07/06/2013 14:49, Bert Gunter wrote:

Folks:

Feel free to provide me a link or reference instead of an answer.

Preamble:

f - function() function(x)rnorm(x)
g - f()
g(3)
## [1] -0.4448492 -0.2379978 -0.4537394

## But
rnorm - function()1  ## nasty nasty
g(3)
## Error in rnorm(x) : unused argument(s) (x)

## of course f - function()function(x)stats:::rnorm(x)
## would fix this.

Question 1:
Suppose I defined f() as at top in a package namespace and exported it.
Would a user see this same error defining g() and with rnorm redefined
as above? (Assume stats is attached as usual).  I presume so, but ...


Not if done right.  The package should importFrom(stats, rnorm).


Question 2:
If the answer to Q1 is yes, (how) can this be avoided without using fully
qualified function names?

Again, a quick reference to relevant docs would suffice.


'Writing R Extensions' says

'The namespace controls the search strategy for variables used by 
functions in the package. If not found locally, R searches the package 
namespace first, then the imports, then the base namespace and then the 
normal search path.'


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Get count by day for particular coulmn

2013-06-07 Thread R_Antony

here i have a dataframe 

for eg:-
DATETIMECOL_A   COL_B   COL_C   COL_D
1/1/2007 0:01   0   3   0   0
1/1/2007 0:02   0   0   3   0
1/1/2007 0:03   0   3   0   0
...   . ... ... 

1/2/20070   0   3   0
1/2/2007 0:01   0   3   4   0
1/2/2007 0:02   0   3   0   0
...   . ... ... 

1/3/20070   0   0   0
1/3/2007 0:01   0   0   4   0
1/3/2007 0:02   0   3   0   0
...   . ... ... 


My requirement what is, i have to get the count for each day  where COL_B
= 3

For eg:- here i need to get like
DATETIME   COUNT(COL_B=3)

1/1/2007   2
1/2/2007   3
1/3/2007   1

=
=

and this way i tried to get,
MyDF[MyDF[DATETIME]==1/2/2007]  --- here this only select the row where
DATETIME - column coming as
1/2/2007 - date and not selecting other rows where same date is coming
(eg:- 1/1/2007 0:01). And here i need to get the complete records for that
particular day, when i give date without giving timestamp. 

- Could anyone give a help ASAP ?

- Thanks
Antony.





--
View this message in context: 
http://r.789695.n4.nabble.com/Get-count-by-day-for-particular-coulmn-tp4668915.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Evaluation of function names...

2013-06-07 Thread Bert Gunter

Thank you. That answers my question.

-- Bert

On Fri, Jun 7, 2013 at 7:09 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 On 07/06/2013 14:49, Bert Gunter wrote:

 Folks:

 Feel free to provide me a link or reference instead of an answer.

 Preamble:

 f - function() function(x)rnorm(x)
 g - f()
 g(3)
 ## [1] -0.4448492 -0.2379978 -0.4537394

 ## But
 rnorm - function()1  ## nasty nasty
 g(3)
 ## Error in rnorm(x) : unused argument(s) (x)

 ## of course f - function()function(x)stats:::rnorm(x)
 ## would fix this.

 Question 1:
 Suppose I defined f() as at top in a package namespace and exported it.
 Would a user see this same error defining g() and with rnorm redefined
 as above? (Assume stats is attached as usual).  I presume so, but ...


 Not if done right.  The package should importFrom(stats, rnorm).

 Question 2:
 If the answer to Q1 is yes, (how) can this be avoided without using fully
 qualified function names?

 Again, a quick reference to relevant docs would suffice.


 'Writing R Extensions' says

 'The namespace controls the search strategy for variables used by functions
 in the package. If not found locally, R searches the package namespace
 first, then the imports, then the base namespace and then the normal search
 path.'

 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bionconductor help

Hi,

You would get better response if you post at Bioconductor mailing list.
http://www.bioconductor.org/help/mailing-list/

A.K.

- Original Message -
From: Payal Urs payal@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, June 7, 2013 6:12 AM
Subject: [R] Bionconductor help

Hi

I am trying to do some data analysis using R and Bioconductor.  I have a
function to read my data called ReadAffyData and a function to plot the
data called preqc.  I want to know if there is any way I can extract
information (read: Data) from the ReadAffyData function into preqc in order
to produce plots? I have not completely understood the inheritance in R and
I am getting errors similar to the one below. I am new to R and could do
with any help in this aspect!

ReadAffyData - function( filename ) {

        require(affy)
        require(annotate)

        Cov - read.table( filename, sep=\t, header=1, quote=,
comment= )
    if( ! all( c( Filename, Label, Repl, Trt ) %in% colnames( Cov )
) )
        {
        stop( Missing mandatory column )
    }

        Cov - Cov[order(Cov$Trt),]
        i - table(Cov$Trt)
        Cov$Repl - unlist( lapply( i, function( j ) 1:j ) )
        Cov$Label - paste( as.character( Cov$Trt ), Cov$Repl, sep=: )
        rownames(Cov) - as.character(Cov$Label)

        tmp1 - colnames( Cov )
    tmp2 - rep( , length( Cov ) )

        for( i in 1:length( tmp2 ) )
        {
        tmp2[i] - paste( sort( unique( as.character( Cov[,i] ) ) ),
collapse=/ )
    }

        labelDescription - data.frame( labelDescription=tmp2 )

    rownames( labelDescription ) - tmp1
    tmp - new( AnnotatedDataFrame, data=Cov,
varMetadata=labelDescription)

        Data - ReadAffy( sampleNames=as.character( Cov$Label
),phenoData=tmp, verbose=TRUE )
}

preQC - function(name){
ReadAffyData(name)
plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density,
main=Raw(PM))
}

preQC(cov.txt)
1 reading C:/CEL/GSM311471.CEL ...instantiating an AffyBatch (intensity a
506944x24 matrix)...done.
Reading in : C:/CEL/GSM311471.CEL
Reading in : C:/CEL/GSM311472.CEL
Reading in : C:/CEL/GSM311473.CEL
Reading in : C:/CEL/GSM311474.CEL
Reading in : C:/CEL/GSM311475.CEL
.
.
.
Error in pm(Data) :
  error in evaluating the argument 'object' in selecting a method for
function 'pm': Error: object 'Data' not found

sessionInf()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
base

other attached packages:
[1] annotate_1.38.0      AnnotationDbi_1.22.6 affy_1.38.1
Biobase_2.20.0       BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] affyio_1.28.0         BiocInstaller_1.10.1  DBI_0.2-7
IRanges_1.18.1        preprocessCore_1.22.0
[6] RSQLite_0.11.4        stats4_3.0.1          tools_3.0.1
XML_3.96-1.1          xtable_1.7-1
[11] zlibbioc_1.6.0



Thanking you,
Ipsitha
Graduate Student
London

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Get count by day for particular coulmn

Hi,

On Fri, Jun 7, 2013 at 7:38 AM, R_Antony antony.akk...@ge.com wrote:
 here i have a dataframe

 for eg:-
 DATETIMECOL_A   COL_B   COL_C   COL_D
 1/1/2007 0:01   0   3   0   0
 1/1/2007 0:02   0   0   3   0
 1/1/2007 0:03   0   3   0   0
 ...   . ... ...
 
 1/2/20070   0   3   0
 1/2/2007 0:01   0   3   4   0
 1/2/2007 0:02   0   3   0   0
 ...   . ... ...
 
 1/3/20070   0   0   0
 1/3/2007 0:01   0   0   4   0
 1/3/2007 0:02   0   3   0   0
 ...   . ... ...
 

 My requirement what is, i have to get the count for each day  where COL_B
 = 3

 For eg:- here i need to get like
 DATETIME   COUNT(COL_B=3)
 
 1/1/2007   2
 1/2/2007   3
 1/3/2007   1

 =
 =

Since you didn't provide reproducible data (dput() is great for that),
here's an example with fake data:


MyDF - data.frame(DATETIME = c(1/1/2007, 1/1/2007, 1/1/2007,
1/2/2007, 1/2/2007, 1/2/2007, 1/3/2007, 1/3/2007,
1/3/2007),
COL_A = c(0, 0, 0, 1, 0, 1, 2, 3, 1),
COL_B = c(0, 3, 0, 3, 3, 1, 0, 1, 2),
COL_C = c(1, 2, 3, 1, 2, 3, 1, 2, 3), stringsAsFactors=FALSE)

aggregate(COL_B ~ DATETIME, data=MyDF, FUN=function(x)sum(x == 3))


 and this way i tried to get,
 MyDF[MyDF[DATETIME]==1/2/2007]  --- here this only select the row where
 DATETIME - column coming as
 1/2/2007 - date and not selecting other rows where same date is coming
 (eg:- 1/1/2007 0:01). And here i need to get the complete records for that
 particular day, when i give date without giving timestamp.

You omitted a comma:
MyDF[MyDF[DATETIME]==1/2/2007, ]


You might prefer:
subset(MyDF, DATETIME == 1/2/2007)

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] increase outlier size on boxplot

2013-06-07 Thread Shane Carey

Hi,

Does anybody know how to increase boxplot outlier symbol size? And also the
line from min/max to outlier

Thanks

-- 
Shane

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] increase outlier size on boxplot

Hi Shane,

On Fri, Jun 7, 2013 at 10:36 AM, Shane Carey careys...@gmail.com wrote:
 Hi,

 Does anybody know how to increase boxplot outlier symbol size? And also the
 line from min/max to outlier

If you read the help for boxplot, it will send you to the help for bxp
for more information on custom plotting. ?bxp will offer you a vast
array of possibilities, including:

  whisklty, whisklwd, whiskcol: whisker line type (default:
  ‘dashed’), width, and color.

  outlty, outlwd, outpch, outcex, outcol, outbg: outlier line
  type, line width, point character, point size expansion,
  color, and background color.

accompanied by reproducible examples.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] increase outlier size on boxplot

2013-06-07 Thread Shane Carey

Great, thanks Sarah


On Fri, Jun 7, 2013 at 3:46 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

 Hi Shane,

 On Fri, Jun 7, 2013 at 10:36 AM, Shane Carey careys...@gmail.com wrote:
  Hi,
 
  Does anybody know how to increase boxplot outlier symbol size? And also
 the
  line from min/max to outlier

 If you read the help for boxplot, it will send you to the help for bxp
 for more information on custom plotting. ?bxp will offer you a vast
 array of possibilities, including:

   whisklty, whisklwd, whiskcol: whisker line type (default:
   dashed), width, and color.

   outlty, outlwd, outpch, outcex, outcol, outbg: outlier line
   type, line width, point character, point size expansion,
   color, and background color.

 accompanied by reproducible examples.

 Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org




-- 
Shane

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How to get a subscript of a vector?

Hi,
?which()
 which(A==2)
#[1]  3  7 10
A.K.

Suppose there is a vector 
A - c(1,3,2,6,7,8,2,1,3,2). 

Now, I want to get the subscript of elements of A which equal 2. 

How can I do it ?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] ROC Curves for Neural Networks

2013-06-07 Thread Richard Beckett

Trying to plot an ROC curve to determine effectiveness of a neural network 
model in R. 

using the neuralnet package I created nn1a: 
nn1a-neuralnet(warstns ~ aigenlz+ 
p2l+anoclinv2+flinstab+mill+minconl+fllgdppclz+lpoplz+floil+flmtnestz+c_peaceyears,
 data=na.omit(dfram1), hidden=4) 

then using the lroc function from the epicalc package I tried lroc(nn1a) but 
came up with the error message 

Error in `colnames-`(`*tmp*`, value = c(Non-diseased, Diseased)) : 
  length of 'dimnames' [2] not equal to array extent 

Does anyone know any ways to get ROC curves to work with neural network models? 
Any other packages, model manipulation, etc? 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bionconductor help

2013-06-07 Thread Martin Morgan


On 06/07/2013 07:16 AM, arun wrote:

Hi,

You would get better response if you post at Bioconductor mailing list.
http://www.bioconductor.org/help/mailing-list/


Agreed, though in this case I think


ReadAffyData - function( filename ) {

...

Data - ReadAffy( sampleNames=as.character( Cov$Label),
 phenoData=tmp, verbose=TRUE )
}

preQC - function(name){
ReadAffyData(name)


what needs to happen here is that the return value needs to be assigned to a 
variable,


Data - ReadAffyData(name)

Martin




plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density,
main=Raw(PM))
}




A.K.

- Original Message -
From: Payal Urs payal@gmail.com
To: r-help@r-project.org
Cc:
Sent: Friday, June 7, 2013 6:12 AM
Subject: [R] Bionconductor help

Hi

I am trying to do some data analysis using R and Bioconductor.  I have a
function to read my data called ReadAffyData and a function to plot the
data called preqc.  I want to know if there is any way I can extract
information (read: Data) from the ReadAffyData function into preqc in order
to produce plots? I have not completely understood the inheritance in R and
I am getting errors similar to the one below. I am new to R and could do
with any help in this aspect!

ReadAffyData - function( filename ) {

...

   Data - ReadAffy( sampleNames=as.character( Cov$Label ),phenoData=tmp, 
verbose=TRUE )
}

preQC - function(name){
ReadAffyData(name)
plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density,
main=Raw(PM))
}

preQC(cov.txt)
1 reading C:/CEL/GSM311471.CEL ...instantiating an AffyBatch (intensity a
506944x24 matrix)...done.
Reading in : C:/CEL/GSM311471.CEL
Reading in : C:/CEL/GSM311472.CEL
Reading in : C:/CEL/GSM311473.CEL
Reading in : C:/CEL/GSM311474.CEL
Reading in : C:/CEL/GSM311475.CEL
.
.
.
Error in pm(Data) :
   error in evaluating the argument 'object' in selecting a method for
function 'pm': Error: object 'Data' not found

sessionInf()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C   LC_TIME=English_India.1252

attached base packages:
[1] parallel  stats graphics  grDevices utils datasets  methods
base

other attached packages:
[1] annotate_1.38.0  AnnotationDbi_1.22.6 affy_1.38.1
Biobase_2.20.0   BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] affyio_1.28.0 BiocInstaller_1.10.1  DBI_0.2-7
IRanges_1.18.1preprocessCore_1.22.0
[6] RSQLite_0.11.4stats4_3.0.1  tools_3.0.1
XML_3.96-1.1  xtable_1.7-1
[11] zlibbioc_1.6.0



Thanking you,
Ipsitha
Graduate Student
London

 [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] glm.nb error

Dear R Community,

I have encountered a problem while using the R function glm.nb.
The code that produce the error was following two lines:

group=c(1,1,1,1,0,0,0,0)
fit=glm.nb(y~group)

While the y contains 8 sets of number like:
gene2750   1   0   0   1   5   1   0

Error message:

Error in while ((it - it + 1)  limit  abs(del)  eps) { :
  missing value where TRUE/FALSE needed
Calls: glm.nb - as.vector - theta.ml
In addition: There were 50 or more warnings (use warnings() to see the
first 50)
Execution halted


Information of my system:
 sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

Does anyone happen to have some hit on how to solve this?
Appreciate for any response.

Thanks in advance,

Daofeng

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Clogit R and Stata

2013-06-07 Thread peter dalgaard

On Jun 7, 2013, at 15:34 , Richard Beckett wrote:

Sorry to once again write a message but I'm once again stumped and am having
no luck finding a solution anywhere else.

This question requires some finesse in both R and STATA so hopefully I will
be able to get an answer here. I am much more adept in R and am trying to
replicate the results of a STATA file in R. Hopefully this is a proper forum
for such questions.

This is the code for the clogit in STATA
clogit sftpcons sftptv2a3 sftptv2a4 sftptv2a5 sftptv2a2 sftptv2a6 logim
maccat disp4cat if sample==1 glb_ind==Y, group(stratida)
and I tried to replicate it using
clogit1-clogit(sftpcons~sftptv2a3+sftptv2a4+sftptv2a5+sftptv2a2+sftptv2a6+logim+maccat+disp4cat+strata(stratida),
dframe, sample==1 | glb_ind==Y)
but got different results
What did I do wrong here? I interpreted the STATA clogit as run this logit as
long as the sample is 1 and glb_ind=Y What should I be doing instead?

An rather than | in the R version might help. Other than that, we're a
bit short on clues unless you provide some output.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb error

Hi,

On Fri, Jun 7, 2013 at 10:44 AM, Daofeng Li lid...@gmail.com wrote:
 Dear R Community,

 I have encountered a problem while using the R function glm.nb.
 The code that produce the error was following two lines:

 group=c(1,1,1,1,0,0,0,0)
 fit=glm.nb(y~group)

 While the y contains 8 sets of number like:
 gene2750   1   0   0   1   5   1   0

 Error message:

 Error in while ((it - it + 1)  limit  abs(del)  eps) { :
   missing value where TRUE/FALSE needed
 Calls: glm.nb - as.vector - theta.ml
 In addition: There were 50 or more warnings (use warnings() to see the
 first 50)
 Execution halted

I'd assume there is a missing value somewhere.

But we really need a reproducible example to be certain. You could use
dput(head(yourdata, 20)) to give us something to work with (as long as
that much of your data throws the error).

At the very least, we probably also need
str(y)
str(group)

Are you certain there are no missing values in your data? The problem
may not be that obvious, but that's an easy place to start.

Sarah


 Information of my system:
 sessionInfo()
 R version 3.0.1 (2013-05-16)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 Does anyone happen to have some hit on how to solve this?
 Appreciate for any response.

 Thanks in advance,

 Daofeng


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb error

2013-06-07 Thread Marc Schwartz


On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote:

 Dear R Community,
 
 I have encountered a problem while using the R function glm.nb.
 The code that produce the error was following two lines:
 
 group=c(1,1,1,1,0,0,0,0)
 fit=glm.nb(y~group)
 
 While the y contains 8 sets of number like:
 gene2750   1   0   0   1   5   1   0
 
 Error message:
 
 Error in while ((it - it + 1)  limit  abs(del)  eps) { :
  missing value where TRUE/FALSE needed
 Calls: glm.nb - as.vector - theta.ml
 In addition: There were 50 or more warnings (use warnings() to see the
 first 50)
 Execution halted
 
 
 Information of my system:
 sessionInfo()
 R version 3.0.1 (2013-05-16)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 
 locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 Does anyone happen to have some hit on how to solve this?
 Appreciate for any response.
 
 Thanks in advance,
 
 Daofeng


There is something wrong with your actual 'y' or 'group' that is not evident 
from the above info:


group - c(1, 1, 1, 1, 0, 0, 0, 0)
y - c(0, 1, 0, 0, 1, 5, 1, 0)

 require(MASS)
Loading required package: MASS

 glm.nb(y ~ group)

Call:  glm.nb(formula = y ~ group, init.theta = 1.711564307, link = log)

Coefficients:
(Intercept)group  
 0.5596  -1.9459  

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:  10.23 
Residual Deviance: 6.848AIC: 25.25


Check str(y) and str(group)

You should also be sure to note in your posts when you are using a function 
from a non-base package, in this case MASS, which is not indicated in your 
sessionInfo() above, so something is amiss there as well.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
 lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
 #[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
 #  firm year industry dummy dimension
#1 1 2000   20 0  2120
#21    5 2000   20 1  2189
#
#$`2000.20`[[2]]
 #  firm year industry dummy dimension
#16    4 2000   20 0  3178
#31    7 2000   20 1  3245
#
#$`2000.20`[[3]]
 #  firm year industry dummy dimension
#11    3 2000   20 1  4532
#6 2 2000   20 0  4890
A.K.






From: Cecilia Carmo cecilia.ca...@ua.pt
To: r-help@r-project.org r-help@r-project.org 
Cc: smartpink...@yahoo.com smartpink...@yahoo.com 
Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data




Again my problem, better explained.

#I have a data panel of thousands of firms, by year and industry and 
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where 
#one firm in the first have a corresponding firm in the second with the same 
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)

#My reproducible example
firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)
year2-rep(2001:2003,5)
industry2-rep(30,15)
dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)-c(firm,year,industry,dummy,dimension)
firm3-sort(rep(16:20,4),decreasing=F)
year3-rep(2001:2004,5)
industry3-rep(40,20)
dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)-c(firm,year,industry,dummy,dimension)

final1-rbind(data1,data2)
final2-rbind(final1,data3)
final2
final3-final2[order(final2$year,final2$industry,final2$dimension),]
final3


#So my data is final3 is like this: 
   firm year industry dummy dimension
26    6 2000   20 0   781
1 1 2000   20 0  2120
21    5 2000   20 1  2189
36    8 2000   20 1  2765
16    4 2000   20 0  3178
31    7 2000   20 1  3245
11    3 2000   20 1  4532
6 2 2000   20 0  4890
41    9 2000   20 0  5438
46   10 2000   20 0  7690
2 1 2001   20 0   345
37    8 2001   20 1   389
32    7 2001   20 0  1234
17    4 2001   20 0  2678
7 2 2001   20 1  2789
22    5 2001   20 1  4289
47   10 2001   20 0  6022
12    3 2001   20 1  6593
27    6 2001   20 0 35489
42    9 2001   20 1 37824
60   14 2001   30 1  2341
54   12 2001   30 0  2345
57   13 2001   30 1  3245
51   11 2001   30 0 12456
63   15 2001   30 1 12900
78   19 2001   40 1   389
74   18 2001   40 1  1288
82   20 2001   40 0  6438
70   17 2001   40 1  6754
66   16 2001   40 0 23456
43    9 2002   20 0    23
33    7 2002   20 1    25
3 1 2002   20 1  2341
28    6 2002   20 0  2345
8 2 2002   20 1  3412
48   10 2002   20 1  3678
18    4 2002   20 0  
23    5 2002   20 0  8543
13    3 2002   20 0 12900
38    8 2002   20 1 23456
64   15 2002   30 0   123
52   11 2002   30 0   781
58   13 2002   30 1  2120
61   14 2002   30 1  5678
55   12 2002   30 0

[R] Horizontal labels on a plot

2013-06-07 Thread Shane Carey

Hi,

I have a plot with horizontal lables on the x-axis. For example:
Devonian volcanic rocks
n=2

with n=2 on a new line. How do I centre n=2 under the  Devonian volcanic
rocks  
label?

This was my code: text(axis_text, par(usr)[three], labels = paste(LABELS,
 \n ,  n =, t(t(name.count[,two]))), srt = txt_di,  adj = txtadj, xpd =
TRUE, cex=txt_cex)

where LABELS=Devonian volcanic rocks
and name.count[,two] =2


Thanks
-- 
Shane

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

 res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
 row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#1    1 2000   20 0  2120
#2    5 2000   20 1  2189
#3    4 2000   20 0  3178
#4    7 2000   20 1  3245


#or
 res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
 res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
 row.names(res)-1:nrow(res)
 res[1:4,]
#  firm year industry dummy dimension   group
#1    1 2000   20 0  2120 2000.20   #1 group
#2    5 2000   20 1  2189 2000.20   #1
#3    4 2000   20 0  3178 2000.20   #2
#4    7 2000   20 1  3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: 
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much. 
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1     1 2000       20     0      2120
#21    5 2000       20     1      2189
#
#$`2000.20`[[2]]
#  firm year industry dummy dimension
#16    4 2000       20     0      3178
#31    7 2000       20     1      3245
#
#$`2000.20`[[3]]
#  firm year industry dummy dimension
#11    3 2000       20     1      4532
#6     2 2000       20     0      4890
A.K.






From: Cecilia Carmo cecilia.ca...@ua.pt
To: r-help@r-project.org r-help@r-project.org
Cc: smartpink...@yahoo.com smartpink...@yahoo.com
Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data




Again my problem, better explained.

#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)

#My reproducible example
firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)
year2-rep(2001:2003,5)
industry2-rep(30,15)
dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)-c(firm,year,industry,dummy,dimension)
firm3-sort(rep(16:20,4),decreasing=F)
year3-rep(2001:2004,5)
industry3-rep(40,20)
dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)-c(firm,year,industry,dummy,dimension)

final1-rbind(data1,data2)
final2-rbind(final1,data3)
final2
final3-final2[order(final2$year,final2$industry,final2$dimension),]
final3


#So my data is final3 is like this:
   firm year industry dummy dimension
26    6 2000       20     0       781
1     1 2000       20     0      2120
21    5 2000       20     1      2189

Re: [R] matched samples, dataframe, panel data

That's it, the first one!

Thank you very much,

Cecília Carmo



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

 res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
 row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#11 2000   20 0  2120
#25 2000   20 1  2189
#34 2000   20 0  3178
#47 2000   20 1  3245


#or
 res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
 res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
 row.names(res)-1:nrow(res)
 res[1:4,]
#  firm year industry dummy dimension   group
#11 2000   20 0  2120 2000.20   #1 group
#25 2000   20 1  2189 2000.20   #1
#34 2000   20 0  3178 2000.20   #2
#47 2000   20 1  3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1 1 2000   20 0  2120
#215 2000   20 1  2189
#
#$`2000.20`[[2]]
#  firm year industry dummy dimension
#164 2000   20 0  3178
#317 2000   20 1  3245
#
#$`2000.20`[[3]]
#  firm year industry dummy dimension
#113 2000   20 1  4532
#6 2 2000   20 0  4890
A.K.






From: Cecilia Carmo cecilia.ca...@ua.pt
To: r-help@r-project.org r-help@r-project.org

Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data




Again my problem, better explained.

#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)

#My reproducible example
firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)
year2-rep(2001:2003,5)
industry2-rep(30,15)
dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)-c(firm,year,industry,dummy,dimension)
firm3-sort(rep(16:20,4),decreasing=F)
year3-rep(2001:2004,5)
industry3-rep(40,20)
dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3
colnames(data3)-c(firm,year,industry,dummy,dimension)

final1-rbind(data1,data2)
final2-rbind(final1,data3)
final2

Re: [R] glm.nb error

2013-06-07 Thread Marc Schwartz


On Jun 7, 2013, at 10:36 AM, Daofeng Li lid...@gmail.com wrote:

 Thank you Sarah and Marc for your fast and nice response.
 Apology for didn't include all information.
 
 I have a input file like following:
 
 gene1 18  15  13  13  16  9   20  24
 gene2 15  8   8   7   0   12  18  4
 gene3 10  9   8   12  9   11  12  12
 gene4 4   0   4   3   0   5   0   0
 gene5 0   1   0   0   1   5   1   0
 gene6 3   3   3   3   4   4   4   4
 gene7 0   4   0   2   2   2   0   0
 gene8 4   4   7   3   0   6   6   12
 gene9 11  6   13  10  13  7   12  9
 gene106   3   6   3   4   7   6   3
 
 I am using following R code to compare the difference between the 1st 4 
 numbers against 2nd 4 numbers:
 
 library(MASS)
 library(coin)
 suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
 library(qvalue)
 library(plyr)
 dat = 
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
 index=(apply(dat[,-1],1,sum)0)
 data = dat[index,]
 group=c(1,1,1,1,0,0,0,0)
 n=nrow(data)
 result=NULL
 for (i in 1:n){
   print(i)
   y=as.numeric(data[i,-1])
   if (all((y-mean(y))==0))
 result=rbind(result,c(0,0,0,1))
   else {
 fit=glm.nb(y~group)
 result=rbind(result,summary(fit)$coef[2,])
   }
 }
 qval = qvalue(result[,4])
 fdr=0.1
 index=(qval$qvaluesfdr)
 dat.result = data[index,]
 write.table(dat.result,file=test_result,row.names=F,quote=F)
 
 If you use this input file and code, would reproduce the same error:
 
 Loading required package: methods
 Loading required package: survival
 Loading required package: splines
 Loading required package: mvtnorm
 Loading required package: modeltools
 Loading required package: stats4
 
 Attaching package: ‘plyr’
 
 The following object is masked from ‘package:modeltools’:
 
 empty
 
 [1] 1
 [1] 2
 [1] 3
 [1] 4
 [1] 5
 [1] 6
 Error in while ((it - it + 1)  limit  abs(del)  eps) { : 
   missing value where TRUE/FALSE needed
 Calls: glm.nb - as.vector - theta.ml
 In addition: Warning messages:
 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace 
   :
   iteration limit reached
 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace 
   :
   iteration limit reached
 Execution halted
 
 So might be the error was in 6th line, not the line I pasted before (5th 
 line)? Sorry about that.
 
 Thanks.
 
 Daofeng


Your 'y' at that point in the loop is:
  
 y
[1] 3 3 3 3 4 4 4 4

and 'group' is:

 group
[1] 1 1 1 1 0 0 0 0


 glm.nb(y ~ group)
Error in while ((it - it + 1)  limit  abs(del)  eps) { : 
  missing value where TRUE/FALSE needed


Think about it...  :-)

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb error

Hi,

On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote:
 Thank you Sarah and Marc for your fast and nice response.
 Apology for didn't include all information.

 I have a input file like following:

 gene1 18 15 13 13 16 9 20 24
 gene2 15 8 8 7 0 12 18 4
 gene3 10 9 8 12 9 11 12 12
 gene4 4 0 4 3 0 5 0 0
 gene5 0 1 0 0 1 5 1 0
 gene6 3 3 3 3 4 4 4 4
 gene7 0 4 0 2 2 2 0 0
 gene8 4 4 7 3 0 6 6 12
 gene9 11 6 13 10 13 7 12 9
 gene10 6 3 6 3 4 7 6 3

 dat =
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))

Is this what's in the test file that your code reads in? We don't
have that file, so can't run your code.

If it is, then the output of

dput(dat)

would be enormously more useful than copy and pasting your data file
into your email.

And yes, the full code is very little like the pair of lines you
originally pasted in.

Sarah

 I am using following R code to compare the difference between the 1st 4
 numbers against 2nd 4 numbers:

 library(MASS)
 library(coin)
 suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
 library(qvalue)
 library(plyr)
 dat =
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
 index=(apply(dat[,-1],1,sum)0)
 data = dat[index,]
 group=c(1,1,1,1,0,0,0,0)
 n=nrow(data)
 result=NULL
 for (i in 1:n){
   print(i)
   y=as.numeric(data[i,-1])
   if (all((y-mean(y))==0))
 result=rbind(result,c(0,0,0,1))
   else {
 fit=glm.nb(y~group)
 result=rbind(result,summary(fit)$coef[2,])
   }
 }
 qval = qvalue(result[,4])
 fdr=0.1
 index=(qval$qvaluesfdr)
 dat.result = data[index,]
 write.table(dat.result,file=test_result,row.names=F,quote=F)

 If you use this input file and code, would reproduce the same error:

 Loading required package: methods
 Loading required package: survival
 Loading required package: splines
 Loading required package: mvtnorm
 Loading required package: modeltools
 Loading required package: stats4

 Attaching package: ‘plyr’

 The following object is masked from ‘package:modeltools’:

 empty

 [1] 1
 [1] 2
 [1] 3
 [1] 4
 [1] 5
 [1] 6
 Error in while ((it - it + 1)  limit  abs(del)  eps) { :
   missing value where TRUE/FALSE needed
 Calls: glm.nb - as.vector - theta.ml
 In addition: Warning messages:
 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
 control$trace   :
   iteration limit reached
 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
 control$trace   :
   iteration limit reached
 Execution halted

 So might be the error was in 6th line, not the line I pasted before (5th
 line)? Sorry about that.

 Thanks.

 Daofeng


 On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote:


 On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote:

  Dear R Community,
 
  I have encountered a problem while using the R function glm.nb.
  The code that produce the error was following two lines:
 
  group=c(1,1,1,1,0,0,0,0)
  fit=glm.nb(y~group)
 
  While the y contains 8 sets of number like:
  gene2750   1   0   0   1   5   1   0
 
  Error message:
 
  Error in while ((it - it + 1)  limit  abs(del)  eps) { :
   missing value where TRUE/FALSE needed
  Calls: glm.nb - as.vector - theta.ml
  In addition: There were 50 or more warnings (use warnings() to see the
  first 50)
  Execution halted
 
 
  Information of my system:
  sessionInfo()
  R version 3.0.1 (2013-05-16)
  Platform: x86_64-unknown-linux-gnu (64-bit)
 
  locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base
 
  Does anyone happen to have some hit on how to solve this?
  Appreciate for any response.
 
  Thanks in advance,
 
  Daofeng


 There is something wrong with your actual 'y' or 'group' that is not
 evident from the above info:


 group - c(1, 1, 1, 1, 0, 0, 0, 0)
 y - c(0, 1, 0, 0, 1, 5, 1, 0)

  require(MASS)
 Loading required package: MASS

  glm.nb(y ~ group)

 Call:  glm.nb(formula = y ~ group, init.theta = 1.711564307, link = log)

 Coefficients:
 (Intercept)group
  0.5596  -1.9459

 Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
 Null Deviance:  10.23
 Residual Deviance: 6.848AIC: 25.25


 Check str(y) and str(group)

 You should also be sure to note in your posts when you are using a
 function from a non-base package, in this case MASS, which is not indicated
 in your sessionInfo() above, so something is amiss there as well.

 Regards,

 Marc Schwartz





-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

Re: [R] Conditional coloring of area between curves

2013-06-07 Thread William Dunlap

 that's exactly what I was looking for!

That code has some problems if your time series do not cross at the time
points.  E.g., Duncan's code with some diagnostic plotting is:

  f0 - function (site1, site2, x = time(site1), check = FALSE) 
  {
  stopifnot(length(x) == length(site1), length(x) == length(site2),  
all(diff(x)  0))
  if (check) {
  oldMfrow - par(mfrow = c(2, 1))
  on.exit(par(oldMfrow))
  matplot(x, cbind(site1, site2), type = b)
  }
  smaller - pmin(site1, site2)
  plot(x, site1, ylim = range(c(site1, site2)), type = n)
  polygon(c(x, rev(x)), c(smaller, rev(site1)), col = red)
  polygon(c(x, rev(x)), c(smaller, rev(site2)), col = blue)
  }
  # See how it messes the crossings in
  f0(c(0, 1, -1, 0), c(1, 0, -2, 1), check=TRUE)

The following is a quick and dirty way of adding all the crossing
points to the original data sets so the plot is right.
  addCrossingPoints - function (y1, y2, x) 
  {
  stopifnot(length(y1)==length(x), length(y2)==length(x), all(diff(x)0))
  # isCrossingSegment[i]==TRUE means that y1 and y2 cross between
  #x[i] and x[i+1]  (strictly, not at x[i] or x[i+1]).
  i - seq_len(length(y1) - 1)
  isCrossingSegment - ((y1[i]  y2[i])  (y1[i+1]  y2[i+1])) |
   ((y1[i]  y2[i])  (y1[i+1]  y2[i+1]))
  if (any(isCrossingSegment)) {
  i - which(isCrossingSegment)
  dx - x[i+1] - x[i]
  m1 - (y1[i+1] - y1[i]) / dx
  m2 - (y2[i+1] - y2[i]) / dx
  newDX - (y1[i] - y2[i]) / (m2 - m1)
  newY - y1[i] + newDX * m1 
  o - order( x - c(x, x[i] + newDX))
  x - x[o]
  y1 - c(y1, newY)[o]
  y2 - c(y2, newY)[o]
  }
  list(y1=y1, y2=y2, x=x)
  }
  # Combine it with Duncan's code to get a better looking plot
  f1 - function(site1, site2, x=time(site1), check = FALSE) {
  tmp - addCrossingPoints(site1, site2, x)
  f0(tmp$y1, tmp$y2, tmp$x, check=check)
  }
  f1(c(0, 1, -1, 0), c(1, 0, -2, 1), check=TRUE)


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Roland Pape
 Sent: Friday, June 07, 2013 2:51 AM
 To: Duncan Murdoch
 Cc: r-help@r-project.org
 Subject: Re: [R] Conditional coloring of area between curves
 
 Hi Duncan,
 
 that's exactly what I was looking for! The series have common times, so 
 that's no
 problem. Thanks a lot!
 
 Roland
 
 Von: Duncan Murdoch [murdoch.dun...@gmail.com]
 Gesendet: Donnerstag, 6. Juni 2013 18:45
 An: Roland Pape
 Cc: r-help@r-project.org
 Betreff: Re: [R] Conditional coloring of area between curves
 
 On 06/06/2013 10:41 AM, Roland Pape wrote:
  Dear list,
 
  I have two time series of temperatures from different sites plotted in the 
  same diagram
 and would like to color the area between the curves differently, dependent on 
 whether
 site 1 is cooler than site 2 (colored let's say in blue) or warmer (red). 
 While polygone()
 works fine to color the enclosed area in general, I'm struggling with this 
 conditional
 coloring. Any help is greatly appreciated!
 
 Suppose the series are named site1 and site2, and they have common
 times in a variable times.  Then the following should do what you want:
 
 smaller - pmin(site1, site2)
 plot(x, site1, ylim = range(c(site1, site2)), type=n)
 polygon(c(x, rev(x)), c(smaller, rev(site1)), col=red)
 polygon(c(x, rev(x)), c(smaller, rev(site2)), col=blue)
 
 If the times for the two series are different it's a little harder;
 first you need to give them common times, and that will depend on how
 you decide to evaluate the values between observations. Probably linear
 interpolation (using approx()) is fine, but it's up to you.
 
 Duncan Murdoch
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matched samples, dataframe, panel data

Sorry,

Something is not ok, because when  I do

 nrow(subset(res,res$dummy==0))
[1] 22
 nrow(subset(res,res$dummy==1))
[1] 24

There the number  o observatios in the two subsets is not equal.


Thanks again,

Cecília



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

 res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
 row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#11 2000   20 0  2120
#25 2000   20 1  2189
#34 2000   20 0  3178
#47 2000   20 1  3245


#or
 res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
 res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
 row.names(res)-1:nrow(res)
 res[1:4,]
#  firm year industry dummy dimension   group
#11 2000   20 0  2120 2000.20   #1 group
#25 2000   20 1  2189 2000.20   #1
#34 2000   20 0  3178 2000.20   #2
#47 2000   20 1  3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1 1 2000   20 0  2120
#215 2000   20 1  2189
#
#$`2000.20`[[2]]
#  firm year industry dummy dimension
#164 2000   20 0  3178
#317 2000   20 1  3245
#
#$`2000.20`[[3]]
#  firm year industry dummy dimension
#113 2000   20 1  4532
#6 2 2000   20 0  4890
A.K.






From: Cecilia Carmo cecilia.ca...@ua.pt
To: r-help@r-project.org r-help@r-project.org

Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data




Again my problem, better explained.

#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)

#My reproducible example
firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)
year2-rep(2001:2003,5)
industry2-rep(30,15)
dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345)
data2-data.frame(firm2,year2,industry2,dummy2,dimension2)
data2
colnames(data2)-c(firm,year,industry,dummy,dimension)
firm3-sort(rep(16:20,4),decreasing=F)
year3-rep(2001:2004,5)
industry3-rep(40,20)
dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0)
dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824,
23,2897)
data3-data.frame(firm3,year3,industry3,dummy3,dimension3)
data3

Re: [R] glm.nb error

As Marc already pointed out, take a close look at this part of your loop:

R i - 6
R
R y - as.numeric(data[i,-1])
R y
[1] 3 3 3 3 4 4 4 4
R group
[1] 1 1 1 1 0 0 0 0
R fit - glm.nb(y~group)
Error in while ((it - it + 1)  limit  abs(del)  eps) { :
  missing value where TRUE/FALSE needed

What do you expect to happen there?

In general, it's a better practice to pre-specify the size of result
(eg matrix(NA, nrow=n, ncol=4) ) and fill it as you go, rather than
using rbind() within a loop. Much more memory-efficient.

Sarah

On Fri, Jun 7, 2013 at 11:58 AM, Daofeng Li lid...@gmail.com wrote:
 Sorry Sarah.

 dput(dat)
 structure(list(gene = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L,
 9L, 10L, 2L), .Label = c(gene1, gene10, gene2, gene3,
 gene4, gene5, gene6, gene7, gene8, gene9), class = factor),
 b1 = c(18L, 15L, 10L, 4L, 0L, 3L, 0L, 4L, 11L, 6L), b2 = c(15L,
 8L, 9L, 0L, 1L, 3L, 4L, 4L, 6L, 3L), b3 = c(13L, 8L, 8L,
 4L, 0L, 3L, 0L, 7L, 13L, 6L), b4 = c(13L, 7L, 12L, 3L, 0L,
 3L, 2L, 3L, 10L, 3L), c1 = c(16L, 0L, 9L, 0L, 1L, 4L, 2L,
 0L, 13L, 4L), c2 = c(9L, 12L, 11L, 5L, 5L, 4L, 2L, 6L, 7L,
 7L), c3 = c(20L, 18L, 12L, 0L, 1L, 4L, 0L, 6L, 12L, 6L),
 c4 = c(24L, 4L, 12L, 0L, 0L, 4L, 0L, 12L, 9L, 3L)), .Names = c(gene,
 b1, b2, b3, b4, c1, c2, c3, c4), class = data.frame,
 row.names = c(NA,
 -10L))

 above was the dput(dat). Thanks.

 Daofeng


 On Fri, Jun 7, 2013 at 10:47 AM, Sarah Goslee sarah.gos...@gmail.com
 wrote:

 Hi,

 On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote:
  Thank you Sarah and Marc for your fast and nice response.
  Apology for didn't include all information.
 
  I have a input file like following:
 
  gene1 18 15 13 13 16 9 20 24
  gene2 15 8 8 7 0 12 18 4
  gene3 10 9 8 12 9 11 12 12
  gene4 4 0 4 3 0 5 0 0
  gene5 0 1 0 0 1 5 1 0
  gene6 3 3 3 3 4 4 4 4
  gene7 0 4 0 2 2 2 0 0
  gene8 4 4 7 3 0 6 6 12
  gene9 11 6 13 10 13 7 12 9
  gene10 6 3 6 3 4 7 6 3

  dat =
 
  read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))

 Is this what's in the test file that your code reads in? We don't
 have that file, so can't run your code.

 If it is, then the output of

 dput(dat)

 would be enormously more useful than copy and pasting your data file
 into your email.

 And yes, the full code is very little like the pair of lines you
 originally pasted in.

 Sarah

  I am using following R code to compare the difference between the 1st 4
  numbers against 2nd 4 numbers:
 
  library(MASS)
  library(coin)
  suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
  library(qvalue)
  library(plyr)
  dat =
 
  read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
  index=(apply(dat[,-1],1,sum)0)
  data = dat[index,]
  group=c(1,1,1,1,0,0,0,0)
  n=nrow(data)
  result=NULL
  for (i in 1:n){
print(i)
y=as.numeric(data[i,-1])
if (all((y-mean(y))==0))
  result=rbind(result,c(0,0,0,1))
else {
  fit=glm.nb(y~group)
  result=rbind(result,summary(fit)$coef[2,])
}
  }
  qval = qvalue(result[,4])
  fdr=0.1
  index=(qval$qvaluesfdr)
  dat.result = data[index,]
  write.table(dat.result,file=test_result,row.names=F,quote=F)
 
  If you use this input file and code, would reproduce the same error:
 
  Loading required package: methods
  Loading required package: survival
  Loading required package: splines
  Loading required package: mvtnorm
  Loading required package: modeltools
  Loading required package: stats4
 
  Attaching package: ‘plyr’
 
  The following object is masked from ‘package:modeltools’:
 
  empty
 
  [1] 1
  [1] 2
  [1] 3
  [1] 4
  [1] 5
  [1] 6
  Error in while ((it - it + 1)  limit  abs(del)  eps) { :
missing value where TRUE/FALSE needed
  Calls: glm.nb - as.vector - theta.ml
  In addition: Warning messages:
  1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
  control$trace   :
iteration limit reached
  2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
  control$trace   :
iteration limit reached
  Execution halted
 
  So might be the error was in 6th line, not the line I pasted before (5th
  line)? Sorry about that.
 
  Thanks.
 
  Daofeng
 
 
  On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com
  wrote:
 
 
  On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote:
 
   Dear R Community,
  
   I have encountered a problem while using the R function glm.nb.
   The code that produce the error was following two lines:
  
   group=c(1,1,1,1,0,0,0,0)
   fit=glm.nb(y~group)
  
   While the y contains 8 sets of number like:
   gene2750   1   0   0   1   5   1
   0
  
   Error message:
  
   Error in while ((it - it + 1)  limit  abs(del)  eps) { :
missing value where TRUE/FALSE needed
   Calls: glm.nb - as.vector - theta.ml
   In addition: There were 50 or more warnings (use warnings() to see
   the
   first 50)
   Execution halted
  
  
   Information of my system:
   sessionInfo()
   R version 3.0.1

Re: [R] matched samples, dataframe, panel data

Sorry, it included a case with both values for dummy were 1.

lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) 
y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]})

 res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x)))
 row.names(res)-1:nrow(res)
nrow(subset(res,dummy==0))
#[1] 22
 nrow(subset(res,dummy==1))
#[1] 22
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 12:03 PM
Subject: RE: [R] matched samples, dataframe, panel data

Sorry,

Something is not ok, because when  I do

 nrow(subset(res,res$dummy==0))
[1] 22
 nrow(subset(res,res$dummy==1))
[1] 24

There the number  o observatios in the two subsets is not equal.


Thanks again,

Cecília



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#1    1 2000       20     0      2120
#2    5 2000       20     1      2189
#3    4 2000       20     0      3178
#4    7 2000       20     1      3245


#or
res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
row.names(res)-1:nrow(res)
res[1:4,]
#  firm year industry dummy dimension   group
#1    1 2000       20     0      2120 2000.20   #1 group
#2    5 2000       20     1      2189 2000.20   #1
#3    4 2000       20     0      3178 2000.20   #2
#4    7 2000       20     1      3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1     1 2000       20     0      2120
#21    5 2000       20     1      2189
#
#$`2000.20`[[2]]
#  firm year industry dummy dimension
#16    4 2000       20     0      3178
#31    7 2000       20     1      3245
#
#$`2000.20`[[3]]
#  firm year industry dummy dimension
#11    3 2000       20     1      4532
#6     2 2000       20     0      4890
A.K.






From: Cecilia Carmo cecilia.ca...@ua.pt
To: r-help@r-project.org r-help@r-project.org
Cc: smartpink...@yahoo.com smartpink...@yahoo.com
Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data




Again my problem, better explained.

#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)

#My reproducible example
firm1-sort(rep(1:10,5),decreasing=F)
year1-rep(2000:2004,10)
industry1-rep(20,50)
dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1)
dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789,
2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824,
23,2897,3456,7690,6022,3678,9431,2890)
data1-data.frame(firm1,year1,industry1,dummy1,dimension1)
data1
colnames(data1)-c(firm,year,industry,dummy,dimension)

firm2-sort(rep(11:15,3),decreasing=F)

[R] coxme p-values

2013-06-07 Thread David Costantini

Hi there

does anybody know if the p-values in coxme are given one- or two-tailed by 
default?

Thank you
David


-Messaggio originale-
Da: r-help-boun...@r-project.org per conto di r-help-requ...@r-project.org
Inviato: gio 06/06/2013 11.00
A: r-help@r-project.org
Oggetto: R-help Digest, Vol 124, Issue 7
 
Send R-help mailing list submissions to
r-help@r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
https://stat.ethz.ch/mailman/listinfo/r-help
or, via email, send a message with subject or body 'help' to
r-help-requ...@r-project.org

You can reach the person managing the list at
r-help-ow...@r-project.org

When replying, please edit your Subject line so it is more specific
than Re: Contents of R-help digest...


Today's Topics:

   1. Re: How to display multiples lines with different color on
  the same plot? (Rui Barradas)
   2. Re: Post hoc power analysis for mixed-effects models
  (David Winsemius)
   3. Re: how to compute maximum of fitted polynomial? (David Winsemius)
   4. Re: how to compute maximum of fitted polynomial? (Hans W Borchers)
   5. Re: How to display multiples lines with different color on
  the   same plot? (David Winsemius)
   6. Re: read.csv and write.csv filtering for very big data ?
  (Duncan Murdoch)
   7. Re: Survival aareg problem (Terry Therneau)
   8. Re: Post hoc power analysis for mixed-effects models (Ben Bolker)
   9. SPlus script (Scott Raynaud)
  10. Re: dates and time series management (arun)
  11. Re: Loop FOR with histogram()  from lattice (Jeff Newmiller)
  12. Re: Lattice, ggplot, and pointsize (Deepayan Sarkar)
  13. Re: how to compute maximum of fitted polynomial? (Joseph Clark)
  14. Re: How to write a loop in R to select multiple regression
  model and validate it ? (beginner)
  15. Coefficients paths - comparison of ridge, lasso and elastic
  net regression (beginner)
  16. sample {base} (Xochitl CORMON)
  17. Re: split and common variables (Nico Met)
  18. Re: Trying to build up functions with its names by means of
  lapply (Julio Sergio)
  19. Re: read.csv and write.csv filtering for very big data ?
  (Duncan Murdoch)
  20. reshape2 issue (Neotropical bat risk assessments)
  21. Re: sample {base} (Sarah Goslee)
  22. Re: sample {base} (David Carlson)
  23. Re: SPlus script (Pascal Oettli)
  24. Re: reshape2 issue (Pascal Oettli)
  25. Re: split and common variables (arun)
  26. Re: sample {base} (Xochitl CORMON)
  27. reshape2 issue continued (Neotropical bat risk assessments)
  28. Re: dates and time series management (arun)
  29. Re: split and common variables (David Carlson)
  30. Re: reshape2 issue continued (Thomas Adams)
  31. Re: reshape2 issue continued (Ista Zahn)
  32. Re: Trying to build up functions with its names by means of
  lapply (Bert Gunter)
  33. Re: plyr _aply simplifying return-value dimensions (Adams, Jean)
  34. Re: sample {base} (arun)
  35. Re: SPlus script (Scott Raynaud)
  36. Re: reshape2 issue continued (John Kane)
  37. Re: SPlus script (peter dalgaard)
  38. Re: SPlus script (William Dunlap)
  39. reshape2 issue solved Tnx! (Neotropical bat risk assessments)
  40. Re: How to display multiples lines with different color on
  the same plot? (John Kane)
  41. Re: reshape2 issue solved Tnx! (John Kane)
  42. Re: Trying to build up functions with its names by means of
  lapply (Julio Sergio)
  43. Re: Trying to build up functions with its names by means  of
  lapply (William Dunlap)
  44. Re: Lattice, ggplot, and pointsize (Milan Bouchet-Valat)
  45. Re: SPlus script (Scott Raynaud)
  46. Re: Trying to build up functions with its names by means  of
  lapply (Julio Sergio)
  47. Re: probability weights in multilevel models (Thomas Lumley)
  48. Re: Trying to build up functions with its names bymeans   of
  lapply (William Dunlap)
  49. multilevel binary and ordered regression models (Xu Jun)
  50. Re: dates and time series management (arun)
  51. rJava is not loading (Dimitri Liakhovitski)
  52. Map Antarctica (ejb)
  53. List into table (Andtrei89)
  54. sandwich matrix in gamm (Isabel Proen?a)
  55.  R error- more columns than column names (Sandro Magalh?es)
  56. Subsetting out missing values for a certain variable
  (Daniel Tucker)
  57. Re: Subsetting out missing values for a certain variable
  (Daniel Tucker)
  58. Re: Subsetting out missing values for a certain variable
  (Daniel Tucker)
  59. Re: read.csv and write.csv filtering for very big data ?
  (ivo welch)
  60. reshape2 issue (Bruce Miller)
  61. Re: installing package 'rqpd' (Regression quantiles for panel
  data) (Manish K. Srivastava)
  62. Temporally autocorrelated Poisson data (Ali Swanson)
  63. Error in anova.cca by=axis after Vegan update
  (Edwards, Danielle)
  64. combining two different matrizes (ThomasH)
  65. Re: Subsetting out missing values for a certain variable
  (Jorge I Velez)
  66. Re: Map

[R] Trying to install NADA in R 3.0.0

2013-06-07 Thread David Doyle

Hello folks,

Im trying to install the NADA package in R 3.0.0


It has been archived so I tried to downloading it and installing it
locally.  I get

 utils:::menuInstallLocal()

package NADA successfully unpacked and MD5 sums checked


But then when I load it I get

 library(NADA)

Error in library(NADA) : NADA is not a valid installed package



Any sugestions

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Distance-based non-parametric ANOVA

2013-06-07 Thread Mikhail Umorin

Yes. vegan and TreMineR. Thank you!

On Friday, June 07, 2013 10:52:02 Jose Iparraguirre wrote:
 Hi Mikhail,
 
 After a cursory online search using simply  A new method for non-parametric
 multivariate analysis + CRAN + R, I got the following three packages which
 include this paper in their references: vegan, mefa, and TraMineR. I don't
 know whether they deal with the procedure you are after, but it would be
 worth having a look. Regards,
 José
 
 Prof. José Iparraguirre
 Chief Economist
 Age UK
 
 Age UK
 Tavis House, 1- 6 Tavistock Square
 London, WC1H 9NB
 
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf Of Mikhail Umorin Sent: 06 June 2013 21:35
 To: r-help@r-project.org
 Subject: [R] Distance-based non-parametric ANOVA
 
 Hello,
 
 I am comparing treatments by comparing within group to between group
 distances like described in
 
 MJ Anderson. 2001. A new method for non-parametric multivariate analysis of
 variance. Austral Ecology 26: 32 -- 46.
 
 The idea is to find the ratio of within group sum of distance^2 to between
 group sum of distance^2. Then find the distribution of the statistic by
 permuting the sample.
 
 Has anyone done this before in R code? Are there any packages (I did checked
 on CRAN - none). Code?
 
 I want to make sure I won't be inventing a bicycle before I write my own
 package.
 
 Comments? Suggestions? (I have never written an R package before)
 
 Thank you for your time,
 
 Mikhail.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 Age UK Improving later life
 
 www.ageuk.org.uk
 
 
 
 ---
 Age UK is a registered charity and company limited by guarantee, (registered
 charity number 1128267, registered company number 6825798). Registered
 office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA.
 
 For the purposes of promoting Age UK Insurance, Age UK is an Appointed
 Representative of Age UK Enterprises Limited, Age UK is an Introducer
 Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth
 Access for the purposes of introducing potential annuity and health cash
 plans customers respectively.  Age UK Enterprises Limited, JLT Benefit
 Solutions Limited and Simplyhealth Access are all authorised and regulated
 by the Financial Services Authority.
 --
 
 This email and any files transmitted with it are confidential and intended
 solely for the use of the individual or entity to whom they are addressed.
 If you receive a message in error, please advise the sender and delete
 immediately.
 
 Except where this email is sent in the usual course of our business, any
 opinions expressed in this email are those of the author and do not
 necessarily reflect the opinions of Age UK or its subsidiaries and
 associated companies. Age UK monitors all e-mail transmissions passing
 through its network and may block or modify mails which are deemed to be
 unsuitable.
 
 Age Concern England (charity number 261794) and Help the Aged (charity
 number 272786) and their trading and other associated companies merged on
 1st April 2009.  Together they have formed the Age UK Group, dedicated to
 improving the lives of people in later life.  The three national Age
 Concerns in Scotland, Northern Ireland and Wales have also merged with Help
 the Aged in these nations to form three registered charities: Age Scotland,
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Clogit R and Stata

2013-06-07 Thread Richard Beckett

 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Friday, June 7, 2013 11:12 AM
Subject: Re: [R] Clogit R and Stata

Here is the R output:

Call:
coxph(formula = Surv(rep(1, 1404L), sftpcons) ~ sftptv2a3 + sftptv2a4 +Â 
Â  Â  sftptv2a5 + sftptv2a2 + sftptv2a6 + logim + maccat + disp4cat +Â 
Â  Â  strata(stratida), data = dframe, method = exact)

Â  n= 1404, number of events= 351Â 

Â  Â  Â  Â  Â  Â  Â coef Â  Â  Â  Â  Â  exp(coef) se(coef) Â  Â  Â z Pr(|z|) Â 
 Â 
sftptv2a3 Â 1.4552 Â  Â 4.2852 Â  0.2273 Â 6.401 1.54e-10 ***
sftptv2a4 Â 3.1118 Â  22.4609 Â  0.2265 13.739 Â  2e-16 ***
sftptv2a5 Â 1.0717 Â  Â 2.9204 Â  0.2522 Â 4.249 2.15e-05 ***
sftptv2a2 Â 0.7185 Â  Â 2.0514 Â  0.3300 Â 2.177 Â  0.0295 * Â 
sftptv2a6 Â 2.7341 Â  15.3965 Â  0.5050 Â 5.414 6.17e-08 ***
logim Â  Â  Â 0.7579 Â  Â 2.1338 Â  0.1347 Â 5.625 1.85e-08 ***
maccat Â  Â  3.0809 Â  21.7771 Â  0.4005 Â 7.693 1.43e-14 ***
disp4cat Â  0.7061 Â  Â 2.0261 Â  0.1524 Â 4.634 3.59e-06 ***
---
Signif. codes: Â 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â 
â 1Â 

Â  Â  Â  Â  Â  Â  Â  Â  Â  Â exp(coef) exp(-coef) lower .95 upper .95
sftptv2a3 Â  Â  4.285 Â  Â 0.23336 Â  Â  2.745 Â  Â  6.691
sftptv2a4 Â  Â 22.461 Â  Â 0.04452 Â  Â 14.409 Â  Â 35.013
sftptv2a5 Â  Â  2.920 Â  Â 0.34241 Â  Â  1.781 Â  Â  4.788
sftptv2a2 Â  Â  2.051 Â  Â 0.48747 Â  Â  1.074 Â  Â  3.917
sftptv2a6 Â  Â 15.397 Â  Â 0.06495 Â  Â  5.722 Â  Â 41.429
logim Â  Â  Â  Â  2.134 Â  Â 0.46866 Â  Â  1.639 Â  Â  2.779
maccat Â  Â  Â  21.777 Â  Â 0.04592 Â  Â  9.934 Â  Â 47.739
disp4cat Â  Â  Â 2.026 Â  Â 0.49355 Â  Â  1.503 Â  Â  2.731

Rsquare= 0.239 Â  (max possible= 0.623 )
Likelihood ratio test= 383.2 Â on 8 df, Â  p=0
Wald test Â  Â  Â  Â  Â  Â = 264.7 Â on 8 df, Â  p=0
Score (logrank) test = 396.2 Â on 8 df, Â  p=0

And the STATA output:

Iteration 0: Â  log likelihood = -95.537697 Â 
Iteration 1: Â  log likelihood = -91.465581 Â 
Iteration 2: Â  log likelihood = -91.402366 Â 
Iteration 3: Â  log likelihood = -91.402264 Â 
Iteration 4: Â  log likelihood = -91.402264 Â 

Conditional (fixed-effects) logistic regression Â  Number of obs Â  = Â  Â  Â  
Â 468
Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  LR 
chi2(8) Â  Â  Â = Â  Â  141.59
Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Prob 
 chi2 Â  Â  = Â  Â  0.
Log likelihood = -91.402264 Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Pseudo R2 Â  Â  Â  
= Â  Â  0.4365

--
Â  Â  sftpcons | Â  Â  Â Coef. Â  Std. Err. Â  Â  Â z Â  Â P|z| Â  Â  [95% 
Conf. Interval]
-+
Â  Â sftptv2a3 | Â  2.042827 Â  .4741327 Â  Â  4.31 Â  0.000 Â  Â  1.113544 Â  
Â  2.97211
Â  Â sftptv2a4 | Â  Â 4.10828 Â  .5593723 Â  Â  7.34 Â  0.000 Â  Â  Â 3.01193 Â 
 Â 5.204629
Â  Â sftptv2a5 | Â  1.766492 Â  .5585173 Â  Â  3.16 Â  0.002 Â  Â  .6718177 Â  
Â 2.861165
Â  Â sftptv2a2 | Â  1.366568 Â  .6540307 Â  Â  2.09 Â  0.037 Â  Â  Â .084691 Â  
Â 2.648444
Â  Â sftptv2a6 | Â  2.307152 Â  .8225835 Â  Â  2.80 Â  0.005 Â  Â  .6949178 Â  
Â 3.919386
Â  Â  Â  Â logim | Â  1.404135 Â  .3480976 Â  Â  4.03 Â  0.000 Â  Â  .7218764 Â 
 Â 2.086394
Â  Â  Â  maccat | Â  Â  2.8423 Â  .7008588 Â  Â  4.06 Â  0.000 Â  Â  1.468642 Â 
 Â 4.215958
Â  Â  disp4cat | Â  .6347805 Â  .2872258 Â  Â  2.21 Â  0.027 Â  Â  .0718283 Â  
Â 1.197733
--

Also tried changing method=approximate with no noticeable changeÂ 
On Jun 7, 2013, at 15:34 , Richard Beckett wrote:

 Sorry to once again write a message but I'm once again stumped and am having 
 no luck finding a solution anywhere else.

 This question requires some finesse in both R and STATA so hopefully I will 
 be able to get an answer here. I am much more adept in R and am trying to 
 replicate the results of a STATA file in R. Hopefully this is a proper forum 
 for such questions. 

 This is the code for the clogit in STATA
 clogit sftpcons sftptv2a3 sftptv2a4 sftptv2a5 sftptv2a2 sftptv2a6 logim 
 maccat disp4cat if sample==1  glb_ind==Y, group(stratida)
 and I tried to replicate it using
 clogit1-clogit(sftpcons~sftptv2a3+sftptv2a4+sftptv2a5+sftptv2a2+sftptv2a6+logim+maccat+disp4cat+strata(stratida),
  dframe, sample==1 | glb_ind==Y)
 but got different results
 What did I do wrong here? I interpreted the STATA clogit as run this logit as 
 long as the sample is 1 and glb_ind=Y What should I be doing instead?

An  rather than | in the R version might help. Other than that, we're a 
bit short on clues unless you provide some output.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dkÂ  Priv: pda...@gmail.com
[[alternative HTML version

Re: [R] Distance-based non-parametric ANOVA

2013-06-07 Thread Mikhail Umorin

I shall use this from now on. Thank you.


On Friday, June 07, 2013 19:19:55 Pascal Oettli wrote:


Hello,

You also can use the package sos

findFn('non-parametric multivariate
ANOVA')

Regards,Pascal


2013/6/7 Mikhail Umorin mike...@gmail.com[1]


Hello,

I am comparing treatments by comparing within group to between group distances 
likedescribed in

MJ Anderson. 2001. A new method for non-parametric multivariate analysis of 
variance.Austral Ecology 26: 32 -- 46.

The idea is to find the ratio of within group sum of distance^2 to between 
group sum 
ofdistance^2. Then find the distribution of the statistic by permuting the 
sample.

Has anyone done this before in R code? Are there any packages (I did checked 
onCRAN 
- none). Code?

I want to make sure I won't be inventing a bicycle before I write my own 
package.

Comments? Suggestions? (I have never written an R package before)

Thank you for your time,

Mikhail.

[[alternative HTML version deleted]]

__

R-help@r-project.org[2] mailing list

https://stat.ethz.ch/mailman/listinfo/r-help[3]
http://www.R-project.org/posting-guide.html[4]







[1] mailto:mike...@gmail.com
[2] mailto:R-help@r-project.org
[3] https://stat.ethz.ch/mailman/listinfo/r-help
[4] http://www.R-project.org/posting-guide.html

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb error

Sorry Sarah.

 dput(dat)
structure(list(gene = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 2L), .Label = c(gene1, gene10, gene2, gene3,
gene4, gene5, gene6, gene7, gene8, gene9), class = factor),
b1 = c(18L, 15L, 10L, 4L, 0L, 3L, 0L, 4L, 11L, 6L), b2 = c(15L,
8L, 9L, 0L, 1L, 3L, 4L, 4L, 6L, 3L), b3 = c(13L, 8L, 8L,
4L, 0L, 3L, 0L, 7L, 13L, 6L), b4 = c(13L, 7L, 12L, 3L, 0L,
3L, 2L, 3L, 10L, 3L), c1 = c(16L, 0L, 9L, 0L, 1L, 4L, 2L,
0L, 13L, 4L), c2 = c(9L, 12L, 11L, 5L, 5L, 4L, 2L, 6L, 7L,
7L), c3 = c(20L, 18L, 12L, 0L, 1L, 4L, 0L, 6L, 12L, 6L),
c4 = c(24L, 4L, 12L, 0L, 0L, 4L, 0L, 12L, 9L, 3L)), .Names = c(gene,
b1, b2, b3, b4, c1, c2, c3, c4), class = data.frame,
row.names = c(NA,
-10L))

above was the dput(dat). Thanks.

Daofeng


On Fri, Jun 7, 2013 at 10:47 AM, Sarah Goslee sarah.gos...@gmail.comwrote:

 Hi,

 On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote:
  Thank you Sarah and Marc for your fast and nice response.
  Apology for didn't include all information.
 
  I have a input file like following:
 
  gene1 18 15 13 13 16 9 20 24
  gene2 15 8 8 7 0 12 18 4
  gene3 10 9 8 12 9 11 12 12
  gene4 4 0 4 3 0 5 0 0
  gene5 0 1 0 0 1 5 1 0
  gene6 3 3 3 3 4 4 4 4
  gene7 0 4 0 2 2 2 0 0
  gene8 4 4 7 3 0 6 6 12
  gene9 11 6 13 10 13 7 12 9
  gene10 6 3 6 3 4 7 6 3

  dat =
 
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))

 Is this what's in the test file that your code reads in? We don't
 have that file, so can't run your code.

 If it is, then the output of

 dput(dat)

 would be enormously more useful than copy and pasting your data file
 into your email.

 And yes, the full code is very little like the pair of lines you
 originally pasted in.

 Sarah

  I am using following R code to compare the difference between the 1st 4
  numbers against 2nd 4 numbers:
 
  library(MASS)
  library(coin)
  suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
  library(qvalue)
  library(plyr)
  dat =
 
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
  index=(apply(dat[,-1],1,sum)0)
  data = dat[index,]
  group=c(1,1,1,1,0,0,0,0)
  n=nrow(data)
  result=NULL
  for (i in 1:n){
print(i)
y=as.numeric(data[i,-1])
if (all((y-mean(y))==0))
  result=rbind(result,c(0,0,0,1))
else {
  fit=glm.nb(y~group)
  result=rbind(result,summary(fit)$coef[2,])
}
  }
  qval = qvalue(result[,4])
  fdr=0.1
  index=(qval$qvaluesfdr)
  dat.result = data[index,]
  write.table(dat.result,file=test_result,row.names=F,quote=F)
 
  If you use this input file and code, would reproduce the same error:
 
  Loading required package: methods
  Loading required package: survival
  Loading required package: splines
  Loading required package: mvtnorm
  Loading required package: modeltools
  Loading required package: stats4
 
  Attaching package: plyr
 
  The following object is masked from package:modeltools:
 
  empty
 
  [1] 1
  [1] 2
  [1] 3
  [1] 4
  [1] 5
  [1] 6
  Error in while ((it - it + 1)  limit  abs(del)  eps) { :
missing value where TRUE/FALSE needed
  Calls: glm.nb - as.vector - theta.ml
  In addition: Warning messages:
  1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
  control$trace   :
iteration limit reached
  2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
  control$trace   :
iteration limit reached
  Execution halted
 
  So might be the error was in 6th line, not the line I pasted before (5th
  line)? Sorry about that.
 
  Thanks.
 
  Daofeng
 
 
  On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com
 wrote:
 
 
  On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote:
 
   Dear R Community,
  
   I have encountered a problem while using the R function glm.nb.
   The code that produce the error was following two lines:
  
   group=c(1,1,1,1,0,0,0,0)
   fit=glm.nb(y~group)
  
   While the y contains 8 sets of number like:
   gene2750   1   0   0   1   5   1
   0
  
   Error message:
  
   Error in while ((it - it + 1)  limit  abs(del)  eps) { :
missing value where TRUE/FALSE needed
   Calls: glm.nb - as.vector - theta.ml
   In addition: There were 50 or more warnings (use warnings() to see the
   first 50)
   Execution halted
  
  
   Information of my system:
   sessionInfo()
   R version 3.0.1 (2013-05-16)
   Platform: x86_64-unknown-linux-gnu (64-bit)
  
   locale:
   [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=C LC_NAME=C
   [9] LC_ADDRESS=C   LC_TELEPHONE=C
   [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
  
   attached base packages:
   [1] stats graphics  grDevices utils datasets  methods   base
  
   Does anyone happen to have some hit on how to solve this?
   Appreciate for any response.
  
   Thanks in advance,
  
   Daofeng

Re: [R] glm.nb error

Thank you Sarah and Marc for your fast and nice response.
Apology for didn't include all information.

I have a input file like following:

gene1 18 15 13 13 16 9 20 24
gene2 15 8 8 7 0 12 18 4
gene3 10 9 8 12 9 11 12 12
gene4 4 0 4 3 0 5 0 0
gene5 0 1 0 0 1 5 1 0
gene6 3 3 3 3 4 4 4 4
gene7 0 4 0 2 2 2 0 0
gene8 4 4 7 3 0 6 6 12
gene9 11 6 13 10 13 7 12 9
gene10 6 3 6 3 4 7 6 3

I am using following R code to compare the difference between the 1st 4
numbers against 2nd 4 numbers:

library(MASS)
library(coin)
suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
library(qvalue)
library(plyr)
dat =
read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
index=(apply(dat[,-1],1,sum)0)
data = dat[index,]
group=c(1,1,1,1,0,0,0,0)
n=nrow(data)
result=NULL
for (i in 1:n){
  print(i)
  y=as.numeric(data[i,-1])
  if (all((y-mean(y))==0))
result=rbind(result,c(0,0,0,1))
  else {
fit=glm.nb(y~group)
result=rbind(result,summary(fit)$coef[2,])
  }
}
qval = qvalue(result[,4])
fdr=0.1
index=(qval$qvaluesfdr)
dat.result = data[index,]
write.table(dat.result,file=test_result,row.names=F,quote=F)

If you use this input file and code, would reproduce the same error:

Loading required package: methods
Loading required package: survival
Loading required package: splines
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4

Attaching package: plyr

The following object is masked from package:modeltools:

empty

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
Error in while ((it - it + 1)  limit  abs(del)  eps) { :
  missing value where TRUE/FALSE needed
Calls: glm.nb - as.vector - theta.ml
In addition: Warning messages:
1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
control$trace   :
  iteration limit reached
2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
control$trace   :
  iteration limit reached
Execution halted

So might be the error was in 6th line, not the line I pasted before (5th
line)? Sorry about that.

Thanks.

Daofeng


On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote:


 On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote:

  Dear R Community,
 
  I have encountered a problem while using the R function glm.nb.
  The code that produce the error was following two lines:
 
  group=c(1,1,1,1,0,0,0,0)
  fit=glm.nb(y~group)
 
  While the y contains 8 sets of number like:
  gene2750   1   0   0   1   5   1   0
 
  Error message:
 
  Error in while ((it - it + 1)  limit  abs(del)  eps) { :
   missing value where TRUE/FALSE needed
  Calls: glm.nb - as.vector - theta.ml
  In addition: There were 50 or more warnings (use warnings() to see the
  first 50)
  Execution halted
 
 
  Information of my system:
  sessionInfo()
  R version 3.0.1 (2013-05-16)
  Platform: x86_64-unknown-linux-gnu (64-bit)
 
  locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
  attached base packages:
  [1] stats graphics  grDevices utils datasets  methods   base
 
  Does anyone happen to have some hit on how to solve this?
  Appreciate for any response.
 
  Thanks in advance,
 
  Daofeng


 There is something wrong with your actual 'y' or 'group' that is not
 evident from the above info:


 group - c(1, 1, 1, 1, 0, 0, 0, 0)
 y - c(0, 1, 0, 0, 1, 5, 1, 0)

  require(MASS)
 Loading required package: MASS

  glm.nb(y ~ group)

 Call:  glm.nb(formula = y ~ group, init.theta = 1.711564307, link = log)

 Coefficients:
 (Intercept)group
  0.5596  -1.9459

 Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
 Null Deviance:  10.23
 Residual Deviance: 6.848AIC: 25.25


 Check str(y) and str(group)

 You should also be sure to note in your posts when you are using a
 function from a non-base package, in this case MASS, which is not indicated
 in your sessionInfo() above, so something is amiss there as well.

 Regards,

 Marc Schwartz



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] SQL queries in R

2013-06-07 Thread Sneha Bishnoi

Hey all!

I am trying to select a bunch of id's  (data type -character) from a table
and store them in a variable in R
But when i do this, it automatically truncates the leading zero's in id's
even though they are of character type.

code is :-

myconn-odbcConnect(testdata)
sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,'
and COMMUNITY=',property,'),sep=)
 unit_ids-sqlQuery(myconn,sql.select)
print(unit_ids)
is there anyway i can retain the UNIT_ID's as they are.

Thanks!
Sneha

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb error

Thanks Marc...still didn't get it...
Tried this...

 str(group)
 num [1:8] 1 1 1 1 0 0 0 0
 fit=glm.nb(y~group)
Error in while ((it - it + 1)  limit  abs(del)  eps) { :
  missing value where TRUE/FALSE needed
 group - c('1','1','1','1','0','0','0','0')
 str(group)
 chr [1:8] 1 1 1 1 0 0 0 0
 fit=glm.nb(y~group)
Error in while ((it - it + 1)  limit  abs(del)  eps) { :
  missing value where TRUE/FALSE needed


Daofeng


On Fri, Jun 7, 2013 at 10:48 AM, Marc Schwartz marc_schwa...@me.com wrote:


 On Jun 7, 2013, at 10:36 AM, Daofeng Li lid...@gmail.com wrote:

  Thank you Sarah and Marc for your fast and nice response.
  Apology for didn't include all information.
 
  I have a input file like following:
 
  gene1 18  15  13  13  16  9   20  24
  gene2 15  8   8   7   0   12  18  4
  gene3 10  9   8   12  9   11  12  12
  gene4 4   0   4   3   0   5   0   0
  gene5 0   1   0   0   1   5   1   0
  gene6 3   3   3   3   4   4   4   4
  gene7 0   4   0   2   2   2   0   0
  gene8 4   4   7   3   0   6   6   12
  gene9 11  6   13  10  13  7   12  9
  gene106   3   6   3   4   7   6   3
 
  I am using following R code to compare the difference between the 1st 4
 numbers against 2nd 4 numbers:
 
  library(MASS)
  library(coin)
  suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
  library(qvalue)
  library(plyr)
  dat =
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
  index=(apply(dat[,-1],1,sum)0)
  data = dat[index,]
  group=c(1,1,1,1,0,0,0,0)
  n=nrow(data)
  result=NULL
  for (i in 1:n){
print(i)
y=as.numeric(data[i,-1])
if (all((y-mean(y))==0))
  result=rbind(result,c(0,0,0,1))
else {
  fit=glm.nb(y~group)
  result=rbind(result,summary(fit)$coef[2,])
}
  }
  qval = qvalue(result[,4])
  fdr=0.1
  index=(qval$qvaluesfdr)
  dat.result = data[index,]
  write.table(dat.result,file=test_result,row.names=F,quote=F)
 
  If you use this input file and code, would reproduce the same error:
 
  Loading required package: methods
  Loading required package: survival
  Loading required package: splines
  Loading required package: mvtnorm
  Loading required package: modeltools
  Loading required package: stats4
 
  Attaching package: plyr
 
  The following object is masked from package:modeltools:
 
  empty
 
  [1] 1
  [1] 2
  [1] 3
  [1] 4
  [1] 5
  [1] 6
  Error in while ((it - it + 1)  limit  abs(del)  eps) { :
missing value where TRUE/FALSE needed
  Calls: glm.nb - as.vector - theta.ml
  In addition: Warning messages:
  1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
 control$trace   :
iteration limit reached
  2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
 control$trace   :
iteration limit reached
  Execution halted
 
  So might be the error was in 6th line, not the line I pasted before (5th
 line)? Sorry about that.
 
  Thanks.
 
  Daofeng


 Your 'y' at that point in the loop is:

  y
 [1] 3 3 3 3 4 4 4 4

 and 'group' is:

  group
 [1] 1 1 1 1 0 0 0 0


  glm.nb(y ~ group)
 Error in while ((it - it + 1)  limit  abs(del)  eps) { :
   missing value where TRUE/FALSE needed


 Think about it...  :-)

 Regards,

 Marc Schwartz



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Distance-based non-parametric ANOVA

2013-06-07 Thread Mikhail Umorin

Thank you for the reference. Very helpful

Mikhail

On Friday, June 07, 2013 12:06:22 R Heberto Ghezzo, Dr wrote:
 You better see the original book
 Permutation methods - Mielke Jr Paul J, Berry Kenneth J. The programs in
 Fortran for the tests are in www.stat.colostate.edu/permute
 
 R.Heberto Ghezzo Ph.D.
 Montreal - Canada
 
 From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf
 of Mikhail Umorin [mike...@gmail.com] Sent: Thursday, June 06, 2013 4:34 PM
 To: r-help@r-project.org
 Subject: [R] Distance-based non-parametric ANOVA
 
 Hello,
 
 I am comparing treatments by comparing within group to between group
 distances like described in
 
 MJ Anderson. 2001. A new method for non-parametric multivariate analysis of
 variance. Austral Ecology 26: 32 -- 46.
 
 The idea is to find the ratio of within group sum of distance^2 to between
 group sum of distance^2. Then find the distribution of the statistic by
 permuting the sample.
 
 Has anyone done this before in R code? Are there any packages (I did checked
 on CRAN - none). Code?
 
 I want to make sure I won't be inventing a bicycle before I write my own
 package.
 
 Comments? Suggestions? (I have never written an R package before)
 
 Thank you for your time,
 
 Mikhail.
 
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] glm.nb error

Thanks for all your help. I will try to fix this.
Very helpful.

Best,

Daofeng


On Fri, Jun 7, 2013 at 11:25 AM, Sarah Goslee sarah.gos...@gmail.comwrote:

 As Marc already pointed out, take a close look at this part of your loop:

 R i - 6
 R
 R y - as.numeric(data[i,-1])
 R y
 [1] 3 3 3 3 4 4 4 4
 R group
 [1] 1 1 1 1 0 0 0 0
 R fit - glm.nb(y~group)
 Error in while ((it - it + 1)  limit  abs(del)  eps) { :
   missing value where TRUE/FALSE needed

 What do you expect to happen there?

 In general, it's a better practice to pre-specify the size of result
 (eg matrix(NA, nrow=n, ncol=4) ) and fill it as you go, rather than
 using rbind() within a loop. Much more memory-efficient.

 Sarah

 On Fri, Jun 7, 2013 at 11:58 AM, Daofeng Li lid...@gmail.com wrote:
  Sorry Sarah.
 
  dput(dat)
  structure(list(gene = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L,
  9L, 10L, 2L), .Label = c(gene1, gene10, gene2, gene3,
  gene4, gene5, gene6, gene7, gene8, gene9), class = factor),
  b1 = c(18L, 15L, 10L, 4L, 0L, 3L, 0L, 4L, 11L, 6L), b2 = c(15L,
  8L, 9L, 0L, 1L, 3L, 4L, 4L, 6L, 3L), b3 = c(13L, 8L, 8L,
  4L, 0L, 3L, 0L, 7L, 13L, 6L), b4 = c(13L, 7L, 12L, 3L, 0L,
  3L, 2L, 3L, 10L, 3L), c1 = c(16L, 0L, 9L, 0L, 1L, 4L, 2L,
  0L, 13L, 4L), c2 = c(9L, 12L, 11L, 5L, 5L, 4L, 2L, 6L, 7L,
  7L), c3 = c(20L, 18L, 12L, 0L, 1L, 4L, 0L, 6L, 12L, 6L),
  c4 = c(24L, 4L, 12L, 0L, 0L, 4L, 0L, 12L, 9L, 3L)), .Names =
 c(gene,
  b1, b2, b3, b4, c1, c2, c3, c4), class = data.frame,
  row.names = c(NA,
  -10L))
 
  above was the dput(dat). Thanks.
 
  Daofeng
 
 
  On Fri, Jun 7, 2013 at 10:47 AM, Sarah Goslee sarah.gos...@gmail.com
  wrote:
 
  Hi,
 
  On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote:
   Thank you Sarah and Marc for your fast and nice response.
   Apology for didn't include all information.
  
   I have a input file like following:
  
   gene1 18 15 13 13 16 9 20 24
   gene2 15 8 8 7 0 12 18 4
   gene3 10 9 8 12 9 11 12 12
   gene4 4 0 4 3 0 5 0 0
   gene5 0 1 0 0 1 5 1 0
   gene6 3 3 3 3 4 4 4 4
   gene7 0 4 0 2 2 2 0 0
   gene8 4 4 7 3 0 6 6 12
   gene9 11 6 13 10 13 7 12 9
   gene10 6 3 6 3 4 7 6 3
 
   dat =
  
  
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
 
  Is this what's in the test file that your code reads in? We don't
  have that file, so can't run your code.
 
  If it is, then the output of
 
  dput(dat)
 
  would be enormously more useful than copy and pasting your data file
  into your email.
 
  And yes, the full code is very little like the pair of lines you
  originally pasted in.
 
  Sarah
 
   I am using following R code to compare the difference between the 1st
 4
   numbers against 2nd 4 numbers:
  
   library(MASS)
   library(coin)
   suppressPackageStartupMessages(suppressWarnings(library(tcltk)))
   library(qvalue)
   library(plyr)
   dat =
  
  
 read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4))
   index=(apply(dat[,-1],1,sum)0)
   data = dat[index,]
   group=c(1,1,1,1,0,0,0,0)
   n=nrow(data)
   result=NULL
   for (i in 1:n){
 print(i)
 y=as.numeric(data[i,-1])
 if (all((y-mean(y))==0))
   result=rbind(result,c(0,0,0,1))
 else {
   fit=glm.nb(y~group)
   result=rbind(result,summary(fit)$coef[2,])
 }
   }
   qval = qvalue(result[,4])
   fdr=0.1
   index=(qval$qvaluesfdr)
   dat.result = data[index,]
   write.table(dat.result,file=test_result,row.names=F,quote=F)
  
   If you use this input file and code, would reproduce the same error:
  
   Loading required package: methods
   Loading required package: survival
   Loading required package: splines
   Loading required package: mvtnorm
   Loading required package: modeltools
   Loading required package: stats4
  
   Attaching package: plyr
  
   The following object is masked from package:modeltools:
  
   empty
  
   [1] 1
   [1] 2
   [1] 3
   [1] 4
   [1] 5
   [1] 6
   Error in while ((it - it + 1)  limit  abs(del)  eps) { :
 missing value where TRUE/FALSE needed
   Calls: glm.nb - as.vector - theta.ml
   In addition: Warning messages:
   1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
   control$trace   :
 iteration limit reached
   2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
   control$trace   :
 iteration limit reached
   Execution halted
  
   So might be the error was in 6th line, not the line I pasted before
 (5th
   line)? Sorry about that.
  
   Thanks.
  
   Daofeng
  
  
   On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com
   wrote:
  
  
   On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote:
  
Dear R Community,
   
I have encountered a problem while using the R function glm.nb.
The code that produce the error was following two lines:
   
group=c(1,1,1,1,0,0,0,0)
fit=glm.nb(y~group)
   
While the y contains 8 sets of number like:
gene2750   1   0   0   1   5   1
0
   
Error message:

Re: [R] SQL queries in R

2013-06-07 Thread andrija djurovic

?sqlQuery
as.is - argument
Andrija
On Jun 7, 2013 9:10 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote:

 Hey all!

 I am trying to select a bunch of id's  (data type -character) from a table
 and store them in a variable in R
 But when i do this, it automatically truncates the leading zero's in id's
 even though they are of character type.

 code is :-

 myconn-odbcConnect(testdata)
 sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,'
 and COMMUNITY=',property,'),sep=)
  unit_ids-sqlQuery(myconn,sql.select)
 print(unit_ids)
 is there anyway i can retain the UNIT_ID's as they are.

 Thanks!
 Sneha

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Package 'ChoiceModelR' - Xbetas.csv? Rbetas.csv? betadraws?

2013-06-07 Thread Dimitri Liakhovitski

Hope to get some clarity from the creators of a (wonderful) package
ChoiceModelR. I am using the main function 'choicemodelr'.

Everything works.

?choicemodelr says average of MCMC draws of unit-level model coefficients
are written to Xbetas.csv
Question 1: Is this a typo? The real file created in the working directory
seems to be Rbetas.csv
Questions 2: Are these the averages of the 'use' draws (specified in my
choicemodelr statement)?
Question 3: If I had constraints, I assume these averages will be taking
into account constraints, right?

?choicemodelr also says that choicemodelr produces an object that contains,
among other things, two elemens:



betadraw

An ni by natt by floor(use/keep) array of MCMC random draws of unit-level
multinomial logit model parameter estimates.

betadraw.c

An ni by natt by floor(use/keep) array of constrained MCMC random draws of
unit-level multinomial logit model parameter estimates.



If I do:

apply(out$betadraw.c,c(1,2),mean)

then what is the difference between the result and the file written to Disk?

One difference I noticed:

apply(out$betadraw.c, c(1,2),mean) contains as many columns as parameters,
not as many columns as the number of levels.

What else?



Thank you very much!


-- 
Dimitri Liakhovitski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Horizontal labels on a plot

2013-06-07 Thread David Winsemius


On Jun 7, 2013, at 8:38 AM, Shane Carey wrote:

 Hi,
 
 I have a plot with horizontal lables on the x-axis. For example:
 Devonian volcanic rocks
 n=2
 
 with n=2 on a new line. How do I centre n=2 under the  Devonian volcanic
 rocks  
 label?
 
 This was my code: text(axis_text, par(usr)[three], labels = paste(LABELS,
  \n ,  n =, t(t(name.count[,two]))), srt = txt_di,  adj = txtadj, xpd =
 TRUE, cex=txt_cex)
 

Try: 

 …, labels=bquote( atop(.(LABELS) , n==.(name.count[,two]) ) ), …

(Unable to test with all those undefined parameters.)

?plotmath
# where one can read that \n is not honored by plotmath functions, including 
paste().
?bquote

-- 
David

 where LABELS=Devonian volcanic rocks
 and name.count[,two] =2
 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] rJava is not loading

2013-06-07 Thread Dimitri Liakhovitski

Thanks again, Brian.
Indeed, I had Java for 32 bits. Replaced it and rJava is working now.


On Thu, Jun 6, 2013 at 10:58 AM, Prof Brian Ripley rip...@stats.ox.ac.ukwrote:

 On 06/06/2013 15:52, Dimitri Liakhovitski wrote:

 Thank you very much, Brian.
 It's clearly christal clear - but one needs a christal ball to realize
 that! :-)
 So I learned: architecture = bitness


 On current Windows, yes.
 But not on OS X nor Linux nor Solaris nor FreeBSD 

 It indicates the type of CPU, or more precisely the instruction set the
 CPU is running.  x86_64 CPUs (as used by most current Windows boxes) can
 emulate i386 CPUs.  So the architectures for R on Windows are

 i386 (32-bit)
 x64 (64-bit, the Windows name for x86_64).

  Dimitri


 On Thu, Jun 6, 2013 at 2:19 AM, Prof Brian Ripley rip...@stats.ox.ac.uk
 mailto:rip...@stats.ox.ac.uk** wrote:

 On 06/06/2013 00:38, Dimitri Liakhovitski wrote:

 Hello!
 I installed rJava and am trying to load it.

 library(rJava)

 Error : .onLoad failed in loadNamespace() for 'rJava', details:
 call: fun(libname, pkgname)
 error: No CurrentVersion entry in Software/JavaSoft
 registry! Try
 re-installing Java and make sure R and Java have matching
 architectures.
 Error: package or namespace load failed for rJava

 Any idea why?
 Background info:


 But the posting guide asked for the output from sessionInfo(): we
 want to know what R knows it is running as, not why you think.


 Windows 7, 64-bit
 R version 3.0.1 (for 64-bit)
 I just installed the lastest Java: Java 7 Update 21


 For the right architecture?

 The message is crystal clear: you do not have a Java installed
 matching your R.  Most likely your Java is 32-bit and your R 64-bit
 or v.v.



 --
 Brian D. Ripley, rip...@stats.ox.ac.uk mailto:rip...@stats.ox.ac.uk
 Professor of Applied Statistics,
 
 http://www.stats.ox.ac.uk/~__**ripley/http://www.stats.ox.ac.uk/~__ripley/

 http://www.stats.ox.ac.uk/~**ripley/http://www.stats.ox.ac.uk/~ripley/
 
 University of Oxford, Tel: +44 1865 272861
 tel:%2B44%201865%20272861 (self)
 1 South Parks Road, +44 1865 272866 tel:%2B44%201865%20272866 (PA)

 Oxford OX1 3TG, UKFax: +44 1865 272595
 tel:%2B44%201865%20272595




 --
 Dimitri Liakhovitski



 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  
 http://www.stats.ox.ac.uk/~**ripley/http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595




-- 
Dimitri Liakhovitski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SQL queries in R

2013-06-07 Thread Sneha Bishnoi

tried as.is ,gives an error,

[1] 01000 10054 [Microsoft][ODBC SQL Server
Driver][DBNETLIB]ConnectionWrite (send()).
[2] [RODBC] ERROR: Could not SQLExecDirect 'select UNIT_ID from UNITS
where (UNIT_TYPE='1X1' and COMMUNITY='SAN1193')'


On Fri, Jun 7, 2013 at 3:21 PM, andrija djurovic djandr...@gmail.comwrote:

 ?sqlQuery
 as.is - argument
 Andrija
 On Jun 7, 2013 9:10 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote:

 Hey all!

 I am trying to select a bunch of id's  (data type -character) from a table
 and store them in a variable in R
 But when i do this, it automatically truncates the leading zero's in id's
 even though they are of character type.

 code is :-

 myconn-odbcConnect(testdata)
 sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,'
 and COMMUNITY=',property,'),sep=)
  unit_ids-sqlQuery(myconn,sql.select)
 print(unit_ids)
 is there anyway i can retain the UNIT_ID's as they are.

 Thanks!
 Sneha

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Sneha Bishnoi
+14047235469
H. Milton Stewart School of Industrial   Systems Engineering
Georgia Tech

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SQL queries in R

2013-06-07 Thread andrija djurovic

myconn-odbcConnect(testdata)
sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,'
and COMMUNITY=',property,'),sep=)

unit_ids-sqlQuery(myconn,sql.select,as.is=TRUE)

This should works if myconn and sql.select are defined properly

Andrija
On Jun 7, 2013 9:58 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote:

 tried as.is ,gives an error,

 [1] 01000 10054 [Microsoft][ODBC SQL Server
 Driver][DBNETLIB]ConnectionWrite (send()).
 [2] [RODBC] ERROR: Could not SQLExecDirect 'select UNIT_ID from UNITS
 where (UNIT_TYPE='1X1' and COMMUNITY='SAN1193')'


 On Fri, Jun 7, 2013 at 3:21 PM, andrija djurovic djandr...@gmail.comwrote:

 ?sqlQuery
 as.is - argument
 Andrija
 On Jun 7, 2013 9:10 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote:

 Hey all!

 I am trying to select a bunch of id's  (data type -character) from a
 table
 and store them in a variable in R
 But when i do this, it automatically truncates the leading zero's in id's
 even though they are of character type.

 code is :-

 myconn-odbcConnect(testdata)
 sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,'
 and COMMUNITY=',property,'),sep=)
  unit_ids-sqlQuery(myconn,sql.select)
 print(unit_ids)
 is there anyway i can retain the UNIT_ID's as they are.

 Thanks!
 Sneha

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Sneha Bishnoi
 +14047235469
 H. Milton Stewart School of Industrial   Systems Engineering
 Georgia Tech


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] RCOde need urgent HELP

2013-06-07 Thread felice

hey i need help with a case in my R project.

i have a vector Rt with returns from day 1 till day t  now i want to form a
new vector r= which is rt = Xt − Xt−1

then i want to definde positive and negative aggregates

r(n)+ _t= 1/n (rt +...+rt−n) I {(rt+...+ rt−n) ≥0}
r(n)− _t= 1/n (rt +...+rt−n) I {(rt+...+ rt−n) 0}

thanks for the help 




--
View this message in context: 
http://r.789695.n4.nabble.com/RCOde-need-urgent-HELP-tp4668981.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] gamm in mgcv random effect significance

2013-06-07 Thread William Shadish

Dear R-helpers,

I'd like to understand how to test the statistical significance of a 
random effect in gamm. I am using gamm because I want to test a model 
with an AR(1) error structure, and it is my understanding neither gam 
nor gamm4 will do the latter.

The data set includes nine short interrupted time series (single case 
designs in education, sometimes called N-of-1 trials in medicine) from 
one study. They report a proportion as outcome (y), computed from a 
behavior either observed or not out of 10 trials per time point. Hence I 
use binomial (I believe quasi-binomial is not available in gamm). Each 
of the nine series has an average of 30 observations give or take (total 
264 observations), some under treatment (z) and some not. xc is centered 
session number, int is the z*xc interaction. Based on prior work, xc is 
also smoothed

Consider, for example, two models, both with AR(1) but one allowing a 
random effect on xc:

g1 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, 
correlation=corAR1())
g2 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, random = 
list(xc=~1),correlation=corAR1())

I include the output for g1 and g2 below, but the question is how to 
test the significance of the random effect on xc. I considered a test 
comparing the Log-Likelihoods, but have no idea what the degrees of 
freedom would be given that s(xc) is smoothed. I also tried:

anova(g1$gam, g2$gam)

that did not seem to return anything useful for this question.

A related question is how to test the significance of adding a second 
random effect to a model that already has a random effect, such as:

g3 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = 
list(Case=~1, z=~1),correlation=corAR1())
g4 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = 
list(Case=~1, z=~1, int=~1),correlation=corAR1())

Any help would be appreciated.

Thanks.

Will Shadish

g1
$lme
Linear mixed-effects model fit by maximum likelihood
   Data: data
   Log-likelihood: -437.696
   Fixed: fixed
X(Intercept)   Xz XintXs(xc)Fx1
0.6738466   -2.56883170.0137415   -0.1801294

Random effects:
  Formula: ~Xr - 1 | g
  Structure: pdIdnot
  Xr1  Xr2  Xr3 Xr4  
Xr5  Xr6  Xr7  Xr8 Residual
StdDev: 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 
0.0004377781 0.0004377781 0.0004377781 1.693177

Correlation Structure: AR(1)
  Formula: ~1 | g
  Parameter estimate(s):
   Phi
0.3110725
Variance function:
  Structure: fixed weights
  Formula: ~invwt
Number of Observations: 264
Number of Groups: 1

$gam

Family: binomial
Link function: logit

Formula:
y ~ s(xc) + z + int

Estimated degrees of freedom:
1  total = 4

attr(,class)
[1] gamm list

  g2
$lme
Linear mixed-effects model fit by maximum likelihood
   Data: data
   Log-likelihood: -443.9495
   Fixed: fixed
X(Intercept)   Xz XintXs(xc)Fx1
  0.720018143 -2.562155820  0.003457463 -0.045821030

Random effects:
  Formula: ~Xr - 1 | g
  Structure: pdIdnot
  Xr1  Xr2  Xr3 Xr4  
Xr5  Xr6  Xr7  Xr8
StdDev: 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 
7.056078e-06 7.056078e-06 7.056078e-06

  Formula: ~1 | xc %in% g
  (Intercept) Residual
StdDev: 6.277279e-05 1.683007

Correlation Structure: AR(1)
  Formula: ~1 | g/xc
  Parameter estimate(s):
   Phi
0.1809409
Variance function:
  Structure: fixed weights
  Formula: ~invwt
Number of Observations: 264
Number of Groups:
 g xc %in% g
 134

$gam

Family: binomial
Link function: logit

Formula:
y ~ s(xc) + z + int

Estimated degrees of freedom:
1  total = 4

attr(,class)
[1] gamm list


-- 
William R. Shadish
Distinguished Professor
Founding Faculty

Mailing Address:
William R. Shadish
University of California
School of Social Sciences, Humanities and Arts
5200 North Lake Rd
Merced CA  95343

Physical/Delivery Address:
University of California Merced
ATTN: William Shadish
School of Social Sciences, Humanities and Arts
Facilities Services Building A
5200 North Lake Rd.
Merced, CA 95343

209-228-4372 voice
209-228-4007 fax (communal fax: be sure to include cover sheet)
wshad...@ucmerced.edu
http://faculty.ucmerced.edu/wshadish/index.htm
http://psychology.ucmerced.edu



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matched samples, dataframe, panel data

Thank you very much.

I apologize but I want to ask only one more thing:

How to do if I want in addition to impose that the diffence between 
dimensions doesn't exced an absolute value, for example, 200.

Firm 1 continues matching with firm 5 in year 2000 because 
2189-2120=69, and 69 is  10%x2120=212, and 69 is also  200

But firm 20 doesn't match with firm 17 in year 2001
because 6754-6438=316, and 316 is  10%x6438=643.8, but 316 is  200


Thank you

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 17:30
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Sorry, it included a case with both values for dummy were 1.

lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) 
y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]})

 res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x)))
 row.names(res)-1:nrow(res)
nrow(subset(res,dummy==0))
#[1] 22
 nrow(subset(res,dummy==1))
#[1] 22
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 12:03 PM
Subject: RE: [R] matched samples, dataframe, panel data

Sorry,

Something is not ok, because when  I do

 nrow(subset(res,res$dummy==0))
[1] 22
 nrow(subset(res,res$dummy==1))
[1] 24

There the number  o observatios in the two subsets is not equal.


Thanks again,

Cecília



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#11 2000   20 0  2120
#25 2000   20 1  2189
#34 2000   20 0  3178
#47 2000   20 1  3245


#or
res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
row.names(res)-1:nrow(res)
res[1:4,]
#  firm year industry dummy dimension   group
#11 2000   20 0  2120 2000.20   #1 group
#25 2000   20 1  2189 2000.20   #1
#34 2000   20 0  3178 2000.20   #2
#47 2000   20 1  3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1 1 2000   20 0  2120
#215 2000   20 1  2189
#
#$`2000.20`[[2]]
#  firm year industry dummy dimension
#164 2000   20 0  3178
#317 2000   20 1  3245
#
#$`2000.20`[[3]]
#  firm year industry dummy dimension
#113 2000   20 1  4532
#6 2 2000   20 0  4890
A.K.






From: Cecilia Carmo cecilia.ca...@ua.pt
To: r-help@r-project.org r-help@r-project.org

Sent: Friday, June 7, 2013 9:56 AM
Subject: Re: [R] matched samples, dataframe, panel data




Again my problem, better explained.

#I have a data panel of thousands of firms, by year and industry and
#one dummy variable that identifies one kind of firms (1 if the firm have an 
auditor; 0 if not)
#and another variable the represents the firm dimension (total assets in 
thousand of euros)
#I need to create two separated samples with the same number os firms where
#one firm in the first have a corresponding firm in the second with the same
#year, industry and dimension (the dimension doesn't need to be exatly the
#same, it could vary in an interval of +/- 10%, for example)

#My

Re: [R] gamm in mgcv random effect significance

2013-06-07 Thread Gavin Simpson

On Fri, 2013-06-07 at 13:12 -0700, William Shadish wrote:
 Dear R-helpers,
 
 I'd like to understand how to test the statistical significance of a 
 random effect in gamm. I am using gamm because I want to test a model 
 with an AR(1) error structure, and it is my understanding neither gam 
 nor gamm4 will do the latter.

gamm4() can't yes and out of the box mgcv::gam can't either but
see ?magic for an example of correlated errors and how the fits can be
manipulated to take the AR(1) (or any structure really as far as I can
tell) into account.

You might like to look at mgcv::bam() which allows an known AR(1) term
but do check that it does what you think; with a random effect spline
I'm not at all certain that it will nest the AR(1) in the random effect
level.

snip /
 Consider, for example, two models, both with AR(1) but one allowing a 
 random effect on xc:
 
 g1 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, 
 correlation=corAR1())
 g2 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, random = 
 list(xc=~1),correlation=corAR1())

Shouldn't you specify how the AR(1) is nested in the hierarchy here,
i.e. AR(1) within xc? maybe I'm not following your data structure
correctly.

 I include the output for g1 and g2 below, but the question is how to 
 test the significance of the random effect on xc. I considered a test 
 comparing the Log-Likelihoods, but have no idea what the degrees of 
 freedom would be given that s(xc) is smoothed. I also tried:
 
 anova(g1$gam, g2$gam)

gamm() fits via the lme() function of package nlme. To do what you want,
you need the anova() method for objects of class lme, e.g.

anova(g1$lme, g2$lme)

Then I think you should check if the fits were done via REML and also be
aware of the issue of testing wether a variance term is 0.

 that did not seem to return anything useful for this question.
 
 A related question is how to test the significance of adding a second 
 random effect to a model that already has a random effect, such as:
 
 g3 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = 
 list(Case=~1, z=~1),correlation=corAR1())
 g4 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = 
 list(Case=~1, z=~1, int=~1),correlation=corAR1())

Again, I think you need anova() on the $lme components.

HTH

G

 Any help would be appreciated.
 
 Thanks.
 
 Will Shadish
 
 g1
 $lme
 Linear mixed-effects model fit by maximum likelihood
Data: data
Log-likelihood: -437.696
Fixed: fixed
 X(Intercept)   Xz XintXs(xc)Fx1
 0.6738466   -2.56883170.0137415   -0.1801294
 
 Random effects:
   Formula: ~Xr - 1 | g
   Structure: pdIdnot
   Xr1  Xr2  Xr3 Xr4  
 Xr5  Xr6  Xr7  Xr8 Residual
 StdDev: 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 
 0.0004377781 0.0004377781 0.0004377781 1.693177
 
 Correlation Structure: AR(1)
   Formula: ~1 | g
   Parameter estimate(s):
Phi
 0.3110725
 Variance function:
   Structure: fixed weights
   Formula: ~invwt
 Number of Observations: 264
 Number of Groups: 1
 
 $gam
 
 Family: binomial
 Link function: logit
 
 Formula:
 y ~ s(xc) + z + int
 
 Estimated degrees of freedom:
 1  total = 4
 
 attr(,class)
 [1] gamm list
 
   g2
 $lme
 Linear mixed-effects model fit by maximum likelihood
Data: data
Log-likelihood: -443.9495
Fixed: fixed
 X(Intercept)   Xz XintXs(xc)Fx1
   0.720018143 -2.562155820  0.003457463 -0.045821030
 
 Random effects:
   Formula: ~Xr - 1 | g
   Structure: pdIdnot
   Xr1  Xr2  Xr3 Xr4  
 Xr5  Xr6  Xr7  Xr8
 StdDev: 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 
 7.056078e-06 7.056078e-06 7.056078e-06
 
   Formula: ~1 | xc %in% g
   (Intercept) Residual
 StdDev: 6.277279e-05 1.683007
 
 Correlation Structure: AR(1)
   Formula: ~1 | g/xc
   Parameter estimate(s):
Phi
 0.1809409
 Variance function:
   Structure: fixed weights
   Formula: ~invwt
 Number of Observations: 264
 Number of Groups:
  g xc %in% g
  134
 
 $gam
 
 Family: binomial
 Link function: logit
 
 Formula:
 y ~ s(xc) + z + int
 
 Estimated degrees of freedom:
 1  total = 4
 
 attr(,class)
 [1] gamm list
 
 

-- 
Gavin Simpson, PhD  [t] +1 306 337 8863
Adjunct Professor, Department of Biology[f] +1 306 337 2410
Institute of Environmental Change  Society [e] gavin.simp...@uregina.ca
523 Research and Innovation Centre  [tw] @ucfagls
University of Regina
Regina, SK S4S 0A2, Canada

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] matched samples, dataframe, panel data

HI,
Do you want to impose this on 'res' on 'final3'?
If it is on `res`:
res1-do.call(rbind,lapply(split(res,(1:nrow(res)-1)%/%2+1),function(x) 
x[abs(diff(x$dimension))200,]))
row.names(res1)-1:nrow(res1)
nrow(subset(res1,dummy==0))
#[1] 14
 nrow(subset(res1,dummy==1))
#[1] 14

 head(res1)
#  firm year industry dummy dimension
#1    1 2000   20 0  2120
#2    5 2000   20 1  2189
#3    4 2000   20 0  3178
#4    7 2000   20 1  3245
#5    4 2001   20 0  2678
#6    2 2001   20 1  2789
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: R help r-help@r-project.org
Cc: arun smartpink...@yahoo.com
Sent: Friday, June 7, 2013 5:51 PM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.

I apologize but I want to ask only one more thing:

How to do if I want in addition to impose that the diffence between 
dimensions doesn't exced an absolute value, for example, 200.

Firm 1 continues matching with firm 5 in year 2000 because 
2189-2120=69, and 69 is  10%x2120=212, and 69 is also  200

But firm 20 doesn't match with firm 17 in year 2001
because 6754-6438=316, and 316 is  10%x6438=643.8, but 316 is  200


Thank you

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 17:30
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Sorry, it included a case with both values for dummy were 1.

lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) 
y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]})

res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)
nrow(subset(res,dummy==0))
#[1] 22
nrow(subset(res,dummy==1))
#[1] 22
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 12:03 PM
Subject: RE: [R] matched samples, dataframe, panel data

Sorry,

Something is not ok, because when  I do

 nrow(subset(res,res$dummy==0))
[1] 22
 nrow(subset(res,res$dummy==1))
[1] 24

There the number  o observatios in the two subsets is not equal.


Thanks again,

Cecília



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#1    1 2000       20     0      2120
#2    5 2000       20     1      2189
#3    4 2000       20     0      3178
#4    7 2000       20     1      3245


#or
res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
row.names(res)-1:nrow(res)
res[1:4,]
#  firm year industry dummy dimension   group
#1    1 2000       20     0      2120 2000.20   #1 group
#2    5 2000       20     1      2189 2000.20   #1
#3    4 2000       20     0      3178 2000.20   #2
#4    7 2000       20     1      3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1     1 2000       20     0      2120
#21    5 2000       20     1      2189
#
#$`2000.20`[[2]]
#  firm year industry dummy dimension
#16    4 2000       20     0      3178
#31    7 2000       20     1      3245
#
#$`2000.20`[[3]]
#  firm year industry dummy dimension
#11    3 2000       20     1      4532
#6     2 2000       20     0      4890
A.K.

Re: [R] matched samples, dataframe, panel data

Maybe the final result is the same, but I want to impose it on final3, when we 
are matching the firms.

Thanks,
Cecília




De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 23:04
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

HI,
Do you want to impose this on 'res' on 'final3'?
If it is on `res`:
res1-do.call(rbind,lapply(split(res,(1:nrow(res)-1)%/%2+1),function(x) 
x[abs(diff(x$dimension))200,]))
row.names(res1)-1:nrow(res1)
nrow(subset(res1,dummy==0))
#[1] 14
 nrow(subset(res1,dummy==1))
#[1] 14

 head(res1)
#  firm year industry dummy dimension
#11 2000   20 0  2120
#25 2000   20 1  2189
#34 2000   20 0  3178
#47 2000   20 1  3245
#54 2001   20 0  2678
#62 2001   20 1  2789
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: R help r-help@r-project.org
Cc: arun smartpink...@yahoo.com
Sent: Friday, June 7, 2013 5:51 PM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.

I apologize but I want to ask only one more thing:

How to do if I want in addition to impose that the diffence between
dimensions doesn't exced an absolute value, for example, 200.

Firm 1 continues matching with firm 5 in year 2000 because
2189-2120=69, and 69 is  10%x2120=212, and 69 is also  200

But firm 20 doesn't match with firm 17 in year 2001
because 6754-6438=316, and 316 is  10%x6438=643.8, but 316 is  200


Thank you

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 17:30
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Sorry, it included a case with both values for dummy were 1.

lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) 
y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]})

res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)
nrow(subset(res,dummy==0))
#[1] 22
nrow(subset(res,dummy==1))
#[1] 22
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 12:03 PM
Subject: RE: [R] matched samples, dataframe, panel data

Sorry,

Something is not ok, because when  I do

 nrow(subset(res,res$dummy==0))
[1] 22
 nrow(subset(res,res$dummy==1))
[1] 24

There the number  o observatios in the two subsets is not equal.


Thanks again,

Cecília



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#11 2000   20 0  2120
#25 2000   20 1  2189
#34 2000   20 0  3178
#47 2000   20 1  3245


#or
res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
row.names(res)-1:nrow(res)
res[1:4,]
#  firm year industry dummy dimension   group
#11 2000   20 0  2120 2000.20   #1 group
#25 2000   20 1  2189 2000.20   #1
#34 2000   20 0  3178 2000.20   #2
#47 2000   20 1  3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try other ways.



- Original Message -
From: arun smartpink...@yahoo.com
To: Cecilia Carmo cecilia.ca...@ua.pt
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 11:25 AM
Subject: Re: [R] matched samples, dataframe, panel data

Hi,
May be this helps:
lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]))
lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]

names(lst6)
# [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30
#[8] 2001.40 2002.40 2003.40 2004.40


lst6[2000.20]
#$`2000.20`
#$`2000.20`[[1]]
#  firm year industry dummy dimension
#1 1 2000   20 0  2120
#215

Re: [R] matched samples, dataframe, panel data

Hi,
If i replace `lst3` with:

lst1-split(final3,list(final3$year,final3$industry))
lst2-lst1[lapply(lst1,nrow)0]
lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) {x1-x[(y 
(x$dimension+x$dimension*0.1))  (y (x$dimension-x$dimension*0.1)),]; 
x1[abs(diff(x1$dimension))200,]})) ###changed

lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2])
lst5-lapply(lst4,function(x)x[!duplicated(x)])
lst6-lst5[lapply(lst5,length)0]
lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) 
y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]})
res2-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x)))
 identical(res1,res2)
#[1] TRUE
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 6:10 PM
Subject: RE: [R] matched samples, dataframe, panel data

Maybe the final result is the same, but I want to impose it on final3, when we 
are matching the firms.

Thanks,
Cecília




De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 23:04
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

HI,
Do you want to impose this on 'res' on 'final3'?
If it is on `res`:
res1-do.call(rbind,lapply(split(res,(1:nrow(res)-1)%/%2+1),function(x) 
x[abs(diff(x$dimension))200,]))
row.names(res1)-1:nrow(res1)
nrow(subset(res1,dummy==0))
#[1] 14
nrow(subset(res1,dummy==1))
#[1] 14

head(res1)
#  firm year industry dummy dimension
#1    1 2000       20     0      2120
#2    5 2000       20     1      2189
#3    4 2000       20     0      3178
#4    7 2000       20     1      3245
#5    4 2001       20     0      2678
#6    2 2001       20     1      2789
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: R help r-help@r-project.org
Cc: arun smartpink...@yahoo.com
Sent: Friday, June 7, 2013 5:51 PM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.

I apologize but I want to ask only one more thing:

How to do if I want in addition to impose that the diffence between
dimensions doesn't exced an absolute value, for example, 200.

Firm 1 continues matching with firm 5 in year 2000 because
2189-2120=69, and 69 is  10%x2120=212, and 69 is also  200

But firm 20 doesn't match with firm 17 in year 2001
because 6754-6438=316, and 316 is  10%x6438=643.8, but 316 is  200


Thank you

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 17:30
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Sorry, it included a case with both values for dummy were 1.

lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) 
y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]})

res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)
nrow(subset(res,dummy==0))
#[1] 22
nrow(subset(res,dummy==1))
#[1] 22
A.K.



- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc: R help r-help@r-project.org
Sent: Friday, June 7, 2013 12:03 PM
Subject: RE: [R] matched samples, dataframe, panel data

Sorry,

Something is not ok, because when  I do

 nrow(subset(res,res$dummy==0))
[1] 22
 nrow(subset(res,res$dummy==1))
[1] 24

There the number  o observatios in the two subsets is not equal.


Thanks again,

Cecília



De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:43
Para: Cecilia Carmo
Cc: R help
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
Not sure if this is what you wanted.

res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
row.names(res)-1:nrow(res)

# this combines the list of lists to a data.frame
res[1:4,]
#  firm year industry dummy dimension
#1    1 2000       20     0      2120
#2    5 2000       20     1      2189
#3    4 2000       20     0      3178
#4    7 2000       20     1      3245


#or
res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x)))
res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res))
row.names(res)-1:nrow(res)
res[1:4,]
#  firm year industry dummy dimension   group
#1    1 2000       20     0      2120 2000.20   #1 group
#2    5 2000       20     1      2189 2000.20   #1
#3    4 2000       20     0      3178 2000.20   #2
#4    7 2000       20     1      3245 2000.20   #2
A.K.


- Original Message -
From: Cecilia Carmo cecilia.ca...@ua.pt
To: arun smartpink...@yahoo.com
Cc:
Sent: Friday, June 7, 2013 11:33 AM
Subject: RE: [R] matched samples, dataframe, panel data

Thank you very much.
Just a little thing: how can I put it like a dataframe?

Thanks,

Cecília


De: arun [smartpink...@yahoo.com]
Enviado: sexta-feira, 7 de Junho de 2013 16:27
Para: Cecilia Carmo
Assunto: Re: [R] matched samples, dataframe, panel data

Hi,
There could be easier ways...  I am a bit busy now to try

[R] It seams that fast99 function (sensitivity package) does not work out for norm distribution.

2013-06-07 Thread Marino David

Dear all mailing listers,

Does Anyone have the same problem as mine when using the fast99
(extended-FAST method) to perform SA of model with norm distribution inputs?

See the simple example given following.

Any suggestion will be greatly appreciated.

Thank you!

Marino


# Simple example

# 1. uniform version (It works well)
library(sensitivity)
Myfun-function(x){return(rowSums(x))}
SA1 - fast99(model = Myfun, factors = 3, n = 1000, q = qunif, q.arg =
list(min = 0, max = 1))


 SA1
Call:
fast99(model = Myfun, factors = 3, n = 1000, q = qunif, q.arg =
list(min = 0, max = 1))

Model runs: 3000

Estimations of the indices:
   first order total order
X1   0.3335243   0.3350584
X2   0.3335243   0.3350584
X3   0.3335243   0.3350584






# 2. norm version (it does not work)
SA2 - fast99(model = Myfun, factors = 3, n = 1000, q = qnorm, q.arg =
list(mean = 0, sd = 1))


 SA2
Call:
fast99(model = Myfun, factors = 3, n = 1000, q = qnorm, q.arg =
list(mean = 0, sd = 1))

Model runs: 3000

Estimations of the indices:
   first order total order
X1  NA  NA
X2  NA  NA
X3  NA  NA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Trying to install NADA in R 3.0.0

2013-06-07 Thread Janh Anni

Try contacting Dr Dennis Helsel, the developer at dhel...@practicalstats.com


On Fri, Jun 7, 2013 at 1:47 PM, David Doyle kydaviddo...@gmail.com wrote:

 Hello folks,

 Im trying to install the NADA package in R 3.0.0


 It has been archived so I tried to downloading it and installing it
 locally.  I get

  utils:::menuInstallLocal()

 package NADA successfully unpacked and MD5 sums checked


 But then when I load it I get

  library(NADA)

 Error in library(NADA) : NADA is not a valid installed package

 

 Any sugestions

 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SPlus script

2013-06-07 Thread Rolf Turner

On 07/06/13 23:05, Duncan Murdoch wrote:
 On 13-06-06 6:22 PM, Rolf Turner wrote:
 On 07/06/13 03:19, Scott Raynaud wrote:
 I actually had tried placing arguments in the call but it didn't 
 work.   However, I did
 not think about writing it to a variable and printing.  That seems 
 to have done the
 trick.  Funny, I don't remember having to do that before, but that's 
 not surprising.


 If I remember correctly --- haven't used Splus for decades --- this is a
 difference
 between Splus and R.

 In R the output of a function is returned *invisibly* if that function
 is called
 from within another function.  And source() is one such other function.

 Actually this depends on the caller.  source() does return its results 
 invisibly, but many other functions don't.

 From FAQ 7.16:
 If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the 
 command line the returned value is automatically printed (unless it is 
 |invisible()|), but in other circumstances, such as in a |source()|d 
 file or ***inside a function*** it isn't printed unless you 
 specifically print it. 

(Emphasis added.)

I think that you have misinterpreted what I wrote.  Many (most?) functions
*return* their results (values) visibly.  But if you put an expression 
into the code
of that function (an expression which is not part of the returned value) 
you never
see the result of evaluating that expression.

E.g.:

 foo - function(x) {
 sin(42)
 x^2
 }

 foo(3)
 [1] 9

The value of sin(42) is never seen.

The main point however is that IIRC Splus is different from R in respect
of whether the values of (un-assigned) expressions inside source are
visible.  In R they are invisible; in Splus I *believe* (vaguely recall) 
that
they are visible.  I cannot check this since I have no access to Splus.

I do wish someone would confirm (or deny, as the case may be) that
my recollections about Splus are correct.

 cheers,

 Rolf



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Trying to install NADA in R 3.0.0

2013-06-07 Thread Rich Shepard


On Fri, 7 Jun 2013, Janh Anni wrote:


Try contacting Dr Dennis Helsel, the developer at
dhel...@practicalstats.com


  I don't know what OS David's using, nor if R-3.0.0 is an upgrade or a new
installation, but I had no issues when upgrading R from the last 2.x
version. The existing NADA in the system library works with 3.x. I'm running
Slackware-13.1 and -14.0.

  Iniially I ran install.packages(NADA), selected the closest repository,
and had a working package available for use.

Rich

--
Richard B. Shepard, Ph.D.  |  Have knowledge, will travel.
Applied Ecosystem Services, Inc.   |
http://www.appl-ecosys.com Voice: 503-667-4517  Fax: 503-667-8863

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SPlus script

2013-06-07 Thread William Dunlap

  foo - function(x) {
  sin(42)
  x^2
  }
 
  foo(3)
  [1] 9
 
 The value of sin(42) is never seen.

This is true in both R and S+.  Only the return value of a function is available
for autoprinting and sin(42) is not the return value.  Perhaps you are 
remembering
the case where the last subexpression in the function is an assignment, in which
case S+ autoprints its value but R does not.

  R f - function(x) { y - x + 1 }
  R f(10)
  R .Last.value
  [1] 11

  S+ f - function(x) { y - x + 1 }
  Warning messages:
Last expression in function is an assignment
  (You probably wanted to return the left-hand side)
   in: y - x + 1
  S+ f(10)
  [1] 11
  S+ .Last.value
  [1] 11

Autoprinting is implemented quite differently in R and S+.  I think R attaches
the autoprint flag to the returned object while S+ assigns .Auto.print-TRUE
in the frame of the caller (yuck).  That leads to differences like the following

  R f - function(x) invisible(x+1)
  R g - function(x) 10 * x
  R g(f(23)) # g() and f() are evaluated in same frame
  [1] 240
  R .Last.value
  [1] 240

  S+ f - function(x) invisible(x+1)
  S+ g - function(x) 10 * x
  S+ g(f(23)) # g() and f() are evaluated in same frame
  S+ .Last.value
  [1] 240

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Rolf Turner
 Sent: Friday, June 07, 2013 4:17 PM
 To: Duncan Murdoch
 Cc: r-help@r-project.org; Scott Raynaud
 Subject: Re: [R] SPlus script
 
 On 07/06/13 23:05, Duncan Murdoch wrote:
  On 13-06-06 6:22 PM, Rolf Turner wrote:
  On 07/06/13 03:19, Scott Raynaud wrote:
  I actually had tried placing arguments in the call but it didn't
  work.   However, I did
  not think about writing it to a variable and printing.  That seems
  to have done the
  trick.  Funny, I don't remember having to do that before, but that's
  not surprising.
 
 
  If I remember correctly --- haven't used Splus for decades --- this is a
  difference
  between Splus and R.
 
  In R the output of a function is returned *invisibly* if that function
  is called
  from within another function.  And source() is one such other function.
 
  Actually this depends on the caller.  source() does return its results
  invisibly, but many other functions don't.
 
  From FAQ 7.16:
  If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the
  command line the returned value is automatically printed (unless it is
  |invisible()|), but in other circumstances, such as in a |source()|d
  file or ***inside a function*** it isn't printed unless you
  specifically print it.
 
 (Emphasis added.)
 
 I think that you have misinterpreted what I wrote.  Many (most?) functions
 *return* their results (values) visibly.  But if you put an expression
 into the code
 of that function (an expression which is not part of the returned value)
 you never
 see the result of evaluating that expression.
 
 E.g.:
 
  foo - function(x) {
  sin(42)
  x^2
  }
 
  foo(3)
  [1] 9
 
 The value of sin(42) is never seen.
 
 The main point however is that IIRC Splus is different from R in respect
 of whether the values of (un-assigned) expressions inside source are
 visible.  In R they are invisible; in Splus I *believe* (vaguely recall)
 that
 they are visible.  I cannot check this since I have no access to Splus.
 
 I do wish someone would confirm (or deny, as the case may be) that
 my recollections about Splus are correct.
 
  cheers,
 
  Rolf
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SPlus script

2013-06-07 Thread Rolf Turner



Very interesting.  But what explicitly happens with ***source()***
in Splus???

cheers,

Rolf

On 08/06/13 11:33, William Dunlap wrote:

  foo - function(x) {
  sin(42)
  x^2
  }

  foo(3)
  [1] 9

The value of sin(42) is never seen.

This is true in both R and S+.  Only the return value of a function is available
for autoprinting and sin(42) is not the return value.  Perhaps you are 
remembering
the case where the last subexpression in the function is an assignment, in which
case S+ autoprints its value but R does not.

   R f - function(x) { y - x + 1 }
   R f(10)
   R .Last.value
   [1] 11

   S+ f - function(x) { y - x + 1 }
   Warning messages:
 Last expression in function is an assignment
   (You probably wanted to return the left-hand side)
in: y - x + 1
   S+ f(10)
   [1] 11
   S+ .Last.value
   [1] 11

Autoprinting is implemented quite differently in R and S+.  I think R attaches
the autoprint flag to the returned object while S+ assigns .Auto.print-TRUE
in the frame of the caller (yuck).  That leads to differences like the following

   R f - function(x) invisible(x+1)
   R g - function(x) 10 * x
   R g(f(23)) # g() and f() are evaluated in same frame
   [1] 240
   R .Last.value
   [1] 240

   S+ f - function(x) invisible(x+1)
   S+ g - function(x) 10 * x
   S+ g(f(23)) # g() and f() are evaluated in same frame
   S+ .Last.value
   [1] 240

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf
Of Rolf Turner
Sent: Friday, June 07, 2013 4:17 PM
To: Duncan Murdoch
Cc: r-help@r-project.org; Scott Raynaud
Subject: Re: [R] SPlus script

On 07/06/13 23:05, Duncan Murdoch wrote:

On 13-06-06 6:22 PM, Rolf Turner wrote:

On 07/06/13 03:19, Scott Raynaud wrote:

I actually had tried placing arguments in the call but it didn't
work.   However, I did
not think about writing it to a variable and printing.  That seems
to have done the
trick.  Funny, I don't remember having to do that before, but that's
not surprising.


If I remember correctly --- haven't used Splus for decades --- this is a
difference
between Splus and R.

In R the output of a function is returned *invisibly* if that function
is called
from within another function.  And source() is one such other function.

Actually this depends on the caller.  source() does return its results
invisibly, but many other functions don't.

  From FAQ 7.16:

If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the
command line the returned value is automatically printed (unless it is
|invisible()|), but in other circumstances, such as in a |source()|d
file or ***inside a function*** it isn't printed unless you
specifically print it.

(Emphasis added.)

I think that you have misinterpreted what I wrote.  Many (most?) functions
*return* their results (values) visibly.  But if you put an expression
into the code
of that function (an expression which is not part of the returned value)
you never
see the result of evaluating that expression.

E.g.:

  foo - function(x) {
  sin(42)
  x^2
  }

  foo(3)
  [1] 9

The value of sin(42) is never seen.

The main point however is that IIRC Splus is different from R in respect
of whether the values of (un-assigned) expressions inside source are
visible.  In R they are invisible; in Splus I *believe* (vaguely recall)
that
they are visible.  I cannot check this since I have no access to Splus.

I do wish someone would confirm (or deny, as the case may be) that
my recollections about Splus are correct.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] estimation of covariance matrix of a bivariate normal distribution using maximization of the log-likelihood

2013-06-07 Thread Rolf Turner



See in line below.

On 08/06/13 00:00, Hertzog Gladys wrote:

Dear all,
I’m new in R and I’m trying to estimate the covariance matrix of a bivariate 
normal distribution by maximizing the log-likelihood. The maximization really 
has to be performed with the non-linear minimization routine (nlm).

Why?

The 2 means of the distribution are known and equal to 1.

But you simulate your data using means 2.4 and 1.3.

I’ve already tried 2 different ways to compute this covariance but for each of 
them I obtained a lot of warnings and illogical values for the covariance 
matrix.
  
In the first one, I defined the bivariate normal distribution with the command dmvnorm:
  
x-rnorm(6000, 2.4, 0.6)

x - matrix(c(x), ncol=1)

Why do you use matrix() and c() here? Totally unnecessary
(and confusing).

y-rlnorm(6000, 1.3,0.1)
y - matrix(c(y), ncol=1)
XY - cbind(x,y)

To estimate the covariance matrix underlying the data XY you could
simply do:

M - var(XY).

If you want to impose the constraint that the true mean vector is c(1,1)
you could do:

mu - c(1,1)
xbar - apply(XY,2,mean)
M - (5999/6000)*var(XY) + (xbar-mu)%*%t(xbar-mu)

It would of course make more sense in terms of your simulated data
to use:

mu - c(2.4,1.3)

I haven't looked at your code below any further. Too much of a mess.

cheers,

Rolf Turner
  
L - function(par,x,y) {

return (-sum(log(par[4]*dmvnorm(XY, mean=c(1,1), sigma= matrix(c(par[1], 
par[1]*par[2]*par[3],par[1]*par[2]*par[3], par[2] ),nrow=2, ncol=2))
)))
}
par.start- c(0.5, 0.5 ,0.5 ,0.5)
result-nlm(L,par.start,y=y,x=x, hessian=TRUE)
par.hat - result$estimate
  
par.hatIl y a eu 32 avis (utilisez warnings() pour les visionner)

par.hat - result$estimate
par.hat

[1] 5.149919e+01 2.520721e+02 8.734212e-03 3.996771e+02

warnings()

Messages d'avis :
1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
2: In nlm(L, par.start, y = y, x = x, hessian = TRUE) :
   NA/Inf replaced by maximum positive value
3: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
4: In nlm(L, par.start, y = y, x = x, hessian = TRUE) :
   NA/Inf replaced by maximum positive value
5: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
6: In nlm(L, par.start, y = y, x = x, hessian = TRUE) :
   NA/Inf replaced by maximum positive value
7: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
8: In nlm(L, par.start, y = y, x = x, hessian = TRUE) :
   NA/Inf replaced by maximum positive value
9: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
10: In nlm(L, par.start, y = y, x = x, hessian = TRUE) :
   NA/Inf replaced by maximum positive value
11: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
12: In nlm(L, par.start, y = y, x = x, hessian = TRUE) :
   NA/Inf replaced by maximum positive value
13: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
   production de NaN
  
In the second one, I wrote step by step the bivariate normal distribution in order to have each parameter separately (not in a matrix) but it didn’t work as well:
  
x-rnorm(6000, 2.4, 0.6)

y-rlnorm(6000, 1.3,0.1)
L - function(par,x,y) {
return (-sum(log((1-par[4])*( (1/(2*pi*par[1]*par[2]*sqrt(1-par[3])))*exp(   
(-1/2*(1-par[3]^2))* ((y-1)/par[2])^2 +((x-1)/par[1])^2 - 
2*(y-1)*(x-1)/(par[2]*par[1])   )) )))
}
#par [1]= sigma_x , par [2]= sigma_y  par [3]= rho_xy  par[4] is a mixing 
parameter. The final step of my calculation will be to have a mixture of 
bivariate normal and log-normal distributions.
par.start- c(0.5, 0.5 ,0.5 ,0.5)
result-nlm(L,par.start,y=y,x=x, hessian=T)
par.hat - result$estimate
par.ha
  
When I run this script, I get always 50 advices like those below:

Messages d'avis :
1: In sqrt(1 - par[3]) : production de NaN
2: In nlm(L, par.start, y = y, x = x, hessian = T) :
   NA/Inf replaced by maximum positive value
3: In sqrt(1 - par[3]) : production de NaN
4: In nlm(L, par.start, y = y, x = x, hessian = T) :
   NA/Inf replaced by maximum positive value
5: In sqrt(1 - par[3]) : production de NaN
6: In nlm(L, par.start, y = y, x = x, hessian = T) :
   NA/Inf replaced by maximum positive value
7: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] *  ... : production de 
NaN
8: In nlm(L, par.start, y = y, x = x, hessian = T) :
   NA/Inf replaced by maximum positive value
9: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] *  ... : production de 
NaN
10: In nlm(L, par.start, y = y, x = x, hessian = T) :
   NA/Inf replaced by maximum positive value
11: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] *  ... : production de 
NaN
12: In nlm(L, par.start, y = y, x = x, hessian = T) :
   NA/Inf replaced by maximum positive value
……
Does one of you know how to use the nlm method to estimate the covariance 
matrix (and mixing parameter ) of a bivariate normal distribution?
  
Thank you

[R] optim() over bogus parameters

2013-06-07 Thread Ryan Dew

Hello,

I'm optimizing a log-likelihood function using the built-in optim()
function with the default method. The model is essentially a Weibull
distribution where the rate parameter changes based on a single covariate,
coded 1 or 0 (that is, when the indicator is 1, the rate parameter changes
to a different value, Beta, where this Beta is included as a model
parameter). I successfully programmed the model and it seems to work fine.
I've done extensive checking, and am confident the model is coded
correctly. But I was experimenting, and ran into something odd: if I set
the entire covariate vector to zero (meaning Beta is always being
multiplied by zero), but leave Beta as a variable to be optimized over,
optim() still changes the value of Beta (even though it's always multiplied
by zero), and furthermore gives different results for the other parameters,
as compared to when I force Beta to be zero and don't optimize over it.

When I first saw this, I imagined I had made a coding error somewhere else.
BUT, here's the kicker: I modified the code so that the log-likelihood
function still includes Beta in the vector of parameters, but draws it
(Beta - parameters[4]) and then immediately after that sets it to zero. So
essentially, Beta is still in the parameters being optimized, but is
immediately set to zero after being called. When I run this model... it
does the same thing!! optim() changes the value of Beta, which then changes
the values of the other parameters. The ONLY DIFFERENCE between this model
and the fixed zero Beta is that the function being optimized still calls
Beta from the parameters vector (but, again, sets it to zero
immediately--Beta is always zero in any of the calculations).

I hope I've explained this clearly. Does anyone have any idea what could be
causing this phenomenon? Specifically, it troubles me that the other
parameters change.

Thanks in advance,
Ryan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SPlus script

2013-06-07 Thread William Dunlap

 Very interesting.  But what explicitly happens with ***source()***
 in Splus???

Here are some examples where I use the exprs argument instead of making a
tempory file and source the file.  I don't know what you are looking for.
(The OP never showed any calls to the functions in defined in his script, so
I don't know what he was doing either.)

  S+  source( exprs = expression( 12:15, invisible(1/9), log2(1:10)), 
auto.print=TRUE)
  S+ 12:15
  [1] 12 13 14 15
  S+ invisible(1/9)
  S+ log2(1:10)
   [1] 0.00 1.00 1.584963 2.00 2.321928 2.584963 2.807355 3.00
   [9] 3.169925 3.321928

  S+ source( exprs = expression( 12:15, invisible(1/9), log2(1:10)), 
auto.print=TRUE, echo=FALSE)
  [1] 12 13 14 15
   [1] 0.00 1.00 1.584963 2.00 2.321928 2.584963 2.807355 3.00
   [9] 3.169925 3.321928

  S+ source( exprs = expression( { 12:15 ; log2(1:10) } ), auto.print=TRUE)
  S+ {
  12:15
  log2(1:10)
  }
   [1] 0.00 1.00 1.584963 2.00 2.321928 2.584963 2.807355 3.00
   [9] 3.169925 3.321928

The next doesn't autoprint because invisible() sets the autprint flag to FALSE 
in the
caller's frame:
  S+ source( exprs = expression( { 12:15 ; invisible(1/9) ; log2(1:10) } ), 
auto.print=TRUE)
  S+ {
  12:15
  invisible(1/9)
  log2(1:10)
  }


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: Rolf Turner [mailto:rolf.tur...@xtra.co.nz]
 Sent: Friday, June 07, 2013 4:46 PM
 To: William Dunlap
 Cc: r-help@r-project.org
 Subject: Re: [R] SPlus script
 
 
 Very interesting.  But what explicitly happens with ***source()***
 in Splus???
 
  cheers,
 
  Rolf
 
 On 08/06/13 11:33, William Dunlap wrote:
foo - function(x) {
sin(42)
x^2
}
 
foo(3)
[1] 9
 
  The value of sin(42) is never seen.
  This is true in both R and S+.  Only the return value of a function is 
  available
  for autoprinting and sin(42) is not the return value.  Perhaps you are 
  remembering
  the case where the last subexpression in the function is an assignment, in 
  which
  case S+ autoprints its value but R does not.
 
 R f - function(x) { y - x + 1 }
 R f(10)
 R .Last.value
 [1] 11
 
 S+ f - function(x) { y - x + 1 }
 Warning messages:
   Last expression in function is an assignment
 (You probably wanted to return the left-hand side)
  in: y - x + 1
 S+ f(10)
 [1] 11
 S+ .Last.value
 [1] 11
 
  Autoprinting is implemented quite differently in R and S+.  I think R 
  attaches
  the autoprint flag to the returned object while S+ assigns .Auto.print-TRUE
  in the frame of the caller (yuck).  That leads to differences like the 
  following
 
 R f - function(x) invisible(x+1)
 R g - function(x) 10 * x
 R g(f(23)) # g() and f() are evaluated in same frame
 [1] 240
 R .Last.value
 [1] 240
 
 S+ f - function(x) invisible(x+1)
 S+ g - function(x) 10 * x
 S+ g(f(23)) # g() and f() are evaluated in same frame
 S+ .Last.value
 [1] 240
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
 Behalf
  Of Rolf Turner
  Sent: Friday, June 07, 2013 4:17 PM
  To: Duncan Murdoch
  Cc: r-help@r-project.org; Scott Raynaud
  Subject: Re: [R] SPlus script
 
  On 07/06/13 23:05, Duncan Murdoch wrote:
  On 13-06-06 6:22 PM, Rolf Turner wrote:
  On 07/06/13 03:19, Scott Raynaud wrote:
  I actually had tried placing arguments in the call but it didn't
  work.   However, I did
  not think about writing it to a variable and printing.  That seems
  to have done the
  trick.  Funny, I don't remember having to do that before, but that's
  not surprising.
 
  If I remember correctly --- haven't used Splus for decades --- this is a
  difference
  between Splus and R.
 
  In R the output of a function is returned *invisibly* if that function
  is called
  from within another function.  And source() is one such other function.
  Actually this depends on the caller.  source() does return its results
  invisibly, but many other functions don't.
From FAQ 7.16:
  If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the
  command line the returned value is automatically printed (unless it is
  |invisible()|), but in other circumstances, such as in a |source()|d
  file or ***inside a function*** it isn't printed unless you
  specifically print it.
  (Emphasis added.)
 
  I think that you have misinterpreted what I wrote.  Many (most?) functions
  *return* their results (values) visibly.  But if you put an expression
  into the code
  of that function (an expression which is not part of the returned value)
  you never
  see the result of evaluating that expression.
 
  E.g.:
 
foo - function(x) {
sin(42)
x^2
}

[R] recode: how to avoid nested ifelse

2013-06-07 Thread Paul Johnson

In our Summer Stats Institute, I was asked a question that amounts to
reversing the effect of the contrasts function (reconstruct an ordinal
predictor from a set of binary columns). The best I could think of was to
link together several ifelse functions, and I don't think I want to do this
if the example became any more complicated.

I'm unable to remember a less error prone method :). But I expect you might.

Here's my working example code

## Paul Johnson pauljohn at ku.edu
## 2013-06-07

## We need to create an ordinal factor from these indicators
## completed elementary school
es - c(0, 0, 1, 0, 1, 0, 1, 1)
## completed high school
hs - c(0, 0, 1, 0, 1, 0, 1, 0)
## completed college graduate
cg - c(0, 0, 0, 0, 1, 0, 1, 0)

ed - ifelse(cg == 1, 3,
 ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))

edf - factor(ed, levels = 0:3,  labels = c(none, es, hs, cg))
data.frame(es, hs, cg, ed, edf)

## Looks OK, but what if there are missings?
es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
ed - ifelse(cg == 1, 3,
 ifelse(hs == 1, 2,
ifelse(es == 1, 1, 0)))
cbind(es, hs, cg, ed)

## That's bad, ifelse returns NA too frequently.
## Revise (becoming tedious!)

ed - ifelse(!is.na(cg)  cg == 1, 3,
 ifelse(!is.na(hs)  hs == 1, 2,
ifelse(!is.na(es)  es == 1, 1,
   ifelse(is.na(es), NA, 0
cbind(es, hs, cg, ed)


## Does the project director want us to worry about
## logical inconsistencies, such as es = 0 but cg = 1?
## I hope not.

Thanks in advance, I hope you are having a nice summer.

pj

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] recode: how to avoid nested ifelse

2013-06-07 Thread Joshua Wiley

Hi Paul,

Unless you have truly offended the data generating oracle*, the
pattern: NA, 1, NA, should be a data entry error --- graduating HS
implies graduating ES, no?  I would argue fringe cases like that
should be corrected in the data, not through coding work arounds.
Then you can just do:

x - do.call(paste0, list(es, hs, cg))

 table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, 
 es,hs, cg)))
none   es   hs   cg
   4112

Cheers,

Josh

*Drawn from comments by Judea Pearl one lively session.


On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote:
 In our Summer Stats Institute, I was asked a question that amounts to
 reversing the effect of the contrasts function (reconstruct an ordinal
 predictor from a set of binary columns). The best I could think of was to
 link together several ifelse functions, and I don't think I want to do this
 if the example became any more complicated.

 I'm unable to remember a less error prone method :). But I expect you might.

 Here's my working example code

 ## Paul Johnson pauljohn at ku.edu
 ## 2013-06-07

 ## We need to create an ordinal factor from these indicators
 ## completed elementary school
 es - c(0, 0, 1, 0, 1, 0, 1, 1)
 ## completed high school
 hs - c(0, 0, 1, 0, 1, 0, 1, 0)
 ## completed college graduate
 cg - c(0, 0, 0, 0, 1, 0, 1, 0)

 ed - ifelse(cg == 1, 3,
  ifelse(hs == 1, 2,
 ifelse(es == 1, 1, 0)))

 edf - factor(ed, levels = 0:3,  labels = c(none, es, hs, cg))
 data.frame(es, hs, cg, ed, edf)

 ## Looks OK, but what if there are missings?
 es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
 hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
 cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
 ed - ifelse(cg == 1, 3,
  ifelse(hs == 1, 2,
 ifelse(es == 1, 1, 0)))
 cbind(es, hs, cg, ed)

 ## That's bad, ifelse returns NA too frequently.
 ## Revise (becoming tedious!)

 ed - ifelse(!is.na(cg)  cg == 1, 3,
  ifelse(!is.na(hs)  hs == 1, 2,
 ifelse(!is.na(es)  es == 1, 1,
ifelse(is.na(es), NA, 0
 cbind(es, hs, cg, ed)


 ## Does the project director want us to worry about
 ## logical inconsistencies, such as es = 0 but cg = 1?
 ## I hope not.

 Thanks in advance, I hope you are having a nice summer.

 pj

 --
 Paul E. Johnson
 Professor, Political Science  Assoc. Director
 1541 Lilac Lane, Room 504  Center for Research Methods
 University of Kansas University of Kansas
 http://pj.freefaculty.org   http://quant.ku.edu

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] splitting a string column into multiple columns faster

2013-06-07 Thread Dimitri Liakhovitski

Hello!

I have a column in my data frame that I have to split: I have to distill
the numbers from the text. Below is my example and my solution.

x-data.frame(x=c(aaa1_bbb1_ccc3,aaa2_bbb3_ccc2,aaa3_bbb2_ccc1))
x
library(stringr)
out-as.data.frame(str_split_fixed(x$x,aaa,2))
out2-as.data.frame(str_split_fixed(out$V2,_bbb,2))
out3-as.data.frame(str_split_fixed(out2$V2,_ccc,2))
result-cbind(x,out2[1],out3)
result
My problem is:
str_split.fixed is relatively slow. In my real data frame I have over
80,000 rows so that it takes almost 30 seconds to run just one line (like
out-... above)
And it's even slower because I have to do it step-by-step many times.

Any way to do it by specifying all 3 delimiters at once
(aaa,_bbb,_ccc) and then split it in one swoop into a data frame with
several columns?

Thanks a lot for any pointers!

-- 
Dimitri Liakhovitski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] recode: how to avoid nested ifelse

2013-06-07 Thread Neal Fultz

I would do this to get the highest non-missing level:

x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)

rock chalk...

-nfultz

On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
 Hi Paul,
 
 Unless you have truly offended the data generating oracle*, the
 pattern: NA, 1, NA, should be a data entry error --- graduating HS
 implies graduating ES, no?  I would argue fringe cases like that
 should be corrected in the data, not through coding work arounds.
 Then you can just do:
 
 x - do.call(paste0, list(es, hs, cg))
 
  table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, 
  es,hs, cg)))
 none   es   hs   cg
4112
 
 Cheers,
 
 Josh
 
 *Drawn from comments by Judea Pearl one lively session.
 
 
 On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote:
  In our Summer Stats Institute, I was asked a question that amounts to
  reversing the effect of the contrasts function (reconstruct an ordinal
  predictor from a set of binary columns). The best I could think of was to
  link together several ifelse functions, and I don't think I want to do this
  if the example became any more complicated.
 
  I'm unable to remember a less error prone method :). But I expect you might.
 
  Here's my working example code
 
  ## Paul Johnson pauljohn at ku.edu
  ## 2013-06-07
 
  ## We need to create an ordinal factor from these indicators
  ## completed elementary school
  es - c(0, 0, 1, 0, 1, 0, 1, 1)
  ## completed high school
  hs - c(0, 0, 1, 0, 1, 0, 1, 0)
  ## completed college graduate
  cg - c(0, 0, 0, 0, 1, 0, 1, 0)
 
  ed - ifelse(cg == 1, 3,
   ifelse(hs == 1, 2,
  ifelse(es == 1, 1, 0)))
 
  edf - factor(ed, levels = 0:3,  labels = c(none, es, hs, cg))
  data.frame(es, hs, cg, ed, edf)
 
  ## Looks OK, but what if there are missings?
  es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
  hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
  cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
  ed - ifelse(cg == 1, 3,
   ifelse(hs == 1, 2,
  ifelse(es == 1, 1, 0)))
  cbind(es, hs, cg, ed)
 
  ## That's bad, ifelse returns NA too frequently.
  ## Revise (becoming tedious!)
 
  ed - ifelse(!is.na(cg)  cg == 1, 3,
   ifelse(!is.na(hs)  hs == 1, 2,
  ifelse(!is.na(es)  es == 1, 1,
 ifelse(is.na(es), NA, 0
  cbind(es, hs, cg, ed)
 
 
  ## Does the project director want us to worry about
  ## logical inconsistencies, such as es = 0 but cg = 1?
  ## I hope not.
 
  Thanks in advance, I hope you are having a nice summer.
 
  pj
 
  --
  Paul E. Johnson
  Professor, Political Science  Assoc. Director
  1541 Lilac Lane, Room 504  Center for Research Methods
  University of Kansas University of Kansas
  http://pj.freefaculty.org   http://quant.ku.edu
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://joshuawiley.com/
 Senior Analyst - Elkhart Group Ltd.
 http://elkhartgroup.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] splitting a string column into multiple columns faster

HI,
May be this helps:

res-data.frame(x=x,read.table(text=gsub([A-Za-z],,x[,1]),sep=_,header=FALSE),stringsAsFactors=FALSE)
res
#   x V1 V2 V3
#1 aaa1_bbb1_ccc3  1  1  3
#2 aaa2_bbb3_ccc2  2  3  2
#3 aaa3_bbb2_ccc1  3  2  1
A.K.

- Original Message -
From: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Friday, June 7, 2013 9:24 PM
Subject: [R] splitting a string column into multiple columns faster

Hello!

I have a column in my data frame that I have to split: I have to distill
the numbers from the text. Below is my example and my solution.

x-data.frame(x=c(aaa1_bbb1_ccc3,aaa2_bbb3_ccc2,aaa3_bbb2_ccc1))
x
library(stringr)
out-as.data.frame(str_split_fixed(x$x,aaa,2))
out2-as.data.frame(str_split_fixed(out$V2,_bbb,2))
out3-as.data.frame(str_split_fixed(out2$V2,_ccc,2))
result-cbind(x,out2[1],out3)
result
My problem is:
str_split.fixed is relatively slow. In my real data frame I have over
80,000 rows so that it takes almost 30 seconds to run just one line (like
out-... above)
And it's even slower because I have to do it step-by-step many times.

Any way to do it by specifying all 3 delimiters at once
(aaa,_bbb,_ccc) and then split it in one swoop into a data frame with
several columns?

Thanks a lot for any pointers!

-- 
Dimitri Liakhovitski

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] recode: how to avoid nested ifelse

2013-06-07 Thread Joshua Wiley

I still argue for na.rm=FALSE, but that is cute, also substantially faster

f1 - function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
f2 - function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE)
f3 - function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
f4 - function(x1, x2, x3) rowSums(cbind(x1, x2, x3))

es - rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
hs - rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
cg - rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)

system.time(replicate(1000, f1(cg, hs, es)))
system.time(replicate(1000, f2(cg, hs, es)))
system.time(replicate(1000, f3(cg, hs, es)))
system.time(replicate(1000, f4(cg, hs, es)))

 system.time(replicate(1000, f1(cg, hs, es)))
   user  system elapsed
  22.730.03   22.76
 system.time(replicate(1000, f2(cg, hs, es)))
   user  system elapsed
   0.920.040.95
 system.time(replicate(1000, f3(cg, hs, es)))
   user  system elapsed
   0.190.020.20
  system.time(replicate(1000, f4(cg, hs, es)))
   user  system elapsed
   0.950.030.98


R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)




On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz nfu...@gmail.com wrote:
 I would do this to get the highest non-missing level:

 x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)

 rock chalk...

 -nfultz

 On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
 Hi Paul,

 Unless you have truly offended the data generating oracle*, the
 pattern: NA, 1, NA, should be a data entry error --- graduating HS
 implies graduating ES, no?  I would argue fringe cases like that
 should be corrected in the data, not through coding work arounds.
 Then you can just do:

 x - do.call(paste0, list(es, hs, cg))

  table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, 
  es,hs, cg)))
 none   es   hs   cg
4112

 Cheers,

 Josh

 *Drawn from comments by Judea Pearl one lively session.


 On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote:
  In our Summer Stats Institute, I was asked a question that amounts to
  reversing the effect of the contrasts function (reconstruct an ordinal
  predictor from a set of binary columns). The best I could think of was to
  link together several ifelse functions, and I don't think I want to do this
  if the example became any more complicated.
 
  I'm unable to remember a less error prone method :). But I expect you 
  might.
 
  Here's my working example code
 
  ## Paul Johnson pauljohn at ku.edu
  ## 2013-06-07
 
  ## We need to create an ordinal factor from these indicators
  ## completed elementary school
  es - c(0, 0, 1, 0, 1, 0, 1, 1)
  ## completed high school
  hs - c(0, 0, 1, 0, 1, 0, 1, 0)
  ## completed college graduate
  cg - c(0, 0, 0, 0, 1, 0, 1, 0)
 
  ed - ifelse(cg == 1, 3,
   ifelse(hs == 1, 2,
  ifelse(es == 1, 1, 0)))
 
  edf - factor(ed, levels = 0:3,  labels = c(none, es, hs, cg))
  data.frame(es, hs, cg, ed, edf)
 
  ## Looks OK, but what if there are missings?
  es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
  hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
  cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
  ed - ifelse(cg == 1, 3,
   ifelse(hs == 1, 2,
  ifelse(es == 1, 1, 0)))
  cbind(es, hs, cg, ed)
 
  ## That's bad, ifelse returns NA too frequently.
  ## Revise (becoming tedious!)
 
  ed - ifelse(!is.na(cg)  cg == 1, 3,
   ifelse(!is.na(hs)  hs == 1, 2,
  ifelse(!is.na(es)  es == 1, 1,
 ifelse(is.na(es), NA, 0
  cbind(es, hs, cg, ed)
 
 
  ## Does the project director want us to worry about
  ## logical inconsistencies, such as es = 0 but cg = 1?
  ## I hope not.
 
  Thanks in advance, I hope you are having a nice summer.
 
  pj
 
  --
  Paul E. Johnson
  Professor, Political Science  Assoc. Director
  1541 Lilac Lane, Room 504  Center for Research Methods
  University of Kansas University of Kansas
  http://pj.freefaculty.org   http://quant.ku.edu
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://joshuawiley.com/
 Senior Analyst - Elkhart Group Ltd.
 http://elkhartgroup.com

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

[R] Regression Tolerance Intervals - Dr. Young's Code

2013-06-07 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Below is a reproducible example to generate the output by using Dr. Young's R 
code on the above subject .   As commented below, the issue is that  part of 
the code (regtol.int and plottol) does not seem to work.

I would appreciate receiving your advice toward resolving the issue.

Thanks and regards,

Pradip Muhuri


setwd (E:/)
require (tolerance)

d1- xlndur  ylnant
8.910797  0.33901690
9.001415  0.36464311
8.983936  0.53976194
8.948035  0.33901690
9.056784  0.39266961
9.018593  0.18617770
9.001415  0.53976194
8.983936 -0.11005034
8.966147  0.53102826
8.948035  0.59885086
6.90  NA

xd1 - read.table(textConnection(d1), header=TRUE, as.is=TRUE)
print (xd1); str (xd1)

#This code works
xout1 - regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1),  alpha=.05, 
P=0.99, side=2)
print (xout1)


#This code does not work
xout2 - regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), new.xlndur = 
NULL,  alpha=.05, P=0.99, side=2)
print (xout2)

#This code does not work
plottol(xout1, x=cbind(1,x), y=y, side=two, x.lab=X, y.lab=Y )

#This code does not work
plottol(xout2, x=cbind(1,x), y=y, side=two, x.lab=X, y.lab=Y )


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] splitting a string column into multiple columns faster