Re: [R] nice report generator?

2011-12-07 Thread Abhijit Dasgupta
Sarah Goslee might want to chime in, but using odfWeave and appropriate 
LibreOffice templates, you can generate beautifully formatted tables, 
possibly in the style you wish, in LibreOffice, as well as add R figures.

The R2wd package (which has a proprietary component) will also generate 
tables directly in Word (on Windows machines), and its default theme is 
pretty attractive to me. I'm pretty sure even that is customizable. You 
can put R plots directly into Word that way as well.


On 12/7/2011 3:21 PM, Michael wrote:
> Thanks a lot Duncan!
> I did some home-work and found out that in terms of table looks, it's
> neater to generate Excel 2010 style colorful tables, not the Latex
> style plain/math-geek tables...
> Therefore, a report generator would hopefully generate Excel 2010
> style tables, plus R plots, etc.
> Any thoughts?
> Thanks a lot!
> On 12/7/11, Duncan Murdoch  wrote:
>> On 07/12/2011 1:14 PM, Michael wrote:
>>> Hi all,
>>> I am looking for recommendations/pointers about best report generator you
>>> think that are currently available?
>>> i.e. the package that can help turn console output into nice-looking neat
>>> report to send to bosses?
>> You might find the latex() command in Hmisc or the equivalent in the
>> xtable package does what you want.  For simple reports in a single table
>> you don't need Sweave; for more complex ones it would make your life easier.
>> My tables package (now on CRAN) might simplify the production of the
>> table.   It is set up to work with Hmisc currently.
>> Duncan Murdoch
> __
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__ mailing list
PLEASE do read the posting guide
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Proc Mixed to R

2011-10-28 Thread Abhijit Dasgupta
You need to use either the lme4 or nlme packages for mixed models. 
(There are some other possibilities as well). See for MUCH more detail
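
As a rough sketch of what this could look like with nlme: the data below are simulated stand-ins for the Nosema data (the colony effect and noise levels are my assumptions), and the model has only a random intercept per colony, so it does not reproduce the unstructured `repeated` covariance of the SAS call. The point is just the wide-to-long step and where the fixed-effects table lives.

```r
library(nlme)  # ships with R as a recommended package

set.seed(1)
# Simulated stand-in for the data: 66 colonies, 8 treatments, 3 time points
wide <- data.frame(colony = factor(1:66),
                   tmt    = factor(sample(rep(1:8, length.out = 66))))
colony_effect <- rnorm(66, sd = 500)
for (j in 1:3) {
  wide[[paste0("y", j)]] <- 5000 + 200 * j + colony_effect + rnorm(66, sd = 1000)
}

# Same wide-to-long step as the SAS data step (y1..y3 -> y, date = 1..3)
long <- reshape(wide, direction = "long", varying = c("y1", "y2", "y3"),
                v.names = "y", timevar = "date", times = 1:3, idvar = "colony")

# Random intercept per colony, fitted by maximum likelihood as in method=ml
fit <- lme(y ~ tmt * date, random = ~ 1 | colony, data = long, method = "ML")

# Rough analogue of SAS's "Solution for Fixed Effects" table
fixed <- summary(fit)$tTable
print(round(fixed, 3))
```

For the Tukey-adjusted treatment comparisons, glht() with mcp() in the multcomp package is one option worth looking at.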

On 10/27/2011 7:19 PM, Molly Hanlon wrote:
> Hi All,
> I'm working with some SAS code to analyze an experiment set up as follows:
> 66 subjects (colonies) treated with a random treatment (1-8) and measured at
> three time points.
> The data structure looks like:
>  input colony tmt y1 y2 y3;
>  y=y1; date=1; output;
>  y=y2; date=2; output;
>  y=y3; date=3; output;
>  datalines;
> 1  3  6725   6750   925
> 2  8  6950   5800   11275
> 3  4  4200   6100   6475
> Procedure:
> proc mixed data=Nosema method=ml covtest;
>  class colony tmt;
>  model y=tmt date tmt*date / s;
>  repeated / type=un subject=colony;
>  random colony;
>  lsmeans tmt/cl adjust=tukey;
> I am able to get something close by running aov on it, even closer by using
> Anova{car} and calling type=3.  The problem I'm having is running the tukey
> and/or getting something similar to SAS's "Solution for Fixed Effects"
> table.  Any idea what to do?
> Thanks,
> Molly

Re: [R] How to turn a LaTeX Sweave file (Rnw) into .HTML/.odf/.docx? (under windows)

2011-09-22 Thread Abhijit Dasgupta
So, I was playing around a bit on my Mac 

LyX can do Sweave, and will actually output HTML or ODT. However, on a 
cursory pass, I couldn't get the graphics to translate, since the Sweave 
driver writes the graphics as .ps or .pdf files, and I'm not sure the HTML 
translator looks for them or can accept them. The math actually translates 
nicely. So there is probably potential here.


On Sep 22, 2011, at 7:09 PM, Tal Galili wrote:

> Hello dear R help members,
> I have found several references on how to do this, my question is if anyone
> is actually using them - and if there are some strong points on what to use,
> and how well it is working out.
> My goal is to be able to easily create docs from R, but to be able to share
> it with other researchers (who do not use LaTeX) so they could easily
> copy/paste the tables and edit them for their needs (pdf is not solving this
> for me).
> The only reasonable solution I came by so far is to use HTML markup coupled
> with R2HTML (or odfWeave or R2wd).  But nothing that can work with
> LaTeX->HTML (easily)
> I have asked a similar question here:
> And also noticed it was asked half a year ago here:
> The general issue of TeX to HTML was discussed also in these places:
> And obviously the following page offers other good resources to consider:
> p.s: I search the R-help for this topic, but "sweave html" didn't seem to
> yield good results - my apologies if this has been heavily debated before -
> links would be welcomed as well.
> Tal

Re: [R] Hints for Data Mining

2011-09-15 Thread Abhijit Dasgupta
Please see the R Machine Learning Task View on CRAN for a 
starting point on decision trees.

On 9/14/2011 7:11 PM, Lorenzo Isella wrote:
> Dear All,
> I am recycling a previous email of mine where I asked some questions 
> about clustering mixed numerical/categorical data. This time I am more 
> into data mining. I am given a set of known statistical indexes {s_i}, 
> i=1,2...N for a N countries. These indexes in general are a both 
> numerical and categorical variables. For each country, I also have a 
> property x_i whose value is known, but that I also would like to be 
> able to predict correctly using a model. This is needed in order to 
> assess the importance of the various indexes in determining {x_i}.
> There are two cases of interest
> (1) all the {x_i} are numerical variables, e.g. the average life 
> expectancy
> (2) all the {x_i} are categorical variables (e.g. the fact that the 
> country joins treaty A, B or C). This reminds me of discrete choice 
> models.
> Any suggestions about how to tackle this problems? In the past I used 
> mclust, but it is limited to all the {s_i} being numerical variables.
> I saw an example of the use of glm for predicting binary variables
> which may be relevant for (2). In general I know that some people use 
> Weka for this sort of tasks, but I wonder if I can use R to get a 
> decision tree and a confusion matrix and to be able to predict how the 
> {x_i} would change by varying the value of one statistical index.
> Many thanks for your suggestions
> Lorenzo

Re: [R] Avoiding for Loop for moving average

2011-09-02 Thread Abhijit Dasgupta
There is a recent blog post by Dirk Eddelbuettel on how to do something 
similar using his Rcpp package and C++, with massive time improvements.
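
If you'd rather stay in base R, the recursion in the loop can probably be handed to filter() instead. This is only a sketch under my own assumptions: `value` stands in for data$value, alpha = 0.9 is made up, and the series is started at the first observation.

```r
set.seed(42)
value <- rnorm(1000)   # stand-in for data$value
alpha <- 0.9           # weight on the previous EWMA, as in the loop

# Reference: the explicit loop
ewma_loop <- numeric(length(value))
ewma_loop[1] <- value[1]
for (i in 2:length(value)) {
  ewma_loop[i] <- alpha * ewma_loop[i - 1] + (1 - alpha) * value[i]
}

# Vectorised: filter() with method = "recursive" computes y[i] = z[i] + alpha * y[i-1]
z <- (1 - alpha) * value
z[1] <- value[1]       # so that the series starts at value[1]
ewma_filter <- as.numeric(stats::filter(z, alpha, method = "recursive"))

# Plain trailing moving average over a window of k points
k <- 5
ma <- as.numeric(stats::filter(value, rep(1 / k, k), sides = 1))
```

The first k-1 entries of `ma` are NA because the trailing window is not yet full there.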

On 9/2/2011 12:43 PM, Noah Silverman wrote:
> Hello,
> I need to calculate a moving average and an exponentially weighted moving 
> average over a fairly large data set (500K rows).
> Doing this in a for loop works nicely, but is slow.
> data$ewma<- data$value
> N<- dim(data)[1]
> for(i in 2:N){
>   data$ewma[i]<- alpha * data$ewma[i-1] + (1-alpha) * data$value[i]
> }
> Since the moving average "accumulates" as we move through the data, I'm not 
> sure on the best/fastest way to do this.
> Does anyone have any suggestions on how to avoid a loop doing this?
> --
> Noah Silverman
> UCLA Department of Statistics
> 8117 Math Sciences Building #8208
> Los Angeles, CA 90095

Re: [R] Opposite of paste function

2011-08-10 Thread Abhijit Dasgupta


On 8/10/2011 2:23 PM, Peter Langfelder wrote:
> On Wed, Aug 10, 2011 at 11:22 AM, Soyeon Kim  wrote:
>> Dear All,
>> I have vn variable
>>> vn
>> [1] "V300" "V376"
>> What I want to get is
>> 300 376
> as.numeric(substring(vn, 2))
> Peter

Re: [R] Storing and managing custom R functions for re-use

2011-07-09 Thread Abhijit Dasgupta, PhD
I think most of us are in a similar situation. I've usually kept mine in 
a file which is sourced when I start R. The main problem I have with 
this is that it clutters up my environment with a lot of stuff I don't 
need all the time. I'm in the process of creating a custom package which 
will be lazy-loaded. I believe a previous discussion of this topic 
suggested this as the preferred method.
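
Short of a full package, one way to keep the clutter out of the workspace is to source the file into its own environment and attach that. A minimal sketch, where the helper file, the function std_err and the name "my_helpers" are all made up for illustration:

```r
# Write a tiny helper file for the sake of the example (path and function are hypothetical)
helper_file <- tempfile(fileext = ".R")
writeLines("std_err <- function(x) sd(x) / sqrt(length(x))", helper_file)

# Source it into its own environment rather than the global workspace
helpers <- new.env()
sys.source(helper_file, envir = helpers)

# Attach it to the search path so the functions can be called by name
attach(helpers, name = "my_helpers")

std_err(c(1, 2, 3, 4))
ls()   # the helper functions themselves are not in the global workspace
```

The same few lines could go into a startup file; a package is still the tidier long-term answer since it also gives you help pages.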

On 07/09/2011 07:30 AM, Simon Chamaillé-Jammes wrote:

Dear all,

sorry if this is a bit on the sidetrack for R-help.

As a regular R user I have developed quite a lot of custom R 
functions, to the point of not always remembering what I have already 
programmed, where the file is and so on.
I was wondering what other people do in this regards. A basic file 
with all your functions, or a custom R package, or directly integrated 
into a profile file ??? I'm considering that a blog with tagged posts 
may be a good solution (and really good ones could join R-bloggers 

If someone is happy to share what (s)he considers good practice, thanks.



Re: [R] Confidence bands in ggplot2

2011-07-07 Thread Abhijit Dasgupta
It's basically a question of layering, and the order in which the layers are 
drawn. Draw the pointranges first and then the points:

qplot(x=as.factor(sch), y=est, ymin=lower.95ci, ymax=upper.95ci, 
geom='pointrange') +
geom_point(aes(x=as.factor(sch), y=est), color='red')+...
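
Spelled out with ggplot() rather than qplot(), the layering idea might look like the sketch below. It uses only the first four schools from your post, and putting everything in a data frame `d` is my choice, not something from the thread.

```r
library(ggplot2)

# A few of the estimates and standard errors from the original post
d <- data.frame(sch = c("190", "107", "290", "256"),
                est = c(4.16656026, 2.64306071, 4.22579866, 6.12024789),
                se  = c(3.165127, 3.710750, 4.680911, 6.335386))
d$lower.95ci <- d$est - d$se * qnorm(.975)
d$upper.95ci <- d$est + d$se * qnorm(.975)

# Layers are drawn in the order they are added: black ranges first, red points on top
p <- ggplot(d, aes(x = factor(sch), y = est)) +
  geom_pointrange(aes(ymin = lower.95ci, ymax = upper.95ci), colour = "black") +
  geom_point(colour = "red") +
  xlab("School") + ylab("Value-added") + theme_bw()
print(p)
```

Setting colour outside aes() fixes the colour of the layer instead of mapping it to a legend entry.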

On Jul 7, 2011, at 6:16 PM, Christopher Desjardins wrote:

> Thanks that worked perfectly. One thing if I may. Is it possible to make the 
> center dot red and the lines connecting the dots black?
> Thanks,
> Chris
> On Jul 7, 2011, at 5:10 PM, Abhijit Dasgupta, PhD wrote:
>> You can easily do this by:
>> qplot(x=as.factor(sch),y=est, geom='point', colour='red') +
>> geom_pointrange(aes(x=as.factor(sch), y=est, ymin=lower.95ci, 
>> ymax=upper.95ci))+
>> xlab('School') + ylab("Value-added")+theme_bw()
>> On 07/07/2011 05:55 PM, Christopher Desjardins wrote:
>>> Hi,
>>> I have the following data:
>>>> est
>>> sch190  sch107  sch290  sch256  sch287  sch130  
>>> sch139
>>> 4.16656026  2.64306071  4.22579866  6.12024789  4.49624748 11.12799127  
>>> 1.17353917
>>> sch140  sch282  sch161  sch193  sch156  sch288  
>>> sch352
>>> 3.48197696 -0.29659410 -1.99194986 10.23489859  7.77342138  6.77624539  
>>> 9.66795001
>>> sch368  sch225  sch301  sch105  sch353  sch291  
>>> sch179
>>> 7.20229569  4.41989204  5.61586860  5.99460203 -2.65019242 -9.42614560 
>>> -0.25874193
>>> sch134  sch135  sch324  sch360 bb1
>>> 3.26432479 10.52555091 -0.09637968  2.49668858 -3.24173545
>>>> se
>>>   sch190sch107sch290sch256sch287sch130sch139
>>> sch140
>>> 3.165127  3.710750  4.680911  6.335386  3.896302  4.907679  4.426284  
>>> 4.266303
>>>   sch282sch161sch193sch156sch288sch352sch368
>>> sch225
>>> 3.303747  4.550193  3.995261  5.787374  5.017278  7.820763  7.253183  
>>> 4.483988
>>>   sch301sch105sch353sch291sch179sch134sch135
>>> sch324
>>> 4.076570  7.564359 10.456522  5.705474  4.247927  5.671536 10.567093  
>>> 4.138356
>>>   sch360   bb1
>>> 4.943779  1.935142
>>>> sch
>>> [1] "190" "107" "290" "256" "287" "130" "139" "140" "282" "161" "193" "156" 
>>> "288"
>>> [14] "352" "368" "225" "301" "105" "353" "291" "179" "134" "135" "324" 
>>> "360" "BB"
>>> From this data I have created 95% confidence intervals assuming a normal 
>>> distribution.
>>> lower.95ci<- est - se*qnorm(.975)
>>> upper.95ci<- est + se*qnorm(.975)
>>> What I'd like to do is plot the estimate (est) and have lines attach to the 
>>> points located in lower.95ci and upper.95ci.  Presently I am doing the 
>>> following:
>>> qplot(x=as.factor(sch),y=lower.95ci) + 
>>> geom_point(aes(x=as.factor(sch),y=upper.95ci),colour="black") + 
>>> geom_point(aes(x=as.factor(sch), y=est),colour="red") + ylab("Value-Added") 
>>> + xlab("School") + theme_bw()
>>> Which creates this graph --->   
>>> That's fine except that it doesn't connect the points vertically. Does 
>>> anyone know how I could make the 'black' points connect to the 'red' point, 
>>> i.e. show confidence bands?
>>> Thanks,
>>> Chris

Re: [R] Confidence bands in ggplot2

2011-07-07 Thread Abhijit Dasgupta, PhD

You can easily do this by:

qplot(x=as.factor(sch),y=est, geom='point', colour='red') +
geom_pointrange(aes(x=as.factor(sch), y=est, ymin=lower.95ci, ymax=upper.95ci))+
xlab('School') + ylab("Value-added")+theme_bw()

Re: [R] trying to import xls or xlsx files

2011-06-23 Thread Abhijit Dasgupta
Gabor's answer explains the error perfectly. You might want to look at the xlsx 
package as well as the RODBC package if you're on Windows. RODBC is really 
fast, if you can use it. 


On Jun 23, 2011, at 2:00 PM, wwreith  wrote:

> library(xlsReadWrite)
> mydata<-read.xls("file path", header=TRUE)
> however if I change xls to csv it works just fine. Any ideas what I'm doing
> wrong? I have have also using the package gdata with the exact same error.
> Below is the error that pops up.
> Error in findPerl(verbose = verbose) : 
>  perl executable not found. Use perl= argument to specify the correct path.
> Error in file.exists(tfn) : invalid 'file' argument

Re: [R] new to R need urgent help!

2011-06-23 Thread Abhijit Dasgupta
On Jun 23, 2011, at 4:42 PM, elisheva corn  wrote:

> hi all-
> I am doing some research, have never used R before until today and need to
> understand the following program for a project.
> if some one could PLEASE help me understand this program ASAP i would
> GREATLY appreciate it (any syntax/ statistic comments would be great)
> -on a side note, it seems to me that R doesnt include the pv, and it was
> calculated seperatly, is this true?
> fit=gee(foci~as.factor(time)*cond,id=exper,data=drt,family=poisson(link =
> "log"))
You apparently have count data (foci) which is measured repeatedly within 
exper, and you're interested in how foci changes with time and condition 
including their interaction. The code fits a generalized estimating equation 
(GEE) model, which can be an appropriate model for repeated measures data. See, 
for example, Diggle, Liang, Zeger & Heagerty for background. 
> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> running glm to get initial regression estimate
>  (Intercept) as.factor(time)24
> 3.051177 -2.705675
>  condHypoxia as.factor(time)24:condHypoxia
>-0.402259  1.429034
>> pv=2*(1-pnorm(abs(summary(fit)$coef[,5])))
>> data.frame(summary(fit)$coef,pv)

The gee package doesn't compute the p-value directly, though other functions like 
lm, glm and others do.  What the code does is use the robust z statistic, which 
is the estimate/robust se, and relate it to the standard normal distribution 
to get a two-sided p-value.
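
To see the calculation on its own, here is a small sketch using the Robust.z values copied from the output in your post. The second version, which works from the lower tail, is my addition; it avoids subtracting a number very close to 1.

```r
# Robust z statistics copied from the summary(fit)$coef output in the post
robust_z <- c("(Intercept)"                   = 62.306363,
              "as.factor(time)24"             = -13.537057,
              "condHypoxia"                   = -3.773095,
              "as.factor(time)24:condHypoxia" = 7.997988)

# Two-sided p-value, as computed in the original code
pv <- 2 * (1 - pnorm(abs(robust_z)))

# Same quantity computed from the lower tail
pv_stable <- 2 * pnorm(-abs(robust_z))

print(cbind(pv, pv_stable))
```

For moderate z values the two columns agree; they only differ when the p-value is so small that 1 - pnorm() runs into rounding.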
>   Estimate Naive.S.E.   Naive.z Robust.S.E.
> Robust.z
> (Intercept)3.051177 0.02221052 137.37527  0.04897055
> 62.306363
> as.factor(time)24 -2.705675 0.10890056 -24.84537  0.19987174
> -13.537057
> condHypoxia   -0.402259 0.03907961 -10.29332  0.10661248
> -3.773095
> as.factor(time)24:condHypoxia  1.429034 0.12549576  11.38711  0.17867421
> 7.997988
> (Intercept)   0.00e+00
> as.factor(time)24 0.00e+00
> condHypoxia   1.612350e-04
> as.factor(time)24:condHypoxia 1.332268e-15
>> ftable(table(drt$cond,drt$time,predict(fit)))
> 0.345501643340608 1.37227675004058 2.64891772174934
> 3.05117673373261
> Oxia0.5  00
> 0  485
>24 3150
> 00
> Hypoxia 0.5  00
> 3460
>24   0  449
> 00
>> ## 3-th term gives the difference between the Hypoxia/Oxia at time=0.5
>> ## the difference between Hypoxia/Oxia at time=24
>> L=matrix(c(0,0,1,1),nrow=1)
>> fit$coef[L==1]
>  condHypoxia as.factor(time)24:condHypoxia
>-0.402259  1.429034
>> L%*%fit$coef
> [,1]
> [1,] 1.026775
>> wald.test(fit$robust.variance,fit$coef,L=L)
> Wald test:
> --
> Chi-squared test:
> X2 = 23.8, df = 1, P(> X2) = 1.1e-06

Re: [R] Summarize by two or more attributes

2011-05-17 Thread Abhijit Dasgupta
One possibility, using summaryBy from the doBy package, is:

library(doBy)
summaryBy(Rate~Source+Bin, data=Df, FUN=sum)
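
If you want the sums as a new column in the same data frame, base R can do that too. A sketch using your example data; set.seed() and the column name RateSum are my additions:

```r
set.seed(1)
Df <- data.frame(Source = rep(c(1, 15), 10),
                 Rate   = rnorm(20),
                 Bin    = rep(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5), 2),
                 Type   = c(rep("auto", 10), rep("bus", 10)))

# One row per Source/Bin combination
sums <- aggregate(Rate ~ Source + Bin, data = Df, FUN = sum)

# The same sums as a new column aligned with the original rows
Df$RateSum <- ave(Df$Rate, Df$Source, Df$Bin, FUN = sum)
```

aggregate() gives the summary table, while ave() repeats each group's sum on every row of that group.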

On 5/17/2011 12:48 PM, LCOG1 wrote:
> Okay everyone heres a likely softball for someone.
> Consider the following data frame:
> #Create data
> x<-rep(c(1,15),10)
> y<-rnorm(20)
> z<-c(rep("auto",10),rep("bus",10))
> a<-rep(c(1,1,2,2,3,3,4,4,5,5),2)
> #Create Data frame
> Df<-data.frame(Source=x,Rate=y,Bin=a,Type=z)
> I want to create a new column the equals the sum of the Rates for each type
> (1,15) by Bin.
> A related question:  I have been using R for a while now and usually
> manipulate my data in data frames but i know lists are better for R so
> perhaps the above should be done using lists.  Feel free to offer
> suggestions coming from that angle.
> Thanks guys
> JR-

Re: [R] combine lattice plot and standard R plot

2011-05-04 Thread Abhijit Dasgupta

All of the components in the grid.arrange statement are from ggplot2 and 
lattice, which are both in turn based on the grid package. What 
grid.arrange is able to do is use the grid framework to arrange the 
individual plots on a page. The base graphics are not based on grid, and 
so won't work (afaik) with grid.arrange

On 5/4/2011 2:47 PM, Scott Chamberlain wrote:
> What about the example in gridExtra package:
> require(ggplot2); require(lattice); require(gridExtra)
> grid.arrange(qplot(1:10), xyplot(1:10~1:10), tableGrob(head(iris)), nrow=2, 
> as.table=TRUE, main="test main", sub=textGrob("test sub", gp=gpar(font=2)))
> On Wednesday, May 4, 2011 at 1:44 PM, Jonathan Daily wrote:
> If you read the help documentation, lattice is not really compatible
>> with standard graphics.
>> library("lattice")
>> ?lattice
>> 2011/5/4 Lucia Cañas:
>>> Dear R users,
>>> I would like to combine lattice plot (xyplot) and standard R plot (plot and 
>>> plotCI) in an unique figure.
>>> I use the function "par()" to combine plot and plotCI and I use the 
>>> function "print()" to combine xyplot. I tried to use these functions to 
>>> combine xyplot and plotCI and plots but they do not work. Does anybody know 
>>> how I can do this?
>>> Thank you very much in advance.
>>> Lucía Cañás Ferreiro
>>> Instituto Español de Oceanografía
>>> Centro Oceanográfico de A coruña
>>> Paseo Marítimo Alcalde Francisco Vázquez, 10
>>> 15001 - A Coruña, Spain
>>> Tel: +34 981 218151 Fax: +34 981 229077
>> -- 
>> ===
>> Jon Daily
>> Technician
>> ===
>> #!/usr/bin/env outside
>> # It's great, trust me.

Re: [R] linear regression in a data.frame using recast -- A fortunes candidate??

2011-03-16 Thread Abhijit Dasgupta, PhD


On 03/16/2011 05:37 PM, Bert Gunter wrote:

Ha! -- A fortunes candidate?
-- Bert

If this is really a time series, then you will have serious validity
problems due to auto-correlation among non-independent units. (But if you
are just searching for a way to pull the wool over the eyes of the
statistically uninformed, then I guess there's no stopping you.)


David Winsemius, MD
West Hartford, CT


Re: [R] value of W seems to be suspicious in the mann-whitney wilcox related test. what could be the problem

2011-03-14 Thread Abhijit Dasgupta
You need to read up on the Wilcoxon rank-sum (Mann-Whitney) test and the output from wilcox.test.

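The confidence interval is for the location shift between the two groups 
(strictly, the median of the pairwise differences between the groups rather 
than the difference of the medians), which can certainly be negative. In 
fact, your estimate is -33, and the confidence interval is (-68, 0), which 
is reasonable.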

The value of W is a positive number, in general (it can be as large as the product of the two group sizes), and isn't restricted to the range 0 to 1.
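
A small made-up example may help to see what W and the estimate are. The data here are simulated by me, and the identities in the comments hold for the exact test without ties; with ties or large samples wilcox.test switches to a normal approximation and computes the estimate differently.

```r
set.seed(123)
x <- rnorm(8, mean = 0)    # hypothetical scores, group 1
y <- rnorm(10, mean = 1)   # hypothetical scores, group 2

res <- wilcox.test(x, y, conf.int = TRUE)

# W counts the (x, y) pairs in which the x value is larger,
# so it lies between 0 and length(x) * length(y)
W <- unname(res$statistic)
pairs_x_larger <- sum(outer(x, y, ">"))

# The reported "difference in location" is the median of all pairwise differences
hl <- median(outer(x, y, "-"))

print(c(W = W, pairs_x_larger = pairs_x_larger, max_W = length(x) * length(y)))
print(c(estimate = unname(res$estimate), median_pairwise_diff = hl))
```

So a W in the thousands is nothing unusual once each group has a few dozen observations.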

> On 3/14/2011 11:07 AM, taby gathoni wrote:
>> my output is as follows:
>>   wilcox.test(main_samp$SCORE~main_samp$GENDER, conf.int = TRUE)
>>  Wilcoxon rank sum test with continuity correction
>> data:  main_samp$SCORE by main_samp$GENDER
>> W = 2780.5, p-value = 0.04829
>> alternative hypothesis: true location shift is not equal to 0
>> 95 percent confidence interval:
>>   -6.85e+01 -2.056837e-05
>> sample estimates:
>> difference in location
>>   -33.3
>>   result of W seems suspicious since i expect the result to be between 0 and 
>> 1.
>> and the confidence intervals are also -ves  what could be the challenge?
>> Thanks Taby

Re: [R] Rstudio question

2011-03-04 Thread Abhijit Dasgupta
Seconded. Go to the RStudio support forum and post your 
question/bug/suggestion. Those folks have been excellent in their 
response times and feedback.

On 3/4/2011 9:14 AM, Shige Song wrote:
> Why don't you post the question to the RStudio support forum? The
> folks there are quite responsive and very helpful.
> Shige
> On Fri, Mar 4, 2011 at 9:05 AM, Robert Kinley  wrote:
>>   I really like RStudio ...
>> ... but I wish it wouldn't automatically reload the last .RData it had.
>> Anyone know how to fix this ... ?
>> Also - does anyone know is there an Rstudio-user email-list forum thingy
>> out there ?
>> ta.
>>  Robert Kinley

Re: [R] Plot with same font like in LaTeX

2011-03-02 Thread Abhijit Dasgupta
The tikzDevice package can do this.

On 3/2/2011 6:48 AM, Jonas Stein wrote:
> Hi,
> i want to make my plots look uniform in LaTeX documents.
> - usage of the same font on axes and in legend like LaTeX uses
>(for example "Computer Modern")
> - put real LaTeX formulas on the axes
> Have you any hints how i can achieve that?
> I had no luck two years ago, but i want to try it again now.
> kind regards,


Re: [R] Revolution Analytics reading SAS datasets

2011-02-11 Thread Abhijit Dasgupta, PhD

I'm sure the legal ground is tricky. However, OpenOffice and LibreOffice 
and KWord have been able to open the (proprietary) MS Word doc format 
for a while now, and they are open source (and LibreOffice might even 
be GPL'd), so the algorithm is in fact "published" in Jeremy's sense, 
and has been for several years. I figure the reason for keeping the SAS 
reading functionality proprietary is Revolution's (perfectly legitimate) 
wish to make money by separating their product from GNU R and adding 
features that would make people want to buy rather than just download 
from CRAN.

Within GNU R there is of course sas.get in the Hmisc package (which 
requires SAS). It should also be quite easy to write a wrapper around 
dsread, a command-line closed source product freely downloadable in a 
limited form which will convert sas7bdat files to csv or tsv format (and 
SQL if you pay). This latter path won't require SAS locally.

I'm also sure that SAS has a way to export its datasets into R, since 
the current version of IML Studio will in fact interact with R.

On 02/10/2011 03:11 PM, Jeremy Miles wrote:

On 10 February 2011 12:01, Matt Shotwell  wrote:

On Thu, 2011-02-10 at 10:44 -0800, David Smith wrote:

The SAS import/export feature of Revolution R Enterprise 4.2 isn't
open-source, so we can't release it in open-source Revolution R
Community, or to CRAN as we do with the ParallelR packages (foreach,
doMC, etc.).

Judging by the language of Dr. Nie's comments on the page linked below,
it seems unlikely this feature is the result of a licensing agreement
with SAS. Is that correct?

There was some discussion of this on the SAS email list.  People who
seem to know what they were talking about said that they would have
had to reverse engineer it to decode the file format.  It's slightly
tricky legal ground - the file format can't be copyrighted but
publishing the algorithm might not be allowed.  I guess if they
release it as open source, that could be construed as publishing the
algorithm. (SPSS and WPS both can open SAS files, and I'd be surprised
if SAS licensed to them, especially WPS, who SAS are (or were) suing for
all kinds of things in court in London.)



Re: [R] How to reshape wide format data.frame to long format?

2011-01-20 Thread Abhijit Dasgupta
As for your second question, you could certainly do

newcodesM = transform(newcodesM, 
variable1 = sapply(strsplit(as.character(variable), '\\.'), '[', 1), 
variable2 = sapply(strsplit(as.character(variable), '\\.'), '[', 2), 
variable3 = sapply(strsplit(as.character(variable), '\\.'), '[', 3))

though I'm sure there is a more efficient use of strsplit in this context. 
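
One such way, sketched here on three of your column names, is to call strsplit once and bind the pieces into a matrix. The names variable1 to variable3 are just placeholders.

```r
variable <- c("AMR.pa1.M", "AMR.pa2.M", "SMR.ka10.M")

# strsplit() returns one character vector per element; bind them into a matrix
parts <- do.call(rbind, strsplit(variable, ".", fixed = TRUE))
colnames(parts) <- c("variable1", "variable2", "variable3")

newcodesM <- data.frame(variable = variable, parts, stringsAsFactors = FALSE)
print(newcodesM)
```

This relies on every name having exactly three dot-separated parts, which seems to be the case for your columns.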

On Jan 20, 2011, at 10:51 AM, Fredrik Karlsson wrote:

> Dear list,
> I need to convert this data.frame
>> names(codesM)
> [1] "key""AMR.pa1.M"  "AMR.pa2.M"  "AMR.pa3.M"  "AMR.pa4.M"
> [6] "AMR.pa5.M"  "AMR.pa6.M"  "AMR.pa7.M"  "AMR.pa8.M"  "AMR.pa9.M"
> [11] "AMR.pa10.M" "AMR.ta1.M"  "AMR.ta2.M"  "AMR.ta3.M"  "AMR.ta4.M"
> [16] "AMR.ta5.M"  "AMR.ta6.M"  "AMR.ta7.M"  "AMR.ta8.M"  "AMR.ta9.M"
> [21] "AMR.ta10.M" "AMR.ka1.M"  "AMR.ka2.M"  "AMR.ka3.M"  "AMR.ka4.M"
> [26] "AMR.ka5.M"  "AMR.ka6.M"  "AMR.ka7.M"  "AMR.ka8.M"  "AMR.ka9.M"
> [31] "AMR.ka10.M" "SMR.pa1.M"  "SMR.pa2.M"  "SMR.pa3.M"  "SMR.pa4.M"
> [36] "SMR.pa5.M"  "SMR.pa6.M"  "SMR.pa7.M"  "SMR.pa8.M"  "SMR.pa9.M"
> [41] "SMR.pa10.M" "SMR.ta1.M"  "SMR.ta2.M"  "SMR.ta3.M"  "SMR.ta4.M"
> [46] "SMR.ta5.M"  "SMR.ta6.M"  "SMR.ta7.M"  "SMR.ta8.M"  "SMR.ta9.M"
> [51] "SMR.ta10.M" "SMR.ka1.M"  "SMR.ka2.M"  "SMR.ka3.M"  "SMR.ka4.M"
> [56] "SMR.ka5.M"  "SMR.ka6.M"  "SMR.ka7.M"  "SMR.ka8.M"  "SMR.ka9.M"
> [61] "SMR.ka10.M"
>> dim(codesM)
> [1] 42 61
> into a 3 x  2501 data.frame where the "key" variable is kept, the
> values in columns 2-61 above is inserted into a "values" column and
> the name of the column is inserted in a third column ("variable"
> perhaps).
> Like
> key variable  value
> POSTOFF_1_1AMR.pa1.M   5
> POSTOFF_1_1AMR.pa2.M   3
> I think I should be able to do this using the "reshape" function, but
> I cannot get it to work. I think I need some help to understand
> this...
> (If I could split the "variable" into three separate columns splitting
> by ".", that would be even better.)
> I appreciate all the help I could get.
> /Fredrik
> -- 
> "Life is like a trumpet - if you don't put anything into it, you don't
> get anything out of it."

Re: [R] How to reshape wide format data.frame to long format?

2011-01-20 Thread Abhijit Dasgupta
I would think that the following code should work (melt is in the reshape 
package):

library(reshape)
newcodesM = melt(codesM, id=1)

If other variables in the data.frame are factors, melt thinks all of them 
are ID variables and tries to use all of them as "keys". Specifying the id 
variable you want to keep (I used id=1 since "key" is in the 1st column) will 
probably solve the issue. 
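
If you'd rather not depend on an extra package, the same wide-to-long step can be sketched in base R. The tiny `wide` data frame below is invented for illustration (only the key and column names echo the question), and `long` is a hypothetical name.

```r
# Invented two-row stand-in for codesM
wide <- data.frame(key = c("POSTOFF_1_1", "POSTOFF_1_2"),
                   AMR.pa1.M = c(5, 4),
                   AMR.pa2.M = c(3, 2),
                   stringsAsFactors = FALSE)

value_cols <- setdiff(names(wide), "key")

# unlist() concatenates the value columns one after another, so the key
# is repeated once per column and each column name once per row
long <- data.frame(
  key      = rep(wide$key, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(wide)),
  value    = unlist(wide[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The result has one row per (key, column) pair, which is the key/variable/value layout asked for.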


On Jan 20, 2011, at 10:51 AM, Fredrik Karlsson wrote:

> Dear list,
> I need to convert this data.frame
>> names(codesM)
> [1] "key"        "AMR.pa1.M"  "AMR.pa2.M"  "AMR.pa3.M"  "AMR.pa4.M"
> [6] "AMR.pa5.M"  "AMR.pa6.M"  "AMR.pa7.M"  "AMR.pa8.M"  "AMR.pa9.M"
> [11] "AMR.pa10.M" "AMR.ta1.M"  "AMR.ta2.M"  "AMR.ta3.M"  "AMR.ta4.M"
> [16] "AMR.ta5.M"  "AMR.ta6.M"  "AMR.ta7.M"  "AMR.ta8.M"  "AMR.ta9.M"
> [21] "AMR.ta10.M" "AMR.ka1.M"  "AMR.ka2.M"  "AMR.ka3.M"  "AMR.ka4.M"
> [26] "AMR.ka5.M"  "AMR.ka6.M"  "AMR.ka7.M"  "AMR.ka8.M"  "AMR.ka9.M"
> [31] "AMR.ka10.M" "SMR.pa1.M"  "SMR.pa2.M"  "SMR.pa3.M"  "SMR.pa4.M"
> [36] "SMR.pa5.M"  "SMR.pa6.M"  "SMR.pa7.M"  "SMR.pa8.M"  "SMR.pa9.M"
> [41] "SMR.pa10.M" "SMR.ta1.M"  "SMR.ta2.M"  "SMR.ta3.M"  "SMR.ta4.M"
> [46] "SMR.ta5.M"  "SMR.ta6.M"  "SMR.ta7.M"  "SMR.ta8.M"  "SMR.ta9.M"
> [51] "SMR.ta10.M" "SMR.ka1.M"  "SMR.ka2.M"  "SMR.ka3.M"  "SMR.ka4.M"
> [56] "SMR.ka5.M"  "SMR.ka6.M"  "SMR.ka7.M"  "SMR.ka8.M"  "SMR.ka9.M"
> [61] "SMR.ka10.M"
>> dim(codesM)
> [1] 42 61
> into a 3 x  2501 data.frame where the "key" variable is kept, the
> values in columns 2-61 above is inserted into a "values" column and
> the name of the column is inserted in a third column ("variable"
> perhaps).
> Like
> key variable  value
> POSTOFF_1_1AMR.pa1.M   5
> POSTOFF_1_1AMR.pa2.M   3
> I think I should be able to do this using the "reshape" function, but
> I cannot get it to work. I think I need some help to understand
> this...
> (If I could split the "variable" into three separate columns splitting
> by ".", that would be even better.)
> I appreciate all the help I could get.
> /Fredrik
> -- 
> "Life is like a trumpet - if you don't put anything into it, you don't
> get anything out of it."

Re: [R] Reading large SAS dataset in R

2011-01-06 Thread Abhijit Dasgupta

I second Phil's suggestion. sas.get is actually quite nice. 

Another current option is using a command-line utility called dsread 
to convert the sas7bdat file to a csv or tsv 
format, which can then easily be read into R using read.table and its 
derivatives. Frank Harrell (author of the Hmisc package) commented positively 
on this approach on the list a couple of months back. 
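
Once the file has been converted, reading it is the easy part. A minimal sketch, assuming the converter wrote an ordinary comma-separated file with a header row; the temporary file and its two columns are made up here so the snippet runs on its own.

```r
# Stand-in for the converter's output; in practice this would be the
# csv produced from the sas7bdat file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0"), csv_path)

# read.csv() is read.table() with sep = "," and header = TRUE preset
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
str(dat)
```

For a file around 1 GB it may also help to pass colClasses so R does not have to guess the column types.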


On Jan 5, 2011, at 5:51 PM, Phil Spector wrote:

> Santanu -
>   If you have sas installed on your computer, you may find using
> the sas.get function of the Hmisc package useful.
>   If the only message that read.ssd produced was "Sas failed", it
> would be difficult to figure out what went wrong.   Usually the location of 
> the log file, which would explain the error more thoroughly, is included in 
> the error message.
>   - Phil Spector
>Statistical Computing Facility
>Department of Statistics
>UC Berkeley
> On Wed, 5 Jan 2011, Santanu Pramanik wrote:
>> Hi all,
>> I have a large (approx. 1 GB) SAS dataset (test.sas7bdat) located in the
>> server ("R:/" directory). I have SAS 9.1 installed in my PC and I can read
>> the SAS dataset in SAS, under a windows environment, after assigning libname
>> in "R:\" directory.
>> Now I am trying to read the SAS dataset in R (R 2.12.0) using the read.ssd
>> function of the "foreign" package, but I get an error message "SAS failed".
>> I believe I have specified the paths correctly (after reading some previous
>> posts I made sure that I do it right). Below is the small code:
>> sashome<- "C:/Program Files/SAS/SAS 9.1"
>> read.ssd(libname="R:/", sectionnames="test", sascmd=file.path(sashome,
>> "sas.exe"))
>> Please let me know where I am making the mistake. Is it because of the size
>> of the file or the location of the file (in server instead of local hard
>> drive)?
>> Thanks in advance,
>> Santanu
>> -- 
>> Santanu Pramanik
>> Survey Statistician
>> NORC at the University of Chicago
>> Bethesda, MD
>>  [[alternative HTML version deleted]]

Re: [R] How to Read a Large CSV into a Database with R

2010-11-16 Thread Abhijit Dasgupta
On 11/16/2010 12:41 PM, Seth Falcon wrote:
> Hi Abhijit,
> [I've cc'd R-help to keep the discussion on the list]
> On Tue, Nov 16, 2010 at 8:06 AM, Abhijit Dasgupta
>   wrote:
>> Seth,
>> I was looking for something like this too. I've a question. If
>> you're reading the data from a connection, does R start reading the
>> next chunk of data right after the previous chunk, or do we need to
>> keep track of things using "skip"
> The purpose of using a file connection is to allow R to keep its place
> in the file as it reads and not have to re-read or skip.  This is
> considerably more efficient.
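
To make that concrete, here is a small sketch of chunked reading from an open connection. The file, its columns and the chunk size of 2 are invented for illustration; the point is only that each readLines() call picks up where the previous one stopped, with no skip bookkeeping.

```r
# Made-up five-row csv so the example is self-contained
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, x = c(10, 20, 30, 40, 50)),
          path, row.names = FALSE)

con <- file(path, open = "r")          # opened once, keeps its position
header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

chunk_rows <- integer(0)
repeat {
  lines <- readLines(con, n = 2)       # next chunk, continues after the last
  if (length(lines) == 0) break        # nothing left to read
  chunk <- read.csv(text = lines, header = FALSE, col.names = header)
  chunk_rows <- c(chunk_rows, nrow(chunk))
  # ... here each chunk could be appended to a database table ...
}
close(con)
print(chunk_rows)
```

If the file were passed by name instead of as an open connection, every call would start again from the top.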


Re: [R] Several lattice plots on one page

2010-11-09 Thread Abhijit Dasgupta
Another solution is using grid.arrange in the gridExtra package. This works 
like the par(mfrow=...) command, but for grid-based graphics like lattice and 

On Nov 8, 2010, at 1:19 PM, Marcus Drescher wrote:

> Dear all,
> I am trying (!!!) to generate pdfs that have 8 plots on one page:
> df = data.frame(
>   day = c(1,2,3,4),
>   var1 = c(1,2,3,4),
>   var2 = c(100,200,300,4000),
>   var3 = c(10,20,300,4),
>   var4 = c(10,2,3,4000),
>   var5 = c(10,20,30,40),
>   var6 = c(0.001,0.002,0.003,0.004),
>   var7 = c(123,223,123,412),
>   var8 = c(213,123,234,435),
>   all = as.factor(c(1,1,1,1)))
> pdf("test1.pdf", width=20, height=27, paper="a4") 
>print(plot(groupedData(var1 ~ day | all, data = df), main = "var1", 
> xlab="", ylab=""), split=c(1,1,2,4), more=TRUE)
>print(plot(groupedData(var2 ~ day | all, data = df), main = "var2", 
> xlab="", ylab=""), split=c(1,2,2,4), more=TRUE)
>print(plot(groupedData(var3 ~ day | all, data = df), main = "var3", 
> xlab="", ylab=""), split=c(1,3,2,4), more=TRUE)
>print(plot(groupedData(var4 ~ day | all, data = df), main = "var4", 
> xlab="", ylab=""), split=c(1,4,2,4), more=TRUE)
>print(plot(groupedData(var5 ~ day | all, data = df), main = "var5", 
> xlab="", ylab=""), split=c(2,1,2,4), more=TRUE)
>print(plot(groupedData(var6 ~ day | all, data = df), main = "var6", 
> xlab="", ylab=""), split=c(2,2,2,4), more=TRUE)
>print(plot(groupedData(var7 ~ day | all, data = df), main = "var7", 
> xlab="", ylab=""), split=c(2,3,2,4), more=TRUE)
>print(plot(groupedData(var8 ~ day | all, data = df), main = "var8", 
> xlab="", ylab=""), split=c(2,4,2,4))
> My problem is that the separate plots all have different sizes. (Some are 
> tall, but very small, or the other way around.) The target is to have equally 
> tall and wide graphs. (The variables have different scales. Grouping does not 
> work.)
> Optimally, the plots would use the complete pdf page.
> Any ideas how to adjust height and width?
> Best 
> Marcus

Re: [R] ggplot output

2010-11-04 Thread Abhijit Dasgupta
The other way (in the same spirit as par(mfrow = ...) in base graphics) is to 
use the grid.arrange function in the gridExtra package. See its documentation 
for examples.
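
A rough sketch of what that could look like for two of the panels. The numbers in `dat` are invented (the original hstat data isn't shown), `make_panel` is a hypothetical helper, and the code only runs if both ggplot2 and gridExtra are installed.

```r
have_pkgs <- requireNamespace("ggplot2", quietly = TRUE) &&
  requireNamespace("gridExtra", quietly = TRUE)
out <- NULL

if (have_pkgs) {
  library(ggplot2)

  # Invented monthly means and standard deviations
  dat <- data.frame(Month   = 1:6,
                    C_avg   = c(5, 6, 7, 6, 5, 4),
                    C_stdev = rep(0.5, 6),
                    K_avg   = c(20, 22, 25, 23, 21, 19),
                    K_stdev = rep(2, 6))

  # Hypothetical helper: one point/line/error-bar panel per variable
  make_panel <- function(avg, stdev) {
    ggplot(dat, aes(x = Month, y = .data[[avg]],
                    ymin = .data[[avg]] - .data[[stdev]],
                    ymax = .data[[avg]] + .data[[stdev]])) +
      geom_point() + geom_line() + geom_errorbar() + ylab(avg)
  }

  p1 <- make_panel("C_avg", "C_stdev")
  p2 <- make_panel("K_avg", "K_stdev")

  out <- tempfile(fileext = ".pdf")
  pdf(out, width = 10, height = 4)
  gridExtra::grid.arrange(p1, p2, ncol = 2)   # side by side on one page
  dev.off()
}
```

Changing ncol (or using nrow) controls the layout, much as the two numbers in mfrow do.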

On Nov 4, 2010, at 9:36 AM, ashz wrote:

> Dear All, 
> I have this script:
> dat <- data.frame(Month = hstat$Date,C_avg = hstat$C.avg,C_stdev =
> hstat$C.stdev)
> ggplot(data = dat, aes(x = Month, y = C_avg, ymin = C_avg - C_stdev, ymax =
> C_avg + C_stdev)) +
>  geom_point() +
>  geom_line() +
>  geom_errorbar()
> dat <- data.frame(Month = hstat$Date,K_avg = hstat$K.avg,K_stdev =
> hstat$K.stdev)
> ggplot(data = dat, aes(x = Month, y = K_avg, ymin = K_avg - K_stdev, ymax =
> K_avg + K_stdev)) +
>  geom_point() +
>  geom_line() +
>  geom_errorbar()
> dat <- data.frame(Month = hstat$Date,S_avg = hstat$S.avg,S_stdev =
> hstat$S.stdev)
> ggplot(data = dat, aes(x = Month, y = S_avg, ymin = S_avg - S_stdev, ymax =
> S_avg + S_stdev)) +
>  geom_point() +
>  geom_line() +
>  geom_errorbar()
> Running the script generates 3 separate graphs, how can I output them next
> to each other?  
> Thanks

Re: [R] ForestPlot or similar

2010-11-02 Thread Abhijit Dasgupta
You need to use a print statement


Lattice and ggplot2 need to be explicitly printed to get output into 
jpeg. I believe Matt's function only provides the graphics object and 
not the printed version.
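
A small illustration of the point, using lattice since it ships with R. The function `make_plot` is a made-up stand-in for any function that returns a plot object, and pdf() is used only because it is always available; the same applies to jpeg().

```r
library(lattice)

# Stand-in for a function that builds and returns a plot object
make_plot <- function() {
  d <- data.frame(x = 1:10, y = (1:10)^2)
  xyplot(y ~ x, data = d)
}

out <- tempfile(fileext = ".pdf")
pdf(out)
p <- make_plot()   # nothing is drawn yet, p is only a "trellis" object
print(p)           # the explicit print() is what draws on the open device
dev.off()
```

At the console the print happens automatically, which is why the problem only shows up inside functions, loops and scripts.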

On 11/2/2010 4:32 PM, Mestat wrote:
> Thanks Matt,
> I am having a problem now to use this function. The function separately
> works fine. But the problem is that I am working with a simulation, so i
> placed the CREDPLOT function in my program and added the following commands
> according my data:
> rw_cibas_quantile_ori_m<-rw_quantile_app_ori[-51:-1000]
> rw_cibas_low_quantile_ori_l<-rw_cibas_low_quantile_ori[-51:-1000]
> rw_cibas_up_quantile_ori_u<-rw_cibas_up_quantile_ori[-51:-1000]
> jpeg ('Nfp_rw_bas_quantile_ori.jpeg')
> forestplot(rw_cibas_quantile_ori_m,rw_cibas_low_quantile_ori_l,rw_cibas_up_quantile_ori_u,cen=403.677)
> My program is running fine, but I am not getting any graphic. I did the
> graphic using the function FORESTPLOT, but the graphic provided by the
> function CREDPLOT is much better. Here is my code:
> rw_ciper_gini_ori_m<-rw_gini_app_ori[-51:-1000]
> rw_ciper_low_gini_ori_l<-rw_ciper_low_gini_ori[-51:-1000]
> rw_ciper_up_gini_ori_u<-rw_ciper_up_gini_ori[-51:-1000]
> tabletext<-cbind(c(rep(" ",50),NA))
> rw_ciper_gini_ori_m<-c(rw_ciper_gini_ori_m,NA)
> rw_ciper_low_gini_ori_l<-c(rw_ciper_low_gini_ori_l,NA)
> rw_ciper_up_gini_ori_u<-c(rw_ciper_up_gini_ori_u,NA)
> jpeg ('Sfp_rw_per_gini_ori.jpeg')
> forestplot(tabletext,rw_ciper_gini_ori_m,rw_ciper_low_gini_ori_l,rw_ciper_up_gini_ori_u,zero=0.4,col=meta.colors(box="royalblue",line="darkblue"))
> Any information about whats is missing/wrong in order to obtain the graphic
> with the function CREDPLOT is welcomed.
> Thanks is advance,
> Marcio


Re: [R] Question about ggplot2

2010-11-02 Thread Abhijit Dasgupta
from where you are, 



On Nov 2, 2010, at 9:57 AM, Shige Song wrote:

> Dear All,
> I am trying to graph a simple scatter plot where the x axis is year
> and the y axis is a percentage (percentage of infant death). Instead
> of plotting the raw data, I want to plot summary statistics such as
> mean and median. Here is the problem: the value range of y is between
> 0 and 1, but since infant death is a rare event, the mean and median
> is very low (something like 5%), which shows up as a horizontal line
> at the bottom of the figure. My question is: how do I change the scale
> of the y-axis so that it does not have the range between 0 and 1 but
> between 0 and 0.1? Many thanks.
> By the way, I am using ggplot2, and here is my code:
> ---
> year.plot <- ggplot(d, aes(year, rate))
> year.plot + stat_summary(fun.y = "mean", geom = "line")
> ---
> Best,
> Shige

[R] Fwd: ForestPlot or similar

2010-10-30 Thread Abhijit Dasgupta

> From: Abhijit Dasgupta 
> Date: October 31, 2010 1:30:02 AM EDT
> To: Matt Shotwell 
> Subject: Re: [R] ForestPlot or similar
> I just did something very similar using ggplot's pointrange geom. In the 
> following, I'm plotting hazard ratios, for which the nominal value is 1 and 
> not 0. x has 5 columns: drug, hr, hr.lcb, hr.ucb, and group, and I'm faceting 
> by group.  If you want the plots horizontal, add coord_flip() to the command 
> --- as it stands the plots are vertically oriented. 
>p <- ggplot(x, aes(x=drug, y = hr, ymin=hr.lcb, ymax=hr.ucb))+
>geom_pointrange()+ facet_grid(.~group)
>p <- p + xlab('Drug') + ylab('Hazard ratio')+
>geom_hline(yintercept=1, col='red', lty=2)
> Abhijit
> On Oct 30, 2010, at 5:31 PM, Matt Shotwell wrote:
>> Here is a small function for forest plots in R, with an example:
>> -Matt
>> On Sat, 2010-10-30 at 11:40 -0400, Mestat wrote:
>>> Here is one example:
>>> I have three vectors (mean,lower interval, upper interval)
>>> mean<-c(2,4,6,8)
>>> l<-c(1,2,3,4)
>>> u<-c(4,8,12,16)
>>> How would I plot that if I want to use the FORESTPLOT function. I dont need
>>> to use the TABLETEXT option.
>>> I am working in something like this:
>>> tabletext<-c(NA,NA,NA,NA,NA)
>>> mean<-c(NA,2,4,6,8)
>>> l<-c(NA,1,2,3,4)
>>> u<-c(NA,4,8,12,16)
>>> forestplot(tabletext,mean,l,u,zero=0)
>>> But I am having a problem with the length of the dimension...
>>> Thanks in advance,
>>> Marcio
>> -- 
>> Matthew S. Shotwell
>> Graduate Student 
>> Division of Biostatistics and Epidemiology
>> Medical University of South Carolina

Re: [R] Programmaticly finding number of processors by R code

2010-10-03 Thread Abhijit Dasgupta, PhD
  If you have installed multicore (for unix/mac), you can find the 
number of cores by multicore:::detectCores()
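
As an aside that is not part of the original reply: later versions of R bundle a parallel package that exports a detectCores() function, so a sketch along these lines should work without installing anything. The fallback to 1 is my own assumption for platforms where the count cannot be determined.

```r
library(parallel)   # bundled with R from version 2.14.0 on

n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L   # detectCores() can return NA

# n_cores could then be handed to a cluster constructor, for example
# snow::makeSOCKcluster(rep("localhost", n_cores))
print(n_cores)
```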

On 10/3/10 1:03 PM, Ajay Ohri wrote:
> Dear List
> Sorry if this question seems very basic.
> Is there a function to pro grammatically find number of processors in
> my system _ I want to pass this as a parameter to snow in some serial
> code to parallel code functions
> Regards
> Ajay


Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] Creating R objects in Java

2010-10-01 Thread Abhijit Dasgupta, PhD

 On 10/1/10 9:18 AM, lord12 wrote:

How do you call R methods from Java? I want to create a GUI using Swing in
Java that calls R methods.

Look in the documentation for the rJava package


Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] R Code for paper?

2010-09-30 Thread Abhijit Dasgupta, PhD
Reading Gilbert's paper and references, and going on the web, I see that 
Gilbert provided Fortran source code for his method as well as Tarone's 
method. It might be possible to wrap this in R

On 09/30/2010 06:40 PM, Jim Silverton wrote:

Does anyone have the R code for Gilbert's 2005 paper on the discrete FDR and
Tarone's 1990 paper? And Storey's pFDR?


Re: [R] R Code for paper?

2010-09-30 Thread Abhijit Dasgupta, PhD
Look at the "qvalue" package by Dabney and Storey, which might satisfy 
your last query

On 09/30/2010 06:40 PM, Jim Silverton wrote:

Does anyone have the R code for Gilbert's 2005 paper on the discrete FDR and
Tarone's 1990 paper? And Storey's pFDR?


Re: [R] speeding up regressions using ddply

2010-09-22 Thread Abhijit Dasgupta, PhD
 There has been a recent addition of parallel processing capabilities 
to plyr (I believe v1.2 and later), along with a dataframe iterator 
construct. Both have improved performance of ddply greatly for 
multicore/cluster computing. So we now have the niceness of plyr's 
grammar with pretty good performance. From the plyr NEWS file:

Version 1.2 (2010-09-09)


* l*ply, d*ply, a*ply and m*ply all gain a .parallel argument that when TRUE
  applies functions in parallel using a parallel backend registered with the
  foreach package:

  x <- seq_len(20)
  wait <- function(i) Sys.sleep(0.1)
  system.time(llply(x, wait))
  #  user  system elapsed
  # 0.007   0.005   2.005

  system.time(llply(x, wait, .parallel = TRUE))
  #  user  system elapsed
  # 0.020   0.011   1.038

On 9/22/10 10:41 AM, Ista Zahn wrote:

Hi Alison,

On Wed, Sep 22, 2010 at 11:05 AM, Alison Macalady  wrote:


I have a data set that I'd like to run logistic regressions on, using ddply
to speed up the computation of many models with different combinations of

In my experience ddply is not particularly fast. I use it a lot
because it is flexible and has easy to understand syntax, not for it's

I would like to run regressions on every unique two-variable

combination in a portion of my data set,  but I can't quite figure out how
to do using ddply.

I'm not sure ddply is the tool for this job.

The data set looks like this, with "status" as the

binary dependent variable and V1:V8 as potential independent variables in
the logistic regression:

m<- matrix(rnorm(288), nrow = 36)
colnames(m)<- paste('V', 1:8, sep = '')
x<- data.frame( status = factor(rep(rep(c('D','L'), each = 6), 3)),

You can use combn to determine the combinations you want:

Varcombos<- combn(names(x)[-1], 2)

From there you can do a loop, something like

results<- list()
for(i in 1:dim(Varcombos)[2]) {
   log.glm<- glm(as.formula(paste("status ~ ", Varcombos[1,i],  " + ",
Varcombos[2,i], sep="")), family=binomial(link=logit),
na.action=na.omit, data=x)
   glm.summary<- summary(log.glm)
   aic<- extractAIC(log.glm)
   coef<- coef(glm.summary)
   # rows 2 and 3 are the two predictors, column 1 holds the estimates
   results[[i]]<- list(Est1=coef[2,1], Est2=coef[3,1],  AIC=aic[2])
#or whatever other output here
   names(results)[i]<- paste(Varcombos[1,i], Varcombos[2,i], sep="_")
}

I'm sure you could replace the loop with something more elegant, but
I'm not really sure how to go about it.
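
One possible way to replace the loop, offered as a sketch rather than the definitive answer. The data frame `x` is rebuilt here from random numbers because the original definition is cut off above, and `fit_pair` and `var_combos` are names I made up.

```r
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste("V", 1:8, sep = "")
# Assumed layout: binary status plus the eight candidate predictors
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)), m)

var_combos <- combn(names(x)[-1], 2)   # 2 x 28 matrix of variable pairs

# Fit one logistic regression for a pair of variable names and return
# a one-row summary
fit_pair <- function(vars) {
  fml <- reformulate(vars, response = "status")
  fit <- glm(fml, family = binomial(link = "logit"), data = x)
  est <- coef(fit)
  data.frame(var1 = vars[1], var2 = vars[2],
             est1 = est[[2]], est2 = est[[3]],
             aic = AIC(fit), stringsAsFactors = FALSE)
}

results <- do.call(rbind, lapply(seq_len(ncol(var_combos)),
                                 function(i) fit_pair(var_combos[, i])))
head(results[order(results$aic), ])
```

Because each fit is independent, the lapply call is also the natural place to swap in a parallel version later.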

I used melt to put my data frame into a more workable format
xm<- melt(x, id = 'status')

Here is the basic shape of the function I'd like to apply to every
combination of variables in the dataset:

h<- function(df)

log.glm<- (glm(status ~ value1+ value2 , family=binomial(link=logit),
na.action=na.omit)) #What I can't figure out is how to specify 2 different
variables (I've put value1 and value2 as placeholders) from the xm to
include in the model

aic<- extractAIC(log.glm)
coef<- coef(glm.summary)
list(Est1=coef[1,2], Est2=coef[3,2],  AIC=aic[2]) #or whatever other output

And then I'd like to use ddply to speed up the computations.

output<-ddply(xm, .(variable),

I can easily do this using ddply when I only want to use 1 variable in the
model, but can't figure out how to do it with two variables.

I don't think this approach can work. You are saying "split up xm by
variable" and then expecting  to be able to reference different levels
of variable within each split, an impossible request.

Hope this helps,

Many thanks for any hints!


Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
&  Laboratory of Tree Ring Research



Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] aggregate, by, *apply

2010-09-15 Thread Abhijit Dasgupta, PhD
 I would approach this slightly differently. I would make func a 
function of x and y.

func <- function(x,y){
m <- median(x)
return(m > 2 & m < y)
}

Now generate tmp just as you have. then:

res <- daply(tmp, .(z), summarise, res=func(x,y))

I believe this does the trick
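
The same idea can be sketched in base R, which may make it clearer what each group "sees". The `tmp` data frame here is invented (the original one was not preserved in the post), and I'm assuming y is constant within each level of z, so only its first value per group is used.

```r
# Invented stand-in for tmp: two groups, y constant within each group
tmp <- data.frame(x = c(1, 3, 5, 2, 4, 6),
                  y = c(4, 4, 4, 3, 3, 3),
                  z = rep(c("a", "b"), each = 3))

# The function receives the whole sub-data-frame for one group, so it
# has access to both x and y at once
func <- function(d) {
  m <- median(d$x)
  m > 2 & m < d$y[1]
}

res <- sapply(split(tmp, tmp$z), func)
print(res)
```

Group "a" has median 3, which lies between 2 and its y of 4; group "b" has median 4, which is not below its y of 3.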

On 9/15/10 5:45 PM, Mark Ebbert wrote:

Dear R gurus,

I regularly come across a situation where I would like to apply a function to a 
subset of data in a dataframe, but I have not found an R function to facilitate 
exactly what I need. More specifically, I'd like my function to have a context 
of where the data it's analyzing came from. Here is an example:

### BEGIN ###
if(m>  2&  m<  x$y){

### END ###

The values in the example are trivial, but the problem is that only one column 
is passed to my function at a time, so I can't determine how 'm' relates to 
'x$y'. Any tips/guidance is appreciated.

Mark T. W. Ebbert


Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] Creating publication-quality plots for use in Microsoft Word

2010-09-15 Thread Abhijit Dasgupta, PhD

 On 9/15/10 10:38 AM, dadrivr wrote:

Hi everyone,

I am trying to make some publication-quality plots for use in Microsoft
Word, but I am having trouble creating high-quality plots that are supported
by Microsoft Word.

If I use the R plot function to create the figure, the lines are jagged, and
the picture is not of high quality (same with JPEG(), TIFF(), and PNG()
functions).  I have tried using the Cairo package, but it distorts my dashed
lines, and the win.metafile results in a picture of terrible quality.  The
only way I have succeeded in getting a high quality picture in a file is by
using the pdf() function to save the plot as a pdf file, but all my attempts
to convert the image in the pdf file to a TIFF or other file type accepted
by Word result in considerably degraded quality.  Do you have any
suggestions for creating publication-quality plots in R that can be placed
in Word documents?  What packages, functions (along with options), and/or
conversions would you use?  Thanks so much for your help!
Another option I've used is to export to PDF (which seems to give the 
best quality) and then use the (free) Imagemagick program to convert the 
PDF to high-resolution PNG. This worked for some involved heatmaps that 
were submitted to a journal. Imagemagick can be downloaded directly for 
Windows or via Cygwin.

Suppose your figure is in fig1.pdf. You can use the following command 
(once Imagemagick is downloaded and in your path):

system("convert -density 300x300 fig1.pdf fig1.png")


Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] Saving/loading custom R scripts

2010-09-08 Thread Abhijit Dasgupta, PhD
 You can create a .First function in your .Rprofile file (which  will 
be in ~/.Rprofile). For example

.First <- function(){

You can also create your own package ("mylibrary") down the line (see 
the R manual for creating extensions), which will be a collection 
of your custom scripts that you have written, and then you can 
automatically load them using

.First <- function(){

Hope this helps.
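
In case a filled-in example is useful, here is a minimal sketch of such a .First function. The script location (~/R/my_functions.R) is purely hypothetical; adjust it to wherever the saved script lives.

```r
# Would go in ~/.Rprofile; R calls .First() at the start of each session
.First <- function() {
  # Hypothetical location of the saved script
  script <- file.path(Sys.getenv("HOME"), "R", "my_functions.R")
  if (file.exists(script)) {
    source(script)   # defines the script's functions
  }
}
```

The file.exists() check just keeps startup from failing if the script is moved or missing.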


On 9/8/10 3:25 AM, DrCJones wrote:

How does R automatically load functions so that they are available from the
workspace? Is it anything like Matlab - you just specify a directory path
and it finds it?

The reason I ask is because  I found a really nice script that I would like
to use on a regular basis, and it would be nice not to have to 'copy and
paste' it into R on every startup:

This would be for Ubuntu, if that makes any difference.



Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] Something similar to layout in lattice or ggplot

2010-09-07 Thread Abhijit Dasgupta, PhD

 Thank you all for the suggestions. They have all been immensely helpful.


On 9/7/10 10:44 AM, ONKELINX, Thierry wrote:

Dear Abhijit,

In ggplot you can use facetting (facet_grid() or facet_wrap()) to create
subplot based on the same dataset. Or you can work with viewport() if
you want several independent plots.



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie&  Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen

Research Institute for Nature and Forest
team Biometrics&  Quality Assurance
Gaverstraat 4
9500 Geraardsbergen

tel. + 32 54/436 185

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
~ John Tukey

-Original message-
On behalf of Abhijit Dasgupta
Sent: Tuesday 7 September 2010 16:38
Subject: [R] Something similar to layout in lattice or ggplot


Is there a function similar to the layout function in base
graphics in either lattice or ggplot? I'm hoping someone has
written a function wrapper to the appropriate commands in
grid that would make this easier :)




Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] Something similar to layout in lattice or ggplot

2010-09-07 Thread Abhijit Dasgupta, PhD

 Hi Thierry,

It's really the latter I want..independent plots. I use faceting quite a 
bit, but I need things like a page of plots for simulations under 
different conditions. I suppose I can still use faceting combined with 
reshape, but I'd rather not go that route if I can help it.


On 9/7/10 10:44 AM, ONKELINX, Thierry wrote:

Dear Abhijit,

In ggplot you can use facetting (facet_grid() or facet_wrap()) to create
subplot based on the same dataset. Or you can work with viewport() if
you want several independent plots.





Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


[R] Something similar to layout in lattice or ggplot

2010-09-07 Thread Abhijit Dasgupta

Is there a function similar to the layout function in base graphics in 
either lattice or ggplot? I'm hoping someone has written a function 
wrapper to the appropriate commands in grid that would make this easier :)



Re: [R] Problems in snow: can't open connection with nodes

2010-09-07 Thread Abhijit Dasgupta
This is a problem a few of us have experienced with snow, and there is a 
discussion on the R-hpc list about this. No solution yet, as far as I 
can tell.

On 9/7/2010 9:18 AM, bfoubert wrote:
> I'm working with snow and created a local cluster. So far, the same code has
> always worked (please see below). However, now I receive a message that the
> connection with the nodes cannot be opened. I restarted my workstation but
> that didn't help. Is there a known solution for this problem? Thanks a lot
> for any help.
> bram foubert
> library(snow)
> cl =
> makeSOCKcluster(c("localhost","localhost","localhost","localhost","localhost","localhost","localhost"))
> nrslaves = length(cl)
> CreateData= function(path){
> ...
> }
> clusterApply(cl,c("bgcdata1.txt","bgcdata2.txt","bgcdata3.txt","bgcdata4.txt","bgcdata5.txt","bgcdata6.txt","bgcdata7.txt"),CreateData)
> Error in checkForRemoteErrors(val) :
>7 nodes produced errors; first error: cannot open the connection


Re: [R] ggplot inside cycle

2010-08-26 Thread Abhijit Dasgupta, PhD
You haven't wrapped p in the print command, which is one of the ways to 
make sure the plot gets printed  when we need it.

 print(p+geom_point(aes(size=3))) does the trick
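
A minimal sketch of the whole loop with the explicit print() call, assuming
ggplot2 (3.0 or later, for the .data pronoun) is installed. The output file
name is hypothetical; the iris columns are the ones from the question.

```r
library(ggplot2)

# Hypothetical output location; any writable path works.
out_file <- file.path(tempdir(), "iris_plots.pdf")

pdf(out_file)
for (i in names(iris)[2:4]) {
  # .data[[i]] looks the column up by name, so y changes on every pass
  p <- ggplot(iris, aes(x = Sepal.Length, y = .data[[i]], colour = Species)) +
    geom_point(size = 3) +
    ylab(i)
  # Nothing is auto-printed inside a loop, so print the plot explicitly
  print(p)
}
dev.off()
```

Each pass adds one page to the pdf, so the file ends up with three plots.
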
On 08/26/2010 06:08 AM, Petr PIKAL wrote:

Dear all

I want to save several ggplots in one pdf document. I tried this

for (i in names(iris)[2:4]) {
p<-ggplot(iris, aes(x=Sepal.Length, y=iris[,i], colour=Species))

with different variations of y input but was not successful. In past I
used qplot in similar fashion which worked

for(i in names(mleti)[7:15]) print(qplot(sito, mleti1[,i],
facets=~typ,ylab=i, geom=c("point", "line"), colour=ordered(minuty),

So I wonder if anybody used ggplot in cycle and how to solve input of
variables throughout cycle

Thank you



Re: [R] How to remove rows based on frequency of factor and then difference date scores

2010-08-24 Thread Abhijit Dasgupta, PhD
The paste-y argument is my usual trick in these situations. I forget 
that tapply can take multiple ordering arguments :)
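
For the record, a small sketch of tapply with two grouping factors, using the
Type, ID and Value columns from Chris's example. Counting with length is just
an illustration; any function of the grouped values works the same way.

```r
# Same shape and values as the example data in the thread
x <- data.frame(
  Type  = c("A", "A", "B", "B"),
  ID    = c(1, 1, 3, 1),
  Value = c(8, 9, 7, 6)
)

# A list of factors groups by every Type/ID combination at once
counts <- tapply(x$Value, list(x$Type, x$ID), length)
counts
```

The result is a Type by ID table, with NA for combinations that never occur.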


On 08/24/2010 02:17 PM, David Winsemius wrote:

On Aug 24, 2010, at 1:59 PM, Abhijit Dasgupta, PhD wrote:

The only problem with this is that Chris's unique individuals are a 
combination of Type and ID, as I understand it. So Type=A, ID=1 is a 
different individual from Type=B,ID=1. So we need to create a unique 
identifier per person, simplistically by uniqueID=paste(Type, ID, 
sep=''). Then, using this new identifier, everything follows.

I see your point. I agree that a tapply method should present both 
factors in the indices argument.

> new.df <- txt.df[ -which( txt.df$nn <=1), ]
> new.df <- new.df[ with(new.df, order(Type, ID) ), ]  # and possibly needs to be ordered?
> new.df$diffdays <- unlist( tapply(new.df$dt2, list(new.df$ID, new.df$Type), function(x) x[1] -x) )

> new.df
  Type ID       Date Value        dt2 nn diffdays
1    A  1 16/09/2020     8 2020-09-16  3        0
2    A  1 23/09/2010     9 2010-09-23  3     3646
4    B  1  13/5/2010     6 2010-05-13  3        0

But do not agree that you need, in this case at least, to create a 
paste()-y index. Agreed, however, such a construction can be useful in 
other situations.


Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] How to remove rows based on frequency of factor and then difference date scores

2010-08-24 Thread Abhijit Dasgupta, PhD
The only problem with this is that Chris's unique individuals are a 
combination of Type and ID, as I understand it. So Type=A, ID=1 is a 
different individual from Type=B,ID=1. So we need to create a unique 
identifier per person, simplistically by uniqueID=paste(Type, ID, 
sep=''). Then, using this new identifier, everything follows.

On 08/24/2010 01:53 PM, David Winsemius wrote:

On Aug 24, 2010, at 1:19 PM, Chris Beeley wrote:


A basic question which has nonetheless floored me entirely. I have a
dataset which looks like this:

Type  ID  Date        Value
A     1   16/09/2020  8
A     1   23/09/2010  9
B     3   18/8/2010   7
B     1   13/5/2010   6

There are two Types, which correspond to different individuals in
different conditions, and loads of ID labels (1:50) corresponding to
the different individuals in each condition, and measurements at
different times (from 1 to 10 measurements) for each individual.

I want to perform the following operations:

1) Delete all individuals for whom only one measurement is available.
In the dataset above, you can see that I want to delete the row Type B
ID 3, and Type B ID 1, but without deleting the Type A ID 1 data
because there is more than one measurement for Type A ID 1 (but not
for Type B ID1)

2) Produce difference scores for each of the Dates, so each individual
(Type A ID1 and all the others for whom more than one measurement
exists) starts at Date "1" and goes up in integers according to how
many days have elapsed.

I just know there's some incredibly cunning R-ish way of doing this
but after many hours of fiddling I have had to admit defeat.

Not sure about terribly cunning. Let's assume your dataframe was read 
in with stringsAsFactors=FALSE and is called txt.df:

> txt.df$dt2 <- as.Date(txt.df$Date, format="%d/%m/%Y")
> txt.df
  Type ID       Date Value        dt2
1    A  1 16/09/2020     8 2020-09-16
2    A  1 23/09/2010     9 2010-09-23
3    B  3  18/8/2010     7 2010-08-18
4    B  1  13/5/2010     6 2010-05-13

> txt.df$nn <- ave(txt.df$ID,txt.df$ID, FUN=length)
> txt.df
  Type ID       Date Value        dt2 nn
1    A  1 16/09/2020     8 2020-09-16  3
2    A  1 23/09/2010     9 2010-09-23  3
3    B  3  18/8/2010     7 2010-08-18  1
4    B  1  13/5/2010     6 2010-05-13  3
> txt.df[ -which( txt.df$nn <=1), ]
  Type ID       Date Value        dt2 nn
1    A  1 16/09/2020     8 2020-09-16  3
2    A  1 23/09/2010     9 2010-09-23  3
4    B  1  13/5/2010     6 2010-05-13  3

# Task #1 accomplished

> tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x)
$`1`
Time differences in days
[1]    0 3646 3779

$`3`
Time difference of 0 days

> unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) )
  11   12   13    3
   0 3646 3779    0
> txt.df$diffdays <- unlist( tapply(txt.df$dt2, txt.df$ID, function(x) x[1] -x) )

> txt.df
  Type ID       Date Value        dt2 nn diffdays
1    A  1 16/09/2020     8 2020-09-16  3        0
2    A  1 23/09/2010     9 2010-09-23  3     3646
3    B  3  18/8/2010     7 2010-08-18  1     3779
4    B  1  13/5/2010     6 2010-05-13  3        0

I would be very grateful for any words of advice.

Many thanks,
Chris Beeley,
Institute of Mental Health, UK


David Winsemius, MD
West Hartford, CT



Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] How to remove rows based on frequency of factor and then difference date scores

2010-08-24 Thread Abhijit Dasgupta, PhD

An answer to 1)

> x = data.frame(Type=c('A','A','B','B'), ID=c(1,1,3,1), Date = 
+ c('16/09/2010','23/09/2010','18/8/2010','13/5/2010'), Value=c(8,9,7,6))

> x
  Type ID       Date Value
1    A  1 16/09/2010     8
2    A  1 23/09/2010     9
3    B  3  18/8/2010     7
4    B  1  13/5/2010     6
> x$Date = as.Date(x$Date,format='%d/%m/%Y')
> library(plyr)
> x$uniqueID = paste(x$Type, x$ID, sep='')
> nobs = daply(x, ~uniqueID, nrow)
> keep = names(nobs)[nobs>1]
> newx = x[x$uniqueID %in% keep,]

An answer to 2)
> require(plyr)
> ddply(newx, ~uniqueID, transform, newDate = as.numeric(Date - 

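A base-R sketch of both steps, in case plyr is not at hand. It is not a
reconstruction of the ddply call above. It assumes that "starts at Date 1"
means the first measurement of each individual counts as day 1, and it uses an
underscore separator in the identifier (my choice, not from the thread).

```r
x <- data.frame(
  Type  = c("A", "A", "B", "B"),
  ID    = c(1, 1, 3, 1),
  Date  = as.Date(c("16/09/2010", "23/09/2010", "18/8/2010", "13/5/2010"),
                  format = "%d/%m/%Y"),
  Value = c(8, 9, 7, 6)
)

# A separator avoids collisions such as Type "A1"/ID 1 versus Type "A"/ID 11
x$uniqueID <- paste(x$Type, x$ID, sep = "_")

# Keep only individuals with more than one measurement
n_obs <- ave(seq_along(x$uniqueID), x$uniqueID, FUN = length)
newx  <- x[n_obs > 1, ]

# Days since each individual's first measurement, counting that day as 1
newx$newDate <- ave(as.numeric(newx$Date), newx$uniqueID,
                    FUN = function(d) d - min(d) + 1)
newx
```

With the four example rows only Type A, ID 1 survives, with day counts 1 and 8.
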
On 08/24/2010 01:19 PM, Chris Beeley wrote:


A basic question which has nonetheless floored me entirely. I have a
dataset which looks like this:

Type  ID  Date        Value
A     1   16/09/2020  8
A     1   23/09/2010  9
B     3   18/8/2010   7
B     1   13/5/2010   6

There are two Types, which correspond to different individuals in
different conditions, and loads of ID labels (1:50) corresponding to
the different individuals in each condition, and measurements at
different times (from 1 to 10 measurements) for each individual.

I want to perform the following operations:

1) Delete all individuals for whom only one measurement is available.
In the dataset above, you can see that I want to delete the row Type B
ID 3, and Type B ID 1, but without deleting the Type A ID 1 data
because there is more than one measurement for Type A ID 1 (but not
for Type B ID1)

2) Produce difference scores for each of the Dates, so each individual
(Type A ID1 and all the others for whom more than one measurement
exists) starts at Date "1" and goes up in integers according to how
many days have elapsed.

I just know there's some incredibly cunning R-ish way of doing this
but after many hours of fiddling I have had to admit defeat.

I would be very grateful for any words of advice.

Many thanks,
Chris Beeley,
Institute of Mental Health, UK



Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] Sweave

2010-08-18 Thread Abhijit Dasgupta
No, the \setkeys statement should be in the main body of the Sweave file,
not in the R code part.

On Aug 18, 2010 9:20 AM, "Randall Wrong"  wrote:

Thanks Karen and Abhijit.

I have read the section 4.1.2 of the Sweave user manual. Actually the manual
lacks example code.

I would like to change the option in "includegraphics", not to change the
true sizes of the pictures.

This is what I usually do :

print( acfplot( x ) )

Should I write :

print( acfplot( x ) )


Thanks for your help,

2010/8/18 Karen Kotschy 

> Dear Randall
> I do it like this:
> \begin{center}
>   \setkeys{Gin}{width=0.7\textwidth}
>   \begin{Scode}{fig=T, echo=F}
>  ...
>   \end{Scode}
> \end{center}
> Hope this helps.
> Karen

> On Wed 18Aug10, Randall Wrong wrote:
> > Dear R users,
> >
> > I am using Sweave.
> >
> > I woul...


Re: [R] Sweave

2010-08-17 Thread Abhijit Dasgupta
Please read the Sweave documentation. The default is set to 0.8\textwidth.
You have to change a \SweaveOpt.

On Aug 17, 2010 7:00 PM, "Randall Wrong"  wrote:

Dear R users,

I am using Sweave.

I would like to use the width option for the graphics :


How do I get this ?

Thank you very much,


Re: [R] ESS question. How to get rid of ess-smart-underscore?

2010-08-09 Thread Abhijit Dasgupta
I blogged about 3 possible solutions recently.
Possibly the 2nd most recent post.


On Aug 9, 2010 8:28 AM, "W Eryk Wolski"  wrote:


ESS replaces "_" by "<-". How can I switch off this feature?

 I need to be able to type the underscore


Witold Eryk Wolski

Heidmark str 5
D-28329 Bremen
tel.: 04215261837


Re: [R] How to apply apply?!

2010-08-06 Thread Abhijit Dasgupta, PhD

For 1, an easy way is

dat <- transform(dat, CLOSE2=2*CLOSE)

For 2:
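
One possible sketch covering both parts. The rows are a few taken from the
table in the question; row_fun is a hypothetical stand-in for whatever per-row
calculation is needed (it is not an implied-volatility routine).

```r
dat <- data.frame(
  OPEN  = c(1931.2, 0, 783.1, 470),
  HIGH  = c(1931.2, 0, 783.1, 470),
  LOW   = c(1931.2, 0, 742.25, 420),
  CLOSE = c(1931.2, 999.05, 742.25, 425)
)

# Part 1: arithmetic on a column is already vectorised, no apply needed
dat <- transform(dat, CLOSE2 = 2 * CLOSE)

# Part 2: hypothetical per-row function taking several columns as arguments
row_fun <- function(high, low, close) (high - low) / close

# mapply passes the i-th element of each column to row_fun
dat$RESULT <- mapply(row_fun, dat$HIGH, dat$LOW, dat$CLOSE)
dat
```

mapply keeps each column's type, whereas apply(dat, 1, ...) first converts the
data frame to a matrix.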


On 08/06/2010 03:06 PM, Raghuraman Ramachandran wrote:


I have say a dataframe, d and I wish to do the following:

1) For each row, I want to take one particular value of the row and multiply
it by 2. How do I do it. Say the data frame is as below:
OPEN    HIGH    LOW     CLOSE
1931.2  1931.2  1931.2  1931.2
0       0       0       999.05
0       0       0       1052.5
0       0       0       987.8
0       0       0       925.6
0       0       0       866
0       0       0       1400.2
0       0       0       754.5
0       0       0       702.6
0       0       0       653.25
0       0       0       348
0       0       0       801
866.55  866.55  866.55  866.55
783.1   783.1   742.25  742.25
575     575     575     575
0       0       0       493
470     470     420     425
355     360     343     360
312.05  312.05  274     280.85
257.35  257.35  197     198.75
182     185.95  137     150.75
120.25  129     90.7    101.25
91.85   91.85   57      66.6

How do I multiply only the close of every row using the 'apply' function?
And once multiplied how do I obtain a new table that also contains the new
2*CLOSE column (without cbind?).

2) Also, how do I run a generic function per row? Say, for example, I want to
calculate the implied volatility for each row of this data frame (using the
Rmetrics package). How do I do that using the apply function? I am
focusing on apply because I like the vectorisation concept in R and I do not
want to use a for loop etc.

Many thanks for the enlightenment,



Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


Re: [R] How to extract se(coef) from cph?

2010-08-05 Thread Abhijit Dasgupta, PhD

if the cph model fit is m1, you can try


This is coded in (library(rms))
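
A hedged sketch of one common way to get standard errors from a Cox fit, not
necessarily the expression originally posted here. It uses coxph() from the
survival package and its bundled lung data as a stand-in for the cph fit m1,
on the assumption that vcov() behaves the same way for a cph object.

```r
library(survival)

# coxph() on the bundled lung data stands in for the cph fit m1
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Standard errors are the square roots of the diagonal of the
# estimated variance-covariance matrix of the coefficients
se <- sqrt(diag(vcov(fit)))

# Coefficients and standard errors side by side, ready for a table
cbind(coef = coef(fit), se = se)
```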

On 08/05/2010 04:03 PM, Biau David wrote:


I am modeling some survival data with cph (Design). I have modeled a predictor
which showed a non-linear effect with restricted cubic splines. I would like to
retrieve the se(coef) for other, linear, predictors. This is just to make nice
LaTeX tables automatically. I have the coefficients with coef().

How do I do that?


  David Biau.



Abhijit Dasgupta, PhD
Director and Principal Statistician
Ph: 301.385.3067


[R] snow makeCluster (makeSOCKcluster) not working in R-2.11

2010-05-12 Thread Abhijit Dasgupta

I was using snow to parallel-process some code in R-2.10 (32-bit 
Windows). The code is as follows:


cl <- makeCluster(6, type='SOCK')
bl2 <- foreach(i=icount(length(unqmrno))) %dopar% {
(some code here)
}

When I run the same code in Windows R-2.11 (either 32-bit or 64-bit), R 
hangs at cl<-makeCluster(6, type='SOCK') and no R processes are spawned. 
I was wondering if others have encountered this problem, and any 
suggestions on solving this would be greatly appreciated.

Abhijit Dasgupta, PhD

Statistician | Clinical Sciences Section | NIAMS/NIH


[R] Hmisc summary.formula.reverse

2010-02-18 Thread Abhijit Dasgupta


Can summary.formula.reverse be customized to allow other summary 
statistics to be reported rather than the quartiles and mean +/- sd? The 
"fun" option apparently doesn't apply when method='reverse'


Abhijit Dasgupta, PhD

Statistician | Clinical Sciences Section | NIAMS/NIH


[R] tkrplot installation problems

2009-04-06 Thread Abhijit Dasgupta


I'm running R 2.8.1 on Ubuntu Hardy. I'm trying to install tkrplot. 
Using r-cran-tkrplot from the repository, I'm getting the following error:

> library(tkrplot)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Error in structure(.External("dotTcl", ..., PACKAGE = "tcltk"), class = 
"tclObj") :

 [tcl] version conflict for package "Tcl": have 8.5.0, need exactly 8.4.

Error in library(tkrplot) : .First.lib failed for 'tkrplot'

This tries to install tkrplot version 0.0.16
However, installing R2.8.1 from the repositories automatically installs 
tcl8.5. In fact, if I try and remove tcl8.5 using synaptic, it also 
removes R.

I also tried to install the package from source, both by using 
install.packages as well as downloading source (tkrplot 0.0.18). This 
fails to install.

Can someone please help. My ultimate objective is to use the 
TeachingDemos package.

> sessionInfo()
R version 2.8.1 (2008-12-22)


attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods 
[8] base




Re: [R] Output Nicely formatted tables from R

2008-07-21 Thread Abhijit Dasgupta
Please look at 
for ways to do fine control of tabular data formatting via Sweave.

Bill Cunliffe wrote:

Hi there,


I've spent a while searching for ways of outputting table data from R in
presentable formats, such as colored backgrounds for column headings, bold
fonts etc.  It appears that this is not possible, but I would be interested
to learn if in fact there was a way of achieving this.


Many thanks!


Re: [R] apply with a division

2008-07-03 Thread Abhijit Dasgupta
Won't scale(x,center=F, scale=x[1,]) do the trick?
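
A quick sketch with Greg's numbers. The only liberty taken is unlist() on the
first row, so that the divisor is a plain numeric vector rather than a
one-row data frame.

```r
expt.fluor <- data.frame(
  X1 = c(124, 165, 52, 179, 239),
  X2 = c(120, 163, 51, 171, 238),
  X3 = c(134, 174, 43, 166, 235)
)

# Plain numeric vector of first-row values, one divisor per column
first.row <- unlist(expt.fluor[1, ])

# scale() divides each column by the matching element of `scale`
normed <- scale(as.matrix(expt.fluor), center = FALSE, scale = first.row)

# sweep() spells out the same column-wise division
normed2 <- sweep(as.matrix(expt.fluor), 2, first.row, "/")
normed2
```

Both give a 5 x 3 matrix whose first row is all ones.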


Quoting Gabor Grothendieck <[EMAIL PROTECTED]>:

> This should work whether your data, x, is a data frame or a matrix:
> x / x[rep(1, nrow(x)),]
> On Thu, Jul 3, 2008 at 6:04 PM, Greg Kettler <[EMAIL PROTECTED]>
>> Hi,
>> I'd like to normalize a dataset by dividing each row by the first
>> Very simple, right?
>> I tried this:
>>> expt.fluor
>>   X1  X2  X3
>> 1 124 120 134
>> 2 165 163 174
>> 3  52  51  43
>> 4 179 171 166
>> 5 239 238 235
>>> first.row <- expt.fluor[1,]
>>> normed <- apply(expt.fluor, 1, function(r) {r / first.row})
>>> normed
>> [[1]]
>>   X1 X2 X3
>> 1  1  1  1
>> [[2]]
>>         X1       X2       X3
>> 1 1.330645 1.358333 1.298507
>> [[3]]
>>          X1    X2        X3
>> 1 0.4193548 0.425 0.3208955
>> [[4]]
>>         X1    X2       X3
>> 1 1.443548 1.425 1.238806
>> [[5]]
>>         X1       X2       X3
>> 1 1.927419 1.983333 1.753731
>> Ugly! The values are right, but why didn't I get another 2D array
>> back? Shouldn't the division in my inline function return a
>> Thanks,
>> Greg

Re: [R] get formatted regression output

2008-07-02 Thread Abhijit Dasgupta
library(Hmisc) has latex and html functions to convert the output into 
latex (then to pdf, if you wish), and into html. A useful link for this 

Bunny, wrote:

Hi everybody,

I have a simple regression summary created by summary.lm  and I wonder 
how i can export it to another file format which can be used on the web.
.pdf would be possible, a html table would be nicer than your momma on 
your birthday.

any suggestions ?
thx so much in advance



Re: [R] how to extract object from stats test output (cor.test)?

2008-06-19 Thread Abhijit Dasgupta
First of all, you need to store the results in a list, not a vector, if you're storing the 
entire output. Otherwise, you can just store, for example, cor.test(...)$p.value
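
A small sketch of both options. The data are simulated and datamatrix2 is a
hypothetical name for the second data set, with one column per comparison.

```r
set.seed(1)
datavector1 <- rnorm(20)
# Hypothetical second data set: one column per comparison
datamatrix2 <- matrix(rnorm(60), ncol = 3)

# Option 1: keep only Spearman's rho from each test in a numeric vector
rho <- numeric(ncol(datamatrix2))
for (i in seq_len(ncol(datamatrix2))) {
  rho[i] <- cor.test(datavector1, datamatrix2[, i], method = "spearman")$estimate
}

# Option 2: keep the complete test objects in a list and extract later
tests <- lapply(seq_len(ncol(datamatrix2)), function(i)
  cor.test(datavector1, datamatrix2[, i], method = "spearman"))
p.values <- sapply(tests, function(t) t$p.value)
```

names(tests[[1]]) lists everything else that can be pulled out of a test.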

On Thu, 19 Jun 2008 11:01:42 -0400
"Patrick Ayscue" <[EMAIL PROTECTED]> wrote:

> Hello,
> Is there a way to extract output objects from a stats test without viewing
> the entire output?  I am trying to do so in the following:
> define a vector of length j
> for( i in 1: length (vector)) {
> vector[i] = cor.test (datavector1, datavector2[i], method=("spearman"))
> }
> I would like the reported Spearman's rho to be saved in a vector.  I have
> tried a few different ways of doing this but seem unable to figure out how
> to get only that output without looking at each report and copying by hand.
> Any help would be appreciated.
> Thanks,
> Patrick

Abhijit Dasgupta


Re: [R] Problem in Binning of a data set

2008-06-18 Thread Abhijit Dasgupta
One issue you will have is that, if you're using the same cutpoints for 
the binning for all three tables, you'll probably get different numbers 
of values from each column in each bin, so you won't be able to form a 
"matrix". Of course, I might be misunderstanding what you mean by 
"binning" :)
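
If binning here means cutting each column at fixed breakpoints, a sketch along
these lines shows which values land in which bin. The data and the breakpoints
are made up for illustration.

```r
set.seed(42)
m <- matrix(runif(150), nrow = 50, ncol = 3)   # stand-in for the 50 x 3 data

# Same cutpoints for every column
breaks <- seq(0, 1, by = 0.25)

# For each column, a list holding the values that fall in each bin
binned <- lapply(seq_len(ncol(m)), function(j) {
  split(m[, j], cut(m[, j], breaks = breaks, include.lowest = TRUE))
})

# Bin sizes per column; they can differ, so the bins need not stack into a matrix
sapply(binned, lengths)
```

binned[[1]][[2]] then holds the values of column 1 that fall in the second bin.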


sumit gupta wrote:


I am having problem with binning the data. I have a 50X3 matrix and I binned
the data for all the 3 columns. Using table command I got the total no. of
elements in a particular bin.
Could you please tell me how to see that what all elements are there in a
particular bin and then create a different matrix for each bin?




Re: [R] shell command

2008-06-12 Thread Abhijit Dasgupta

Yes, see ?system
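
For example, assuming a Unix-like system where echo and true are available:

```r
# Run a shell command and capture its standard output as a character vector
out <- system("echo hello from the shell", intern = TRUE)
out

# Without intern = TRUE the output goes to the console and
# system() returns the command's exit status instead
status <- system("true")
status
```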

samitj wrote:

Can we execute a unix shell command from within R shell?



Re: [R] A curious bug in read.xls

2008-06-10 Thread Abhijit Dasgupta

I believe read.xls has a colClasses argument. If you import using
read.xls(filename, colClasses='character')
everything will be imported as a string, and you can re-convert after 

Alberto Monteiro wrote:

I found a curious bug in read.xls. I don't know if it's reproducible.

It's like this: suppose I do a read.xls in a spreadsheet. A column
begins with a number. Then, any strings below it will be rendered as NA.
If the column begins with a string, then it will be rendered correctly.

Alberto Monteiro


Re: [R] Usefulness of "scale" function

2008-06-07 Thread Abhijit Dasgupta

Type ?scale in R for the answer :)
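
In short, scale() centers and rescales the columns of a matrix. A tiny sketch
of what the t(scale(t(...))) idiom in the snippet does, on a made-up matrix
(rows as genes, columns as samples):

```r
# Small made-up expression matrix
mydata <- matrix(c(1, 2, 3,
                   10, 20, 60,
                   5, 5, 8),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

# scale() works on columns, so transpose, scale, and transpose back
# to standardise each row (gene) to mean 0 and standard deviation 1
mydatascale <- t(scale(t(mydata)))

rowMeans(mydatascale)       # all (numerically) zero
apply(mydatascale, 1, sd)   # all one
```

This puts genes with very different expression levels on a common footing
before clustering.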

Gundala Viswanath wrote:

Hi all,

I found this snippet in a gene expression
clustering code.

temp <- readLines("GSE1110_series_matrix.txt");
cat(temp[-grep("^!|^\"$", temp)], file="GSE1110clean.txt", sep="\n");
mydata <- read.delim("GSE1110clean.txt", header=T, sep="\t")

mydatascale <- t(scale(t(mydata)))

I am wondering, in general:

1. What is the purpose of 'scale' function?
2. When one should use it?


Re: [R] favorite useful tools?

2008-06-07 Thread Abhijit Dasgupta
The two sets of packages I use a lot for their utility functions and for 
making my day-to-day analysis and reporting easier are Hmisc and Design 
by Frank Harrell and {gdata,gmodels,gplots} by Greg Warnes. Frank's 
packages have good documentation and cover a pretty good range of 
regression methods as well as a refined set of report-writing tools. These 
aren't "obscure" or "small" tools, but very useful and "cool" tools I 
use regularly.

Greg and Frank can thank me later for the free plug :-D :-D


Carl Witthoft wrote:

I'm relatively new to R, so I don't know the full list of base (or 
popular add-on packages)  functions and tools available.  For example, 
I tripped across mention of rle() in a message about some other 
problem. rle() turned out to be a handy shortcut to splitting some of 
my data by magnitude (vaguely like a sequence-based histogram).
So I thought I'd ask: what small, or obscure, tools and functions in R 
do you find handy or 'cool' to use in your work?



Re: [R] R + Linux

2008-06-06 Thread Abhijit Dasgupta
I've had R on an Ubuntu system for about 18 months now, and getting R 
up and running was a breeze. (I didn't realize it earlier, but Dirk 
certainly gets my vote of thanks for his efforts in making this process 
as easy as it is.) Especially in terms of dependencies and the like, the 
Ubuntu packaging system has made things easy. I've also had 
the experience of installing R on a Red Hat Enterprise Linux system on a new 
server at university, and the dependency issues were much more 
problematic (although I wasn't allowed to use yum because of the way our 
IT people had set it up), especially at the compiler level. Just my 
limited experience in this area. In any case, I'm not going back to 
Windows unless forced; I've been quite happy with my experience in 
the Linux world.


Markus Jäntti wrote:
> I have both Debian, Ubuntu, RedHat and CentOS systems, and primary run R
> on the Debian and RedHat machines. I have encountered few problems
> running R on RedHat/CentOS, but I do think the Debian/Ubuntu package
> management system, combined with the kind provision of packages, makes
> life a lot simpler. (Yes, many thanks to Dirk!).
> Also, the ease of installing and maintaining among with the highly
> useful user forums of Ubuntu would lead me to recommend that particular
> distribution.
> Regards,
> Markus
> On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
>> Dear all;
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and  Intel Quad-Core Xeon processor and will be
>> used only to run R.
>> Thanks
>> Steven

Re: [R] Correlated Columns in data frame

2008-05-17 Thread Abhijit Dasgupta
The line in question randomly decides which of the two correlated 
columns to drop. If C1 and C2 are correlated you could drop either one, 
the code decides which randomly, which is a principled way to do this. 
This does mean that repeated runs of this code will give you different 
results, but the final result is what you want in all cases: the columns 
of the resultant data.frame do not have pairwise correlation above a 
threshold. I guess the point here is that starting from the same 
data.frame there are several resultant data.frames possible which 
satisfy the property you desire, so are equally valid from that 
criterion's perspective.

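To make the random choice concrete, here is a small sketch on simulated data
(the column names, seed and threshold are made up). It lists the pairs above
and below the threshold and then applies the same coin flip as the quoted
function; set.seed() makes the outcome repeatable.

```r
set.seed(123)
df <- data.frame(a = rnorm(50))
df$b <- df$a + rnorm(50, sd = 0.05)  # nearly a copy of a
df$c <- rnorm(50)                    # unrelated column

cor.mat   <- cor(df)
threshold <- 0.8

# Each pair once: restrict to the lower triangle (this also skips the diagonal)
high <- which(abs(cor.mat) > threshold & lower.tri(cor.mat), arr.ind = TRUE)
low  <- which(abs(cor.mat) < threshold & lower.tri(cor.mat), arr.ind = TRUE)

# Coin flip per correlated pair: drop the row index or the column index
drop.idx <- ifelse(runif(nrow(high)) > 0.5, high[, 1], high[, 2])
keep     <- setdiff(seq_len(ncol(df)), unique(drop.idx))
names(df)[keep]
```

Either a or b is dropped, depending on the seed; c is always kept.
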
A simple code for your more recent query is :
ind = which(abs(cor.mat)

> Dear all,
> Sorry to post my query once again in the list, since I did
> not get attention from anyone in my previous mail to this
> list. 
> Now I make it simple here that please give me a code for
> find out the columns of a dataframe whose correlation
> coefficient is below a pre-determined threshold. (For
> detailed query please see my previous message to this list,
> pasted hereunder)
> Thanks and regards,
> B.Nataraj
> Following is my previous message to this list to which I do
> not get any reply.
> Dear all,
> For removing correlated columns in a data frame,df.
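> I found a code written in R in the page of Mr. Rajarshi Guha.
> The code is 
> #
> r2test <- function(d, cutoff=0.8) {
>   # cutoff is a threshold on R^2 and must lie in (0, 1]
>   if (cutoff > 1 || cutoff <= 0) {
>     stop("cutoff must satisfy 0 < cutoff <= 1")
>   }
>   if (!is.matrix(d) && !is.data.frame(d)) {
>     stop("Must supply a data.frame or matrix")
>   }
>   r2cut = sqrt(cutoff);
>   cormat <- cor(d);
>   # (row, col) index pairs whose absolute correlation exceeds the cutoff
>   bad.idx <- which(abs(cormat)>r2cut,arr.ind=T);
>   # keep each pair once (lower triangle only, which also skips the diagonal)
>   bad.idx <- matrix( bad.idx[bad.idx[,1] > bad.idx[,2]], ncol=2);
>   # for each pair, pick at random which of the two columns to drop
>   drop.idx <- ifelse(runif(nrow(bad.idx)) > .5, bad.idx[,1], bad.idx [,2]);
>   if (length(drop.idx) == 0) {
>   1:ncol(d)
>   } else {
>   (1:ncol(d))[-unique(drop.idx)]
>   }
> }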
> Now the problem is the code return different output (i.e.
> different column number) for a different call. I could not
> understood why it happens from that code, but I can
> understand the logic in code except the line
> drop.idx <- ifelse(runif(nrow(bad.idx)) > .5, bad.idx[,1],
> bad.idx [,2]);
> what it means by comparing > 0.5 of nrow(bad.idx).
> So I am looking for anyone to help me for different output
> generation between the different function call as well as
>  meaning of the line which I mentioned above.
> Thanks!
> B.Nataraj

Re: [R] How to swap and rearrange rows?

2008-05-16 Thread Abhijit Dasgupta
Another possibility:
# Generate the months in sequential order:
m <- months(seq(as.Date('2000/1/1'), by='month',length=12),abbreviate=T)
# Concatenate the data
est <- rbind(est31, est30, est28)
# generate the right order: for each month, find the row carrying its name
ind <- match(m, substr(rownames(est),1,3))
# reorder the rows
est <- est[ind,]
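
A self-contained variant of the same idea, as a sketch: it uses the built-in
month.abb instead of months(), so it does not depend on the locale. The three
small data frames are cut-down stand-ins for est31, est30 and est28, with
values taken from the question.

```r
est31 <- data.frame(p0 = c(0.8802867, 0.8598566),
                    est.alpha = c(0.7321440, 0.7096567),
                    row.names = c("Jan", "Mar"))
est30 <- data.frame(p0 = 0.7296296, est.alpha = 0.7929348, row.names = "Apr")
est28 <- data.frame(p0 = 0.877262, est.alpha = 0.6567584, row.names = "Feb")

# Stack the pieces row-wise
est <- rbind(est31, est30, est28)

# month.abb is "Jan", "Feb", ..., "Dec"; the first three letters of each
# row name ("July" -> "Jul", "Sept" -> "Sep") give that row's month number
pos <- match(substr(rownames(est), 1, 3), month.abb)

# order() turns the month numbers into a row permutation
est <- est[order(pos), ]
rownames(est)
```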

BTW, Richard meant "rbind" instead of "cbind" in his code, which I'm sure you 


On Fri, 16 May 2008 10:09:22 +0100

> > How to swap and rearrange the row so that I will have
> > Jan-Dec in order?
> > > est31
> >             p0 est.alpha est.beta  est.rate
> > Jan  0.8802867 0.7321440 7.241757 0.1380880
> > Mar  0.8598566 0.7096567 7.376367 0.1355681
> > May  0.6204301 0.8657272 6.036106 0.1656697
> > July 0.5032258 0.9928488 4.027408 0.2482986
> > Aug  0.5322581 0.9625738 4.103121 0.2437169
> > Oct  0.6792115 0.8526226 5.105218 0.1958780
> > Dec  0.8397849 0.7490287 7.070349 0.1414357
> > > est30
> >             p0 est.alpha est.beta  est.rate
> > Apr  0.7296296 0.7929348 6.303877 0.1586325
> > Jun  0.5574074 0.8588608 5.695905 0.1755647
> > Sept 0.607 0.9031150 4.594891 0.2176330
> > Nov  0.7725926 0.7600906 5.636366 0.1774193
> > > est28
> >           p0 est.alpha est.beta  est.rate
> > Feb 0.877262 0.6567584 8.708051 0.1148363
> > Thank you so much.
> First, concatenate the data frames:
> est <- cbind(est31, est30, est28)
> Now you can sort the resulting data frame using order, as described in FAQ 
> on R 7.23.
> months <- factor(rownames(est), levels=c("Jan", "Feb", "Mar", "Apr", 
> "May", "Jun", "July", "Aug", "Sept", "Oct", "Nov", "Dec"))
> sortedest <- est[order(months),]
> (You might also want to recode 'July' to 'Jul' and 'Sept' to 'Sep' to be 
> consistent with the other months.)
> Regards,
> Richie.
> Mathematical Sciences Unit

Re: [R] Newbie question about vector matrix multiplication

2008-05-14 Thread Abhijit Dasgupta

Won't diag(w)%*%co%*%diag(w) do it?
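
A minimal sketch of this, using the numbers from the question. Since read.table() returns a data frame, the weights are first converted to a plain numeric vector (an assumption about how w and co are stored in the poster's session). The elementwise form co * outer(w, w) is shown as an equivalent alternative, because m[i,j] = w[i]*co[i,j]*w[j].

```r
# Covariance matrix and weights as given in the question.
co <- matrix(c(0.0012517684, 0.0002765438, 0.0007887114,
               0.0002765438, 0.0002570286, 0.0002117336,
               0.0007887114, 0.0002117336, 0.0009168750),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z")))
w_df <- data.frame(X = 0.5818416, Y = 0.2158531, Z = 0.2023053)

# diag() needs a numeric vector, not a one-row data frame.
w <- as.numeric(unlist(w_df))

# Matrix form: scale row i and column j of co by w[i] and w[j].
m1 <- diag(w) %*% co %*% diag(w)

# Elementwise form: outer(w, w)[i, j] is w[i] * w[j].
m2 <- co * outer(w, w)

print(m2)
```

The elementwise version keeps the row and column names of co and avoids building the two diagonal matrices.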

Dan Stanger wrote:

Hello All,

I have a covariance matrix, generated by read.table, and cov:

X 0.0012517684 0.0002765438 0.0007887114
Y 0.0002765438 0.0002570286 0.0002117336
Z 0.0007887114 0.0002117336 0.0009168750

And a weight vector generated by
w<- read.table("c:/r.weights")
          X         Y         Z
1 0.5818416 0.2158531 0.2023053

I want to compute the product of the matrix and vectors termwise to
generate a 3x3 matrix, where m[i,j]=w[i]*co[i,j]*w[j].

0.000423773 7.47216E-08 4.41255E-08
7.47216E-08 1.96566E-11 4.29229E-11
4.41255E-08 4.29229E-11 4.11045E-11

Is this possible without writing explicit loops?

Thank you,
Dan Stanger
Eaton Vance Management
200 State Street
Boston, MA 02109
617 598 8261



Re: [R] Can R handle large dataset?

2008-05-14 Thread Abhijit Dasgupta
I believe this is determined by how much memory your computer has, not 
particularly by R itself.
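
As a rough illustration of why memory is the constraint: R generally keeps the objects it works on in RAM, and a double-precision number takes 8 bytes, so the size of a dataset can be estimated before loading it. The vector below is just a stand-in, not the poster's data.

```r
# One million doubles: about 8 bytes each plus a small header.
x <- numeric(1e6)
size_bytes <- as.numeric(object.size(x))
print(object.size(x), units = "Mb")

# By the same arithmetic, 1e8 doubles would need roughly 800 million
# bytes for the data alone, before any copies made during analysis.
estimated_bytes <- 1e8 * 8
print(estimated_bytes)
```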

Mingjun Huang wrote:


   I am new to R. Can anyone give me an idea of how R handles a large dataset
   (e.g. a couple of Gbytes)? Thanks a lot!



Re: [R] anova p value extraction

2008-05-07 Thread Abhijit Dasgupta
What works, amazingly, is
summary(pb)$"Pr(>F)" [note the quotes]
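
A small sketch on a built-in dataset, in case the line above returns NULL on your version of R. The str() output quoted below shows that summary() of an aov fit is a list whose first element is the ANOVA table, so indexing that element first should be the more robust route. The PlantGrowth data and the object names are stand-ins, not the poster's data.

```r
# Built-in example data in place of the poster's 'diff' and 'grF'.
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table.
tab <- summary(fit)[[1]]

# The column name contains special characters, so it must be quoted.
p1 <- tab[["Pr(>F)"]][1]        # p value for the 'group' term

# anova() on the same fit returns the table directly.
p2 <- anova(fit)[["Pr(>F)"]][1]

print(c(from_summary = p1, from_anova = p2))
```

The second entry of the column is NA because the residual row has no F test.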


On Wed, 07 May 2008 18:17:09 -0700
"H. Paul Benton" <[EMAIL PROTECTED]> wrote:

> Yea the anova object seems to be odd. It's not S4 so that's why I tried 
> originally the attr() function but
> summary(pb)$Pr(>F)
> Error: unexpected '>' in "summary(pb)$Pr(>"
> > summary(pb)$Pr
> > summary(pb)@Pr(>F)
> Error: unexpected '>' in "summary(pb)@Pr(>"
> > summary(pb)@Pr(F)
> Error: no slot of name "Pr" for this object of class "summary.aov"
> In addition: Warning message:
> trying to get slot "Pr" from an object (class "summary.aov") that is
> not an S4 object
> >
> Research Programmer & Technician
> The Scripps Research Institute
> Mass Spectrometry Core Facility
>   o The
>  /
> o Scripps
>  \
>   o Research
>  /
> o Institute
> >  Hi: it's probably the Pr(> F) element so just access it by
> >
> > sum<-summary(whatever).
> >
> > then sum$Pr(>F) will probably work. But make sure that's it because 
> > usually the name is pval or pvalue etc so I'm
> > surprised about the weird name.
> >
> >
> >
> >
> > On Wed, May 7, 2008 at  8:47 PM, Paul Benton wrote:
> >
> >> hello all,
> >>
> >> Quick question, how do I get the p value out of the anova?
> >>
> >> Thanks,
> >>
> >> Paul
> >>
> >>> pb<-aov(as.numeric(diff[5,16:33]) ~ grF)
> >>> summary(pb)
> >>             Df     Sum Sq    Mean Sq F value  Pr(>F)
> >> grF          3 2.7860e+10 9.2867e+09  4.2236 0.02534 *
> >> Residuals   14 3.0783e+10 2.1988e+09
> >> ---
> >> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>> str(summary(pb))
> >> List of 1
> >>  $ :Classes 'anova' and 'data.frame':   2 obs. of  5 variables:
> >>   ..$ Df : num [1:2] 3 14
> >>   ..$ Sum Sq : num [1:2] 2.79e+10 3.08e+10
> >>   ..$ Mean Sq: num [1:2] 9.29e+09 2.20e+09
> >>   ..$ F value: num [1:2] 4.22   NA
> >>   ..$ Pr(>F) : num [1:2] 0.0253 NA
> >>  - attr(*, "class")= chr [1:2] "summary.aov" "listof"
> >>> attr(summary(pb), "Pr(>F)")
> >> NULL
> >>

Abhijit Dasgupta


Re: [R] CRAN and Multiple Linear Regression

2008-05-07 Thread Abhijit Dasgupta
Multiple linear regression is handled by the function lm() in the 
default installation of R. This takes inputs as lm(y~x1+x2+x3).
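
A minimal sketch of such a call on simulated data. The variable names, the coefficients and the sample size of 128 rows are assumptions chosen for illustration; only lm() and its formula interface come from base R.

```r
set.seed(1)                       # make the simulated data reproducible
n <- 128
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# Response built from known coefficients plus a little noise.
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + 0.25 * d$x3 + rnorm(n, sd = 0.1)

# Fit the multiple regression; 'data' tells lm() where to find the columns.
fit <- lm(y ~ x1 + x2 + x3, data = d)

print(coef(fit))                  # estimated intercept and slopes
print(summary(fit))               # standard errors, t tests, R-squared
```

Data entered in a spreadsheet can be saved as CSV and loaded with read.csv() before calling lm() in the same way.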

If you're going to be using R regularly, there are several books which 
cover the basic statistical analyses available in R (and then some), 
including those by Peter Dalgaard and by Brian Ripley, who are very 
regular contributors to this list.


Anja und Th. Sponsel wrote:


I have to solve a multiple linear regression. Most programs like Excel
or MATLAB only support 5-10 dimensions. Now I have installed CRAN and
I have no clue what to do next. At the moment I am entering my data
into an Excel sheet (for quick copy-paste). The Y-array will be 20
columns (=dimensions) and 128 rows (=variables). The X-array may also
be 128 rows in Excel.

Which package do I need - or does it work without any package?

What commands do I have to enter? I'm not familiar with CRAN and I
didn't find any example.

Best regards,


Re: [R] howto import .xls and .ods

2008-05-01 Thread Abhijit Dasgupta
I've written a little python script that converts ods and xls to csv, 
and a wrapper in R that imports a sheet of the ods and/or xls into a 
data.frame in R. If this is a FAQ and there is an existing solution, great.

Actually, for xls, the library gdata has a read.xls function that uses a 
perl function underneath.

Abhijit Dasgupta, Ph.D
Assistant Professor | Division of Biostatistics
Dept of Pharmacology and Experimental Therapeutics | Thomas Jefferson 

1015 Chestnut St | Suite M100 | Philadelphia PA 19107
Ph: (215) 503-9201 | Fax: (215) 503-3804
adasgupt (at) mail (dot) jci (dot) tju (dot) edu

The documents accompanying this transmission may contain confidential 
health or business information. This information is intended for the use 
of the individual or entity named above. If you have received this 
information in error, please notify the sender immediately and arrange 
for the return or destruction of these documents.

Jonas Stein wrote:


i want to import data from .ods and .xls files in R on a linux system.
Seems it was a faq in the past, but i found only solutions for Windows.

Is there a handy solution for linux? The best would be something like

mytab <-read.ods(...)

Any hints? Thanks a lot for reading so far,


Re: [R] a simple question of importing data

2008-04-24 Thread Abhijit Dasgupta
Or you could use the read.xls program in the gdata library that uses a 
perl script underneath.
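
For the copy-and-paste route discussed in this thread, a minimal sketch using only base R. Text pasted from Excel is normally tab-separated, so the file is read with sep = "\t". The temporary file and its contents are made up; they stand in for the poster's weekly.txt.

```r
# Write a small tab-separated file, as pasting from Excel would produce.
tmp <- tempfile(fileext = ".txt")
writeLines(c("week\tsales",
             "1\t10.5",
             "2\t12.0",
             "3\t9.75"), tmp)

# header = TRUE takes the column names from the first line.
weekly <- read.table(tmp, sep = "\t", header = TRUE)
str(weekly)

unlink(tmp)   # remove the temporary file
```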

Charles Danko wrote:
> try:
> var <- read.table("weekly.txt", sep="\t", header=TRUE)
> Charles
> On Thu, Apr 24, 2008 at 3:29 PM, tzsmile <[EMAIL PROTECTED]> wrote:
>>  i just want to read data from Excel and i copied it and pasted into a txt
>>  file.
>>  then i want to use "read.table" to read it. but however i tried, it doesn't
>>  work.
>>  can someone help me?
>>  data is attached.
>>  thanks weekly.txt

Re: [R] Ubuntu vs. Windows

2008-04-22 Thread Abhijit Dasgupta
My naive understanding of this (I switched to Ubuntu a year ago from 
WinXP for similar reasons) is that Ubuntu as an OS uses less memory than 
WinXP, thus leaving more memory for computation, swap space, etc. In 
other words, Ubuntu is "lighter" than XP on system resources.


Doran, Harold wrote:
> Dear List:
> I am very much a unix neophyte, but recently had a Ubuntu box installed
> in my office. I commonly use Windows XP with 3 GB RAM on my machine and
> the Ubuntu machine is exactly the same as my windows box (e.g.,
> processor and RAM) as far as I can tell.
> Now, I recently had to run a very large lmer analysis using my windows
> machine, but was unable to due to memory limitations, even after
> increasing all the memory limits in R (which I think is a 2gig max
> according to the FAQ for windows). So, to make this computationally
> feasible, I had to sample from my very big data set and then run the
> analysis. Even still, it would take something on the order of 45 mins to
> 1 hr to get parameter estimates. (BTW, SAS Proc nlmixed was even worse
> and kept giving execution errors until the data set was very small and
> then it ran for a long time)
> However, I just ran the same analysis on the Ubuntu machine with the
> full, complete data set, which is very big and lmer gave me back
> parameter estimates in less than 5 minutes. 
> Because I have so little experience with Ubuntu, I am quite pleased and
> would like to understand this a bit better. Does this occur because R is
> a bit friendlier with unix somehow? Or, is this occurring because unix
> somehow has more efficient methods for memory allocation?
> I wish I knew enough to even ask the right questions. So, I welcome any
> enlightenment members may add.

Re: [R] Skipping specified rows in scan or read.table

2008-04-10 Thread Abhijit Dasgupta
Hi Ravi,

One thing I tend to do, when using read.table, is to specify the option 
colClasses='character'. This forces every column to be read as 
character. From there, as.numeric works fine, and you don't have to deal 
with factors and reconverting them.
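
A minimal sketch of this approach, assuming a file in which text lines are interleaved with numeric rows. The toy input and column names are made up; read.table(), as.numeric() and complete.cases() are standard R.

```r
# Toy input: header-like text lines are repeated between numeric rows.
lines <- c("height weight",
           "1.5 60.2",
           "1.7 72.0",
           "height weight",
           "1.6 55.1")

# Read every field as character, so nothing is turned into a factor.
raw <- read.table(text = lines, colClasses = "character",
                  col.names = c("height", "weight"))

# Convert each column; non-numeric entries become NA (the warning is
# suppressed here), and rows containing NA are then dropped.
num <- suppressWarnings(as.data.frame(lapply(raw, as.numeric)))
num <- num[complete.cases(num), ]
str(num)
```

If the unwanted line numbers are known in advance, another option is to read the file with readLines(), drop those elements, and pass the rest to read.table(text = ...).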

Hope this helps

Ravi Varadhan wrote:
> Hi,
> I have a data file, certain lines of which are character fields.  I would
> like to skip these rows, and read the data file as a numeric data frame.  I
> know that I can skip lines at the beginning with read.table and scan, but is
> there a way to skip a specified sequence of lines (e.g., 1, 2, 10, 11, 19,
> 20, 28, 29, etc.) ?  
> If I read the entire data file, and then delete the character fields, the
> values are still kept as factors, with each value denoted by its level.
> Since, I have continuous variables, there are as many levels as there are
> values.  I am unable to coerce this to "numeric" mode.  Is there a way to do
> this so that I can then manipulate the numeric data frame?
> Thanks for any help.
> Best,
> Ravi.
> ---
> Ravi Varadhan, Ph.D.
> Assistant Professor, The Center on Aging and Health
> Division of Geriatric Medicine and Gerontology 
> Johns Hopkins University
> Ph: (410) 502-2619
> Fax: (410) 614-9625
> Webpage:

Re: [R] vectorized way to combine levels of a factor

2008-04-09 Thread Abhijit Dasgupta
For your first problem, you can probably do it in 2 statements:
# work on the labels: ifelse() applied to a factor returns integer codes
x$V3 <- as.character(x$V2)
x$V3 <- ifelse(x$V3 == 'B' | x$V3 == 'C', 'D', x$V3)

If you want to split V1 into (0,a],(a,b],(b,c],(c,1], you can do, quite simply
x$V1.factor <- cut(x$V1, c(0, a, b, c, 1))
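
A runnable sketch of both steps on data like the poster's. Merging levels by assigning to levels() is shown as an alternative to ifelse(); the cut points 0.25, 0.5 and 0.75 are arbitrary stand-ins for a, b and c.

```r
set.seed(42)
x <- data.frame(V1 = runif(10),
                V2 = factor(sample(c("A", "B", "C"), 10, replace = TRUE),
                            levels = c("A", "B", "C")))

# Merge levels B and C into D: giving two levels the same name
# collapses them, with no loop over the rows.
x$V3 <- x$V2
levels(x$V3)[levels(x$V3) %in% c("B", "C")] <- "D"

# Bin the continuous variable into (0,0.25], (0.25,0.5], and so on.
x$V1.factor <- cut(x$V1, breaks = c(0, 0.25, 0.5, 0.75, 1))

print(x)
```

Working on the levels touches only the handful of level labels rather than every row, which tends to matter on large datasets.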

On Wed, 9 Apr 2008 13:58:17 -0700
Chang Liu <[EMAIL PROTECTED]> wrote:

> Hi Gurus:
> If I have a large dataset of the form of:
> > x <- data.frame(V1 = runif(10), V2 = sample(c('A','B','C'),10,T))
> > x
>           V1 V2
> 1  0.2691580  A
> 2  0.8711267  B
> 3  0.2674728  C
> 4  0.3278876  A
> 5  0.1809152  A
> 6  0.2499651  C
> 7  0.9155174  A
> 8  0.8004974  B
> 9  0.7885516  A
> 10 0.9301630  A
> And I want a V3 that =V2 if V2=A, and =D if V2=B or C. In other words I want 
> to use a vectorized way to combine some levels, rather than having to loop 
> through a large dataset.
> Similarly, if I want to group V1 into levels, what is a fast way to do it?
> Thank you!
> Karen

Abhijit Dasgupta


Re: [R] Sweave - print \n ?

2008-03-28 Thread Abhijit Dasgupta
you haven't escaped the \ for the \n, I think. Your line should be
cat("\\hline \\n"). You did escape the \ for hline, though.
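
Before changing the call, it may help to check what each string actually emits. In an R string, "\n" is already a line break and "\\" is a single backslash, so "\\n" yields a literal backslash followed by the letter n. If that holds in the poster's setup, the original cat("\\hline \n") already writes a line break, and the missing break in the .tex file would have to come from somewhere else (that last part is a guess, not something the thread confirms). The variable names below are only for illustration.

```r
with_newline <- "\\hline \n"    # \hline, a space, then a real line break
literal_bs_n <- "\\hline \\n"   # \hline, a space, then the characters \ and n

# Character counts differ by one: the newline is a single character.
n1 <- nchar(with_newline)
n2 <- nchar(literal_bs_n)
print(c(n1, n2))

# Capture what cat() writes for each string.
out1 <- capture.output(cat(with_newline))
out2 <- capture.output(cat(literal_bs_n))
print(out1)
print(out2)
```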

Abhijit Dasgupta, Ph.D
Assistant Professor | Division of Biostatistics
Dept of Pharmacology and Experimental Therapeutics | Thomas Jefferson 
1015 Chestnut St | Suite M100 | Philadelphia PA 19107
Ph: (215) 503-9201 | Fax: (215) 503-3804
adasgupt (at) mail (dot) jci (dot) tju (dot) edu


Werner Wernersen wrote:
> Hi,
> this is probably quite stupid but I have no clue
> what's wrong. Let's say I write the function
> hline <- function() {
>   cat("\\hline \n") 
> }
> and call hline() from within a Sweave chunk. Why is
> there no carriage return after the \hline in the
> resulting tex file? 
> if I call hline() hline() in the chunk, then I get
> \hline \hline 
> in the tex code without a linebreak in between.
> Thanks for any hints,
>   Werner

Re: [R] Recode factors

2008-03-27 Thread Abhijit Dasgupta
Another suggestion is:

blah = as.character(aa)
aa = as.factor(blah)

I've found changing factors to characters rather than numeric is 
generally safer.
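
A short sketch of the character route applied to the recoding in this thread: every label that is not purely digits becomes "0", and the result is then converted to numeric. The toy factor is made up, and the regular expression is one possible way to pick out the numeric labels.

```r
# Toy factor mixing numeric and non-numeric labels.
aa <- factor(c("1", "2", "B", "3", "*", "1", "T", "0"))

# Work on the labels, not on the factor's internal integer codes.
chr <- as.character(aa)
chr[!grepl("^[0-9]+$", chr)] <- "0"   # anything non-numeric becomes "0"
mm <- as.numeric(chr)
print(table(mm))

# For comparison: as.numeric() on the factor itself returns the codes,
# which is why a '1' showed up as a different number in the question.
codes <- as.numeric(aa)
print(codes)
```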


Doran, Harold wrote:
> Perfect. My headache is gone. Thanks. 
>> -Original Message-
>> From: Henrique Dallazuanna [mailto:[EMAIL PROTECTED] 
>> Sent: Thursday, March 27, 2008 12:50 PM
>> To: Doran, Harold
>> Cc:
>> Subject: Re: [R] Recode factors
>> If I understand, you can try this:
>> levels(x)[] <- 0
>> On 27/03/2008, Doran, Harold <[EMAIL PROTECTED]> wrote:
>>> I know this comes up, but I didn't see my exact issue in the archives.
>>> I have variables in a dataframe that need to be recoded. Here is what
>>> I'm dealing with
>>>  I have a factor called aa
>>>  > class(aa)
>>>  [1] "factor"
>>>  > table(aa)
>>>  aa
>>>          *    0    1    2    3    A    B    C    D    L    N    T
>>>     0    0 1908  725 2089    0    0   67    0    0    2    1    6
>>>  I need to recode everything that is not a numeric value into a 0. So,
>>>  for example
>>>  > mm <- ifelse(aa == 'B', 0, aa)
>>>  > table(mm)
>>>  mm
>>>     0    3    4    5   11   12   13
>>>    67 1908  725 2089    2    1    6
>>>  The recoding works, but the values are no longer what they were
>>>  previously. For example, what was a '1' is now a '4' etc. Is there a
>>>  way to recode factors and also keep the values the same as they were before?
>>>  That is, a '1' would remain a '1' after the recode?
>>>  After the recoding, I need to convert to a numeric variable. I can do
>>>  this as
>>>  mm <- as.numeric(as.character(aa))
>>>  Harold
>>>  > sessionInfo()
>>>  R version 2.6.2 (2008-02-08)
>>>  i386-pc-mingw32
>>>  locale:
>>>  LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>  States.1252;LC_MONETARY=English_United
>>>  States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>  attached base packages:
>>>  [1] stats graphics  grDevices utils datasets  
>> methods   base
>>>  other attached packages:
>>>  [1] gdata_2.4.0
>>>  loaded via a namespace (and not attached):
>>>  [1] gtools_2.4.0
>>>  >
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O

[R] OdfWeave and contingency tables

2008-03-26 Thread Abhijit Dasgupta

I would like to use odfWeave to output some contingency tables (the 
output of "table") into OOo. I know I can do this in LaTex (using 
"latex" in the Hmisc package), but I was wondering if it is possible in 
OdfWeave. My documentation to odfTable says inputs can only be vector, 
matrix or data.frame, and I'm having a hard time converting my table 
into one of these formats.
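
Not from the thread, but as one possible route: a two-way table can be turned into a matrix or a data frame with base R alone, and either result could then be passed on to a function that expects those types. The mtcars data is a stand-in, and the odfWeave step itself is left out because it is not tested here.

```r
# A two-way contingency table from a built-in dataset.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Wide data frame: one row per 'cyl' value, one column per 'gear' value.
wide <- as.data.frame.matrix(tab)

# Plain matrix: drop the "table" class but keep the dimnames.
mat <- unclass(tab)

# Long data frame: one row per cell, with a 'Freq' column.
long <- as.data.frame(tab)

print(wide)
```

The wide form keeps the layout that table() prints, which is usually what a report needs.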

