[R] Summary.Formula: prmsd and test statistic

2011-05-15 Thread Eli Kamara
Hello,

I'm a new user to R so apologies if this is a basic question, but after 
scouring the web on information for summary.formula, I still am searching for 
an answer.

I made a function to analyze my data - I have a categorical variable and three 
continuous variables. I am analyzing my continuous variables on the basis of my 
categorical variables.

radioanal <- function(a)
{

#Educational status first - pulling variables from my database. categorical is 
13 = Edu. numerical is  48=Kyph, 50=Vert, 53=HL.
a1= a[,c(13,48,50,53)]

#make sure they are in numeric form
a2= transform(a1, Kyph=as.numeric(as.character(Kyph)), 
Vert=as.numeric(as.character(Vert)), HL=as.numeric(as.character(HL)))

#see boxplots of the individual variables
boxplot(a2$Kyph~a2$Edu, main="Education vs Kyphosis angle",
   xlab="Education", ylab="Kyphosis angle")
boxplot(a2$Vert~a2$Edu, main="Education vs # of vertebrae affected",
   xlab="Education", ylab="# of vertebrae affected")
boxplot(a2$HL~a2$Edu, main="Education vs %HL",
   xlab="Education", ylab="%HL")

#see distribution of data
d=summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, method="reverse", overall=T, 
continuous=5, add=TRUE, test=T)

#perform MANOVA
a3=manova(cbind(Kyph, Vert, HL)~as.factor(Edu), data=a2)

#return results
a4=list("Results of Educational Status MANOVA",
print(d),
summary(a3, test="Hotelling-Lawley"), 
summary(a3, test="Roy"),
summary(a3, test="Pillai"),
summary(a3, test="Wilks"),
summary.aov(a3)
)

print(a4)   

}

This function works as is, but I want to add the mean and standard deviation to 
my table. When I add the following code to line 36 where I print d
print(d, prmsd=TRUE)

The numbers in my table disappear. When I use the same commands from the 
command line, the same thing happens. After reading the manual, I think the 
error might be due to the missing numbers in my database, so I tried adding 
na.action to my set of commands: 

print(summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, na.action, 
method="reverse", overall=T, continuous=5, add=TRUE, test=T), prmsd=TRUE)

but then I get the following error:
Error in as.data.frame.default(data, optional = TRUE) : 
  cannot coerce class 'function' into a data.frame

Any ideas?


Also, does anyone know what kind of test statistic this function calculates? I 
compared the F and p values to a manual ANOVA but they were different.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behaviour as.data.frame

2011-05-15 Thread Jan van der Laan
Forget I asked. There was a typo in my example (stringsAsFactor 
instead of stringsAsFactors) which explained the difference. My 
apologies.


My second question however still stands: How does one create a 
data.frame with given column types and given dimensions? Thanks.


Regards,
Jan


Quoting Jan van der Laan rh...@eoos.dds.nl:


I use the following code to create two data.frames d1 and d2 from a list:

types  <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second
column is a factor while in d2 it is a character (which I would expect):


str(d1)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level : 1 1
1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: chr  ...
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As different but related question: I use the commands above to create
an 'empty' data.frame with specified column types and dimensions. I
need this data.frame to pass on to my c++ routines. Is there a more
simple/elegant way of creating this data.frame?

Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] changing the day of the week in dates format

2011-05-15 Thread Dave Evens
Dear all,

I have a question related to the POSIXlt function in R.

I have a set of dates and times, for example:

startx <- as.POSIXct("2011-01-01 00:00:00")
finx <- as.POSIXct("2011-12-31 00:00:00")

daysx <- seq(startx, finx, by="24 hours")

I want to change the dates of all the days falling on a Saturday to the 
next working day (i.e. Monday). So I convert dates to POSIXlt

mydaysx <- as.POSIXlt(daysx)

Then I select all the Saturdays and move them on to Monday

select <- mydaysx$wday==6
mydaysx$mday[select] <- mydaysx$mday[select] + 2

However, although all the new dates (i.e. mydaysx) are actual days of the
year, the $wday entries have not been updated and the $mday entries have
not all been corrected (i.e. those falling into the next month). So if I do

select <- mydaysx$wday==6

I still get the same set of days as before.

Is there a way to do this?

Thanks,

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with mysql and R: partitioning by quintile

2011-05-15 Thread gj
Here's how I'm trying to solve the diversity problem inherent in the data
(see below for a definition of the problem):
if (interquintile ranges have >=4 ranges at the same freq) then (use
rating=3)
else
(use rating as described in jim's code)

I'll have a go and post an update. In the meantime, if you see that I'm
going straight into the ditch with my solution please do let me know.
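
Something along these lines is what I have in mind (only a sketch, untested;
the exact condition and the threshold are guesses on my part):

# give all of a user's tracks the neutral rating 3 when the quintile
# breakpoints of the play frequencies show too little diversity
neutral_if_flat <- function(freq, rating, min_distinct = 4) {
  q <- quantile(freq, probs = c(0.2, 0.4, 0.6, 0.8, 1))
  if (length(unique(q)) < min_distinct) rep(3L, length(rating)) else rating
}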

regards
gawesh


On Sun, May 15, 2011 at 12:28 AM, gj gaw...@gmail.com wrote:

 Jim's suggestion did the trick:
 tqm <- do.call(rbind, tq) + 0.001

 head(x.new)
      userid freq track rating
 [1,]      1    1     1      1
 [2,]      1   10     2      5
 [3,]      1    1     3      1
 [4,]      1    1     4      1
 [5,]      1   15     5      5
 [6,]      1    4     6      3


 Dennis, what you suggested didn't work.
 Thanks a lot guys! :-)
 But before I can smile, I need to resolve a problem inherent in the data.
 When the play history lacks diversity (in terms of frequency), I want to
 assign a neutral rating of 3 (for my recommender system I use rating 1 is
 'don't like' and 5 is 'i like').
 Can that be done in R?

 For example:
 input:
 userid,track,freq
 1,1,1
 1,2,1
 1,3,1
 1,4,1
 1,5,2
 1,6,2
 1,7,1
 1,8,1
 1,9,1
 1,10,1
 1,11,2
 1,12,2
 1,13,1
 1,14,1
 1,15,1
 1,16,1
 1,17,2
 1,18,2
 1,19,2
 1,20,2
 1,21,2
 1,22,1
 1,23,1
 1,24,1
 1,25,1
 1,26,1
 1,27,1
 1,28,1
 1,29,1
 1,30,1
 1,31,1
 1,32,1
 1,33,1
 1,34,1
 1,35,1

 gives output:

  head(x.new)
       userid freq track rating
  [1,]      1    1     1      1
  [2,]      1    1     2      1
  [3,]      1    1     3      1
  [4,]      1    1     4      1
  [5,]      1    2     5      4
  [6,]      1    2     6      4


 Ideally I want to give a neutral rating in this case :


       userid freq track rating
  [1,]      1    1     1      3
  [2,]      1    1     2      3
  [3,]      1    1     3      3
  [4,]      1    1     4      3
  [5,]      1    2     5      3
  [6,]      1    2     6      3


 Regards
 Gawesh

 On Sat, May 14, 2011 at 11:52 PM, jim holtman jholt...@gmail.com wrote:

 An easy way is to just offset the quantiles by a small increment so
 that boundary condition is less likely.  If you change the line

 tqm <- do.call(rbind, tq) + 0.001

 in my example, that should do the trick.

 On Sat, May 14, 2011 at 6:09 PM, gj gaw...@gmail.com wrote:
  Hi,
  I think I haven't been able to explain correctly what I want. Here
 another
  try:
  Given that I have the following input:
  userid,track,freq
 
  1,1,1
  1,2,10
  1,3,1
  1,4,1
  1,5,15
  1,6,4
  1,7,16
  1,8,6
  1,9,1
  1,10,1
  1,11,2
  1,12,2
  1,13,1
  1,14,6
  1,15,7
  1,16,13
  1,17,3
  1,18,2
  1,19,5
  1,20,2
  1,21,2
  1,22,6
  1,23,4
  1,24,1
  1,25,1
  1,26,16
  1,27,4
  1,28,1
  1,29,4
  1,30,4
  1,31,4
  1,32,1
  1,33,14
  1,34,2
  1,35,7
 
  It is a sample of the history of tracks played: userid,track and
 frequency.
  What I want is to convert the frequency into a rating scale (1-5) based
 on
  the frequency at which a user has played a track, using the following
  interquintile ranges for the cfd:
   0%-20% = rating 1, 20%-40% = rating 2, ..., 80%-100% = rating 5
 
  Jim kindly provided the following code:
  # cheers jim holtman

 x=read.csv(file="C:\\Data\\lastfm\\ratings\\play_history_3.csv", header=T,
  sep=',')
 # get the quantiles for each user(we want the frequency distribution to
 be
  based on user)
 tq <- tapply(x$freq,x$userid,quantile,prob=c(0.2,0.4,0.6,0.8,1))
 # create a matrix with the rownames as the tracks to use in the
  findInterval
 tqm <- do.call(rbind, tq)
 #now put the ratings
 require(data.table)
 x.dt <- data.table(x)
 x.new <- x.dt[,list(freq = freq,track=track,rating =
  findInterval(freq,tqm[as.character(userid[1L]),], rightmost.closed =
 TRUE) +
  1L),by=userid]
 head(x.new)
 
 
       userid freq track rating
  [1,]      1    1     1      2
  [2,]      1   10     2      5
  [3,]      1    1     3      2
  [4,]      1    1     4      2
  [5,]      1   15     5      5
  [6,]      1    4     6      4
 
 
  which is almost what I wanted except that the ratings are 1 point higher
 for
  tracks where the frequency is at the cut-off points in the interquintile
  range.
  To illustrate the quintiles are:
 
  tq$`1`
   20%  40%  60%  80% 100%
     1    2    4    7   16
 
 
 
  So, ideally I want (note the different ratings):
 
 
       userid freq track rating
  [1,]      1    1     1      1
  [2,]      1   10     2      5
  [3,]      1    1     3      1
  [4,]      1    1     4      1
  [5,]      1   15     5      5
  [6,]      1    4     6      3
 
 
  Can anybody help me? I'm new to R (as you have probably guessed). Sorry
 for
  the long explanation.
 
  Regards
  Gawesh
 
  On Sat, May 14, 2011 at 7:37 PM, Dennis Murphy djmu...@gmail.com
 wrote:
 
  Hi:
 
  Is this what you're after?
 
  tq <- with(ds, quantile(freq, seq(0.2, 1, by = 0.2)))
  ds$int <- with(ds, cut(freq, c(0, tq)))
  with(ds, table(int))
 
  int
    (0,1]  (1,2]  (2,4]  (4,7] (7,16]
       10      6      7      6      6
 
  HTH,
  

Re: [R] L'abbe plot

2011-05-15 Thread Jim Lemon

On 05/14/2011 07:20 AM, whitney.mel...@colorado.edu wrote:

I cannot seem to get a L'abbe plot to work on R. I do not understand what
the X coordinates, or alternatively an object of class metabin, is
supposed to mean. What is a class of metabin?


Hi Whitney,
The L'Abbe plot is a relatively simple illustration that shows the 
results of intervention trials as two proportions on a Cartesian plane. 
The outcomes must be dichotomous (dead/alive, cured/not cured, 
improved/not improved, etc.) and the comparisons are between two 
interventions. Say that I was asked to evaluate an intervention for 
excessive drinkers that randomly assigned the subjects to either a 
session with a behavioral therapist or a session of equal duration with 
an ex-drinker. The outcome might be whether the subject drank more or 
less over the succeeding month. Thus:


didf<-data.frame(subject=1:50,interv=rep(c("therapist","ex-drinker"),each=25),outcome=sample(c("more","less"),50,TRUE))

didf.tab<-table(didf$interv,didf$outcome)
didf.tab

             less more
  ex-drinker   14   11
  therapist    12   13
chisq.test(didf.tab)

Pearson's Chi-squared test with Yates' continuity correction

data:  didf.tab
X-squared = 0.0801, df = 1, p-value = 0.7771

Apparently ex-drinkers are no better or worse than therapists. So we 
want to illustrate this with a L'Abbe plot.


library(plotrix)

labbePlot<-function(x,main="L'Abbe plot",
 xlab="Positive response with placebo (%)",
 ylab="Positive response with treatment (%)",...) {

 plot(0,xlim=c(0,100),ylim=c(0,100),main=main,xlab=xlab,
  ylab=ylab,type="n",...)
 for(trial in 1:length(x)) {
  sum_treat<-sum(x[[trial]][1,])
  sum_interv<-sum(x[[trial]][2,])
  xpos<-100*x[[trial]][1,1]/sum_treat
  ypos<-100*x[[trial]][2,1]/sum_interv
  rad<-sqrt(sum_treat+sum_interv)/2
  draw.circle(xpos,ypos,rad)
 }
 segments(0,0,100,100)
}

x<-list(didf.tab)
labbePlot(x)

This shows that the therapists, whom we expected to do better, were 
slightly, but not significantly, worse than the ex-drinkers. This can't 
be right, so let's follow it up with a bigger trial.


didf2<-data.frame(subject=1:200,
 interv=rep(c("therapist","ex-drinker"),each=100),
 outcome=c(sample(c("more","less"),100,TRUE,prob=c(0.3,0.7)),
 sample(c("more","less"),100,TRUE,prob=c(0.7,0.3)))))

didf2.tab<-table(didf2$interv,didf2$outcome)
x<-list(didf.tab,didf2.tab)
labbePlot(x)

That's better, isn't it? This basic plot can be tarted up with colors 
for the different circles, and other decorations so beloved of those who 
use presentation packages. Now that I've written it, I might as well add 
it to the plotrix package.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Low Pain Unicode Characters in pdf graph?

2011-05-15 Thread ivo welch
Dear R-experts---is there a relatively low-pain way to get unicode
characters into a plot to a pdf device?

pdf(file="cardsymbols.pdf")
plot( 0, xlim=c(0,5), ylim=c(0,5), type="n")
text(1,1, "&spades;")
text(2,2, "&hearts;")
text(3,3, "&diams;")
text(4,4, "&clubs;")
dev.off()

(these are the characters that I need the most NOW, but this is a more
generic question.)

sincerely, /iaw


Ivo Welch (ivo.we...@gmail.com)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Find String Between Characters

2011-05-15 Thread Sparks, John James
Hi Jim,

Thanks for your note.

Unfortunately, when I attempt your solution in my exact setting, I get a
weird and slightly different answer.

First, let me be more clear.  What I am attempting to do is pull the CIK
number out of the information from the web page itself after it has loaded
to R (this may not be optimal, but I am new at this), not from the web
page reference (as you have done).

So, when I execute the following as per your suggestion:

require(scrapeR)
mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")

num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)

I get
[1] "<pointer: 0x001265c0>"

Is this just a hex representation of the same number, or is something else
going on here?

Comments from any and all would be much appreciated.
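
For what it is worth, one thing I have been meaning to try (untested, and it
assumes scrape() returns a list of parsed XML documents) is to serialise the
page back to text before applying the regular expression:

require(scrapeR)
require(XML)
mmm <- scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
page.text <- saveXML(mmm[[1]])                     # pointer back to character
num <- sub("^.*CIK=([0-9]+).*", "\\1", page.text)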

--John J. Sparks, Ph.D.

On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
 Is this what you want:

 mmm <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40"
 num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
 num
 [1] "320193"



 On Sat, May 14, 2011 at 8:20 PM, Sparks, John James jspa...@uic.edu
 wrote:
 Dear R Helpers,

 I am trying to isolate a set of characters between two other characters
 in
 a long string file.  I tried some of the examples on the R help pages
 and
 elsewhere, but I am not able to get it.  Your help would be much
 appreciated.

 require(scrapeR)
 mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
 str(mmm)

 I want to get the number 320193 that is between the "CIK=" and the "&".
 I have tried

 g <- grep( "CIK=|&", mmm )
 and
 temp <- grep(mmm, "\"CIK=\"")

 and variations on these themes, but all won't run or come back as an empty
 object.  How can I grab this number?

 Best wishes,
 --John J. Sparks, Ph.D.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question on approximations of full logistic regression model

2011-05-15 Thread khosoda
Hi,
I am trying to construct a logistic regression model from my data (104
patients and 25 events). I built a full model consisting of five
predictors with the use of penalization from the rms package (lrm, pentrace,
etc.) because of the events-per-variable issue. Then, I tried to approximate
the full model by a step-down technique, predicting L from all of the
component variables using ordinary least squares (ols in the rms package) as
follows. I would like to know whether I am doing this right or not.

 library(rms)
 plogit <- predict(full.model)
 full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
 fastbw(full.ols, aics=1e10)

 Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
 stenosis        1.41  1   0.2354     1.41  1   0.2354  -0.59 0.991
 x2             16.78  1   0.0000    18.19  2   0.0001  14.19 0.882
 procedure      26.12  1   0.0000    44.31  3   0.0000  38.31 0.711
 ClinicalScore  25.75  1   0.0000    70.06  4   0.0000  62.06 0.544
 x1             83.42  1   0.0000   153.49  5   0.0000 143.49 0.000

Then, I fitted an approximation to the full model using the most important
variables (dropping variables until R^2 for predictions from the reduced
model against the original Y would drop below 0.95), that is, dropping stenosis.

 full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
 full.ols.approx$stats
  n  Model L.R.d.f.  R2   g   Sigma
104.000 487.9006640   4.000   0.9908257   1.3341718   0.1192622

This approximate model had R^2 against the full model of 0.99.
Therefore, I updated the original full logistic model dropping
stenosis as predictor.

 full.approx.lrm <- update(full.model, ~ . -stenosis)

 validate(full.model, bw=F, B=1000)
          index.orig training   test optimism index.corrected    n
Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000

 validate(full.approx.lrm, bw=F, B=1000)
          index.orig training   test optimism index.corrected    n
Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000

Validation revealed this approximation was not bad.
Then, I made a nomogram.

 full.approx.lrm.nom <- nomogram(full.approx.lrm,
fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
 plot(full.approx.lrm.nom)

Another nomogram using ols model,

 full.ols.approx.nom <- nomogram(full.ols.approx,
fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
 plot(full.ols.approx.nom)

These two nomograms are very similar but a little bit different.

My questions are;

1. Am I doing right?

2. Which nomogram is correct?

I would appreciate your help in advance.

-- 
KH

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with makeSOCKcluster depending on R patch version

2011-05-15 Thread Uwe Ligges



On 15.05.2011 12:27, Søren Højsgaard wrote:

That raises another question: Will that patched version (2011-05-13 r55886) be 
made available as a windows binary - and if so: when?


Daily builds for Windows of R-patched are available from CRAN.

Best,
uwe


Regards
Søren




From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf 
of Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: 14 May 2011 18:23
To: Ulrich Halekoh
Cc: r-help@r-project.org
Subject: Re: [R] problem with makeSOCKcluster depending on R patch version

On 13.05.2011 14:01, Ulrich Halekoh wrote:

Dear,

I encountered a problem using the makeSOCKcluster function depending on the 
patched version of
R-2.13.0 I used.


library(snow)
cl <- makeSOCKcluster(rep("localhost", 2))

this works fine for the R-2.13.0 patched version (2011-04-28 r55678)
but not for the R-2.13.0 patched version (2011-05-10 r55826)



If R-2.13.0 patched is meant: I do not see this with a recent  snapshot
(2011-05-13 r55886).


Uwe Ligges





In the latter case the command  keeps running. Interrupting the command I get 
the error message

Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  :
cannot open the connection
In addition: Warning message:
In socketConnection(port = port, server = TRUE, blocking = TRUE,  :
problem in listening on this socket


Does work

R version 2.13.0 Patched (2011-04-28 r55678)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
[3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
[5] LC_TIME=Danish_Denmark.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] snow_0.3-3


Does not work

R version 2.13.0 Patched (2011-05-10 r55826)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
[3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
[5] LC_TIME=Danish_Denmark.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] snow_0.3-3


Kind regards
Ulrich Halekoh

Associate Professor
   Aarhus University
Email: ulrich.hale...@agrsci.dk

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Unexpected behaviour as.data.frame

2011-05-15 Thread Jan van der Laan

I use the following code to create two data.frames d1 and d2 from a list:

types  <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
stringsAsFactor=FALSE)

l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second  
column is a factor while in d2 it is a character (which I would expect):



str(d1)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level : 1  
1 1 1 1 1 1 1 1 1

 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
 $ c: chr  ...
 $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As different but related question: I use the commands above to create  
an 'empty' data.frame with specified column types and dimensions. I  
need this data.frame to pass on to my c++ routines. Is there a more  
simple/elegant way of creating this data.frame?


Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with makeSOCKcluster depending on R patch version

2011-05-15 Thread Søren Højsgaard
That raises another question: Will that patched version (2011-05-13 r55886) be 
made available as a windows binary - and if so: when?

Regards
Søren




From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf 
of Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: 14 May 2011 18:23
To: Ulrich Halekoh
Cc: r-help@r-project.org
Subject: Re: [R] problem with makeSOCKcluster depending on R patch version

On 13.05.2011 14:01, Ulrich Halekoh wrote:
 Dear,

 I encountered a problem using the makeSOCKcluster function depending on the 
 patched version of
 R-2.13.0 I used.


 library(snow)
 cl <- makeSOCKcluster(rep("localhost", 2))

 this works fine for the R-2.13.0 patched version (2011-04-28 r55678)
 but not for the R-2.13.0 patched version (2011-05-10 r55826)


If R-2.13.0 patched is meant: I do not see this with a recent  snapshot
(2011-05-13 r55886).


Uwe Ligges




 In the latter case the command  keeps running. Interrupting the command I get 
 the error message

 Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  :
cannot open the connection
 In addition: Warning message:
 In socketConnection(port = port, server = TRUE, blocking = TRUE,  :
problem in listening on this socket


 Does work

 R version 2.13.0 Patched (2011-04-28 r55678)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
 [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
 [5] LC_TIME=Danish_Denmark.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] snow_0.3-3


 Does not work

 R version 2.13.0 Patched (2011-05-10 r55826)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
 [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
 [5] LC_TIME=Danish_Denmark.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] snow_0.3-3


 Kind regards
 Ulrich Halekoh

 Associate Professor
   Aarhus University
 Email: ulrich.hale...@agrsci.dk

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R function that returns an object's search path position XXXX

2011-05-15 Thread Dan Abner
Hello everyone,

Is there an R function that returns an object's search path position?

Thank you,

Dan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summary.Formula: prmsd and test statistic

2011-05-15 Thread David Winsemius


On May 14, 2011, at 11:23 AM, Eli Kamara wrote:


Hello,

I'm a new user to R so apologies if this is a basic question, but  
after scouring the web on information for summary.formula, I still  
am searching for an answer.


I made a function to analyze my data - I have a categorical variable  
and three continuous variables. I am analyzing my continuous  
variables on the basis of my categorical variables.


radioanal <- function(a)
{

#Educational status first - pulling variables from my database.  
categorical is 13 = Edu. numerical is  48=Kyph, 50=Vert, 53=HL.

a1= a[,c(13,48,50,53)]

#make sure they are in numeric form
a2= transform(a1, Kyph=as.numeric(as.character(Kyph)),  
Vert=as.numeric(as.character(Vert)), HL=as.numeric(as.character(HL)))


#see boxplots of the individual variables
boxplot(a2$Kyph~a2$Edu, main="Education vs Kyphosis angle",
  xlab="Education", ylab="Kyphosis angle")
boxplot(a2$Vert~a2$Edu, main="Education vs # of vertebrae affected",
  xlab="Education", ylab="# of vertebrae affected")
boxplot(a2$HL~a2$Edu, main="Education vs %HL",
  xlab="Education", ylab="%HL")

#see distribution of data
d=summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, method="reverse",  
overall=T, continuous=5, add=TRUE, test=T)


I noticed that you were addressing the columns individually. That  
rather defeats the strategy of passing a data argument to a function  
and using only the column names in the formula. It often causes  
strange errors in model calls and I wouldn't be surprised if you got  
better results with something like:


d=summary.formula( Edu~ Kyph+ HL+ Vert, data=a2, method="reverse",  
overall=T, continuous=5, add=TRUE, test=T)


--
David

#perform MANOVA
a3=manova(cbind(Kyph, Vert, HL)~as.factor(Edu), data=a2)

#return results
a4=list("Results of Educational Status MANOVA",
print(d),
summary(a3, test="Hotelling-Lawley"),
summary(a3, test="Roy"),
summary(a3, test="Pillai"),
summary(a3, test="Wilks"),
summary.aov(a3)
)

print(a4)   

}

This function works as is, but I want to add the mean and standard  
deviation to my table. When I add the following code to line 36  
where I print d

print(d, prmsd=TRUE)

The numbers in my table disappear. When I use the same commands from  
the command line, the same thing happens. After reading the manual,  
I think the error might be due to the missing numbers in my  
database, so I tried adding na.action to my set of commands:


print(summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, na.action,  
method="reverse", overall=T, continuous=5, add=TRUE, test=T),  
prmsd=TRUE)


but then I get the following error:
Error in as.data.frame.default(data, optional = TRUE) :
 cannot coerce class 'function' into a data.frame


It may be trying to do something with 'data' and doesn't find a 'data'  
object until it gets to the 'data' function.
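
An untested sketch of what I mean (na.omit below is only illustrative; pick  
whatever NA handling you actually want), passing na.action by name so it is  
not matched positionally to the 'data' argument:

d <- summary.formula(Edu ~ Kyph + HL + Vert, data = a2,
                     method = "reverse", overall = T, continuous = 5,
                     test = T, na.action = na.omit)
print(d, prmsd = TRUE)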




Any ideas?


Also, does anyone know what kind of test statistic this function  
calculates?


Huh. You do realize this function in the rms package has a help page,  
right?



I compared the F and p values to a manual ANOVA but they were  
different.




I think you should break further questions down into components and post  
something that is reproducible.



PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Low Pain Unicode Characters in pdf graph?

2011-05-15 Thread David Winsemius


On May 15, 2011, at 9:06 AM, ivo welch wrote:


Dear R-experts---is there a relatively low-pain way to get unicode
characters into a plot to a pdf device?

pdf(file="cardsymbols.pdf")
plot( 0, xlim=c(0,5), ylim=c(0,5), type="n")
text(1,1, "&spades;")
text(2,2, "&hearts;")
text(3,3, "&diams;")
text(4,4, "&clubs;")
dev.off()

(these are the characters that I need the most NOW, but this is a more
generic question.)


The last examples in ?points should be reviewed and tested. It is  
cited by the ?plotmath page as a way of getting at symbols.


--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] DCC-GARCH model

2011-05-15 Thread Marcin Płóciennik
Hello,
I have a few questions concerning the DCC-GARCH model and its programming in
R.
So here is what I want to do:
I take quotes of two indices - SP500 and DJ. And the aim is to estimate
coefficients of the DCC-GARCH model for them. This is how I do it:


library(tseries)
p1 = get.hist.quote(instrument = "^gspc", start = "2005-01-07", end =
"2009-09-04", compression = "w", quote="AdjClose")
p2 = get.hist.quote(instrument = "^dji", start = "2005-01-07", end =
"2009-09-04", compression = "w", quote="AdjClose")
p = cbind(p1,p2)
y = diff(log(p))*100
y[,1] = y[,1]-mean(y[,1])
y[,2] = y[,2]-mean(y[,2])
T = length(y[,1])

library(ccgarch)
library(fGarch)

f1 = garchFit(~ garch(1,1), data=y[,1],include.mean=FALSE)
f1 = f1@fit$coef
f2 = garchFit(~ garch(1,1), data=y[,2],include.mean=FALSE)
f2 = f2@fit$coef

a = c(f1[1], f2[1])
A = diag(c(f1[2],f2[2]))
B = diag(c(f1[3], f2[3]))
dccpara = c(0.2,0.6)
dccresults = dcc.estimation(inia=a, iniA=A, iniB=B, ini.dcc=dccpara, dvar=y,
model="diagonal")

dccresults$out
DCCrho = dccresults$DCC[,2]
matplot(DCCrho, type='l')


dccresults$out delivers the estimated coefficients of the DCC-GARCH model.
And here is my first question:
How can I check if these coefficients are significant or not? How can I test
them for significance?

second question would be:
Is this true that matplot(DCCrho, type='l') shows conditional correlation
between the two indices in question?

and the third one:
What is actually dccpara and why do I get totally different DCC-alpha and
DCC-beta coefficients if I change dccpara from c(0.2,0.6) to, let's say,
c(0.01, 0.98) ? What determines which values should be chosen?
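
For question 1, I imagine something like a Wald-type check, but I am not sure
where the standard errors live in the output; the names below are only
placeholders for whatever vectors in dccresults$out hold the estimates and
their standard errors:

# placeholder sketch: est = estimates, se = standard errors
z <- est / se
p.value <- 2 * pnorm(-abs(z))
cbind(estimate = est, std.error = se, z = z, p.value = p.value)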


Hopefully someone will find time to give me a hand.

Thank you very much in advance, people of good will, for looking at/checking
what I wrote and helping me.

Best regards
Marcin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Low Pain Unicode Characters in pdf graph?

2011-05-15 Thread Dennis Murphy
Hi:

For your specific problem, one way is:

plot( 0, xlim=c(0,5), ylim=c(0,5), type="n", cex = 2)
text(1, 1, expression(symbol('\252')))
text(2, 2, expression(symbol('\251')))
text(3, 3, expression(symbol('\250')))
text(4, 4, expression(symbol('\247')))

More generally, David's advice is sound; see ?plotmath and focus on
the sections 'Other symbols' and 'References'; the last reference
provides a summary table of standard symbols and their codes in
several formats.
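
If your build of R has cairo support, another route (untested here, and the
escapes below are an assumption about which code points you want) is
cairo_pdf() with literal Unicode escapes:

cairo_pdf("cardsymbols.pdf")
plot(0, xlim = c(0, 5), ylim = c(0, 5), type = "n")
text(1:4, 1:4, c("\u2660", "\u2665", "\u2666", "\u2663"))  # spade, heart, diamond, club
dev.off()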

HTH,
Dennis

On Sun, May 15, 2011 at 6:06 AM, ivo welch ivo.we...@gmail.com wrote:
 Dear R-experts---is there a relatively low-pain way to get unicode
 characters into a plot to a pdf device?

 pdf(file="cardsymbols.pdf")
 plot( 0, xlim=c(0,5), ylim=c(0,5), type="n")
 text(1,1, "&spades;")
 text(2,2, "&hearts;")
 text(3,3, "&diams;")
 text(4,4, "&clubs;")
 dev.off()

 (these are the characters that I need the most NOW, but this is a more
 generic question.)

 sincerely, /iaw

 
 Ivo Welch (ivo.we...@gmail.com)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with makeSOCKcluster depending on R patch version

2011-05-15 Thread Søren Højsgaard
I just downloaded the patched version from the Danish mirror; 
http://mirrors.dotsrc.org/cran/

That gave me: R version 2.13.0 Patched (2011-05-10 r55826) - which is *not* the 
version you refer to. 

Where may one get the latest patch then?

Regards
Søren






From: Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: 15 May 2011 15:25
To: Søren Højsgaard
Cc: Ulrich Halekoh; r-help@r-project.org
Subject: Re: SV: [R] problem with makeSOCKcluster depending on R patch version

On 15.05.2011 12:27, Søren Højsgaard wrote:
 That raises another question: Will that patched version (2011-05-13 r55886) 
 be made available as a windows binary - and if so: when?

Daily builds for Windows of R-patched are available from CRAN.

Best,
uwe

 Regards
 Søren



 
 From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf 
 of Uwe Ligges [lig...@statistik.tu-dortmund.de]
 Sent: 14 May 2011 18:23
 To: Ulrich Halekoh
 Cc: r-help@r-project.org
 Subject: Re: [R] problem with makeSOCKcluster depending on R patch version

 On 13.05.2011 14:01, Ulrich Halekoh wrote:
 Dear,

 I encountered a problem using the makeSOCKcluster function depending on the 
 patched version of
 R-2.13.0 I used.


 library(snow)
 cl <- makeSOCKcluster(rep("localhost", 2))

 this works fine for the R-2.13.0 patched version (2011-04-28 r55678)
 but not for the R-2.13.0 patched version (2011-05-10 r55826)


 If R-2.13.0 patched is meant: I do not see this with a recent  snapshot
 (2011-05-13 r55886).


 Uwe Ligges




 In the latter case the command  keeps running. Interrupting the command I 
 get the error message

 Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  :
 cannot open the connection
 In addition: Warning message:
 In socketConnection(port = port, server = TRUE, blocking = TRUE,  :
 problem in listening on this socket


 Does work

 R version 2.13.0 Patched (2011-04-28 r55678)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
 [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
 [5] LC_TIME=Danish_Denmark.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] snow_0.3-3


 Does not work

 R version 2.13.0 Patched (2011-05-10 r55826)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
 [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
 [5] LC_TIME=Danish_Denmark.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] snow_0.3-3


 Kind regards
 Ulrich Halekoh

 Associate Professor
Aarhus University
 Email: ulrich.hale...@agrsci.dk

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changing the day of the week in dates format

2011-05-15 Thread Adrian Duffner

Hi Dave,

your problem is that you are working with an S3 class, which is essentially a 
list with a naming convention. Hence it is possible to change just one 
entry of the list, but it is almost never recommendable.


So a slight change to your code should provide you the required output:
 mydaysx[select] <- mydaysx[select] + 2*24*60*60
 select <- mydaysx$wday==6
 sum(select)
[1] 0

In this case not only the entry $mday of the list is changed, but the 
whole object is updated.


Cheers
Adrian

On 14.05.2011 20:44, Dave Evens wrote:

Dear all,

I have a question related to the POSIXlt function in R.

I have a set of dates and times, for exmaple:

startx <- as.POSIXct("2011-01-01 00:00:00")
finx <- as.POSIXct("2011-12-31 00:00:00")

daysx <- seq(startx, finx, by="24 hours")

I want to change the dates of all the days falling on a Saturday to the
next working day (i.e. Monday). So I convert dates to POSIXlt

mydaysx <- as.POSIXlt(daysx)

Then I select all the Saturdays and move them on to Monday

select <- mydaysx$wday==6
mydaysx$mday[select] <- mydaysx$mday[select] + 2

However, although all the new dates (i.e. mydaysx) are actual days of the
year, the $wday entries have not been updated and the $mday entries have
not all been corrected (i.e. those falling into the next month). So if I do

select <- mydaysx$wday==6

I still get the same set of days as before.

Is there a way to do this?

Thanks,

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] hotelling and confidence region

2011-05-15 Thread jm_jem
Good morning 

I've made a PCA and I'd like to plot a confidence region based on Hotelling's
T2. Does anyone know how to compute it?
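
The closest I have got so far is the sketch below (the form of the T2 limit
is my own assumption, so please correct it if it is wrong):

# sketch: 95% Hotelling T2 ellipse over the first two PC scores
pc    <- prcomp(USArrests, scale. = TRUE)
n     <- nrow(pc$x)
T2lim <- 2 * (n - 1) / (n - 2) * qf(0.95, 2, n - 2)
theta <- seq(0, 2 * pi, length.out = 200)
ell   <- cbind(sqrt(T2lim) * pc$sdev[1] * cos(theta),
               sqrt(T2lim) * pc$sdev[2] * sin(theta))
plot(pc$x[, 1:2]); lines(ell)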

Thank you

--
View this message in context: 
http://r.789695.n4.nabble.com/hotelling-and-confidence-region-tp3524204p3524204.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] integrate

2011-05-15 Thread meltem gölgeli
Dear R-users,
I'm really new to R, so this is probably a basic question. I have a
function like f(x,y)=\int^{0}_{y}(2*x)*exp(y-t)dt or
f(x,y)=\int^{0}_{y}((2*x)*exp(\int^{0}_{t}(x*k)dk))dt and I can also define
some basic loops for x and y, like x in 1:3 and y in 1:2. Could anybody please
help me?
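
To make the first function concrete, this is the kind of thing I have been
trying (just a sketch, and I am not sure it is the right approach):

# sketch: f(x, y) = integral of (2*x)*exp(y - t) dt for t between 0 and y,
# evaluated numerically on a small grid of x and y values
f <- function(x, y) {
  integrate(function(t) (2 * x) * exp(y - t), lower = 0, upper = y)$value
}
outer(1:3, 1:2, Vectorize(f))
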
best wishes,
mgm

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behaviour as.data.frame

2011-05-15 Thread Bert Gunter
In your post, you're missing the final s on the stringsAsFactors
argument in the d1 assignment. When I typed it correctly, it works as
expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan rh...@eoos.dds.nl wrote:
 I use the following code to create two data.frames d1 and d2 from a list:
 types  <- c("integer", "character", "double")
 nlines <- 10
 d1     <- as.data.frame(lapply(types, do.call, list(nlines)),
 stringsAsFactor=FALSE)
 l2     <- lapply(types, do.call, list(nlines))
 d2     <- as.data.frame(l2, stringsAsFactors=FALSE)

 I would expect d1 and d2 to be the same, however, in d1 the second column is
 a factor while in d2 it is a character (which I would expect):

 str(d1)

 'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: Factor w/ 1 level : 1 1 1 1
 1 1 1 1 1 1
  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0

 str(d2)

 'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: chr  ...
  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0


 As different but related question: I use the commands above to create an
 'empty' data.frame with specified column types and dimensions. I need this
 data.frame to pass on to my c++ routines. Is there a more simple/elegant way
 of creating this data.frame?

 Regards,

 Jan


 PS:
 I am running R on 64 bit Ubuntu 11.04:

 sessionInfo()

 R version 2.12.1 (2010-12-16)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-15 Thread Gabor Grothendieck
On Fri, May 13, 2011 at 6:38 AM, Michael Haenlein
haenl...@escpeurope.eu wrote:
 I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
 i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
 calculations run for several days sometimes even weeks (mainly simulations
 over a large parameter space). Depending on the external conditions, my
 laptop sometimes shuts down due to overheating.

If you are on Windows press the Windows key and type in Power Options.
 When the associated dialog pops up choose Power Saver.  Now your PC
will use less power so it won't heat up so much although your
performance could suffer a bit.

Also ensure that there is sufficient air circulation around the machine.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-15 Thread Spencer Graves
Also:  A previous post in this thread suggested Rprof [sec. 3.2 in 
Writing R Extensions, available via help.start()]. This should 
identify the functions that consume the most time.  The standard 
procedure to improve speed is as follows:
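
For example, a minimal, purely illustrative profiling run (the profiled 
expression is just a stand-in for your own code) looks like this:

Rprof("profile.out")
x <- replicate(200, sum(sort(rnorm(1e4))))
Rprof(NULL)
head(summaryRprof("profile.out")$by.self)   # functions ranked by own time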



  1.  Experiment with different ways of computing the same thing in 
R.  In many cases, this can help you reduce the compute time by a factor 
of 10 or even 1,000 or more.  Try this, perhaps using proc.time and 
system.time with portions of your code, then rerun Rprof.



  2.  After you feel you have done the best you can with R, you 
might try coding the most compute intensive portion of the algorithm in 
a compiled language like C, C++ or Fortran.  Then rerun Rprof, etc.



  3.  After trying (or not) compiled code, it may be appropriate to 
consider CRAN Task View: High-Performance and Parallel Computing with 
R.  (From a CRAN mirror, select Task Views - 
HighPerformanceComputing:  High-Performance and Parallel Computing with 
R.)  You may also want to try the foreach package from Revolution 
Computing (revolutionanalytics.com).  These capabilities can help you 
get the most out of a multi-core computer.  NOTE:  While your code is 
running, you can check the Performance tab in Windows Task Manager to 
see what percent of your CPUs and physical memory you are using.  I 
mention this, because without foreach you might get at most 1 of your 
4 CPUs running R.  With foreach, you might be able to get all of them 
working for you.  Then after you have done this and satisfied yourself 
that you've done the best you can with all of this, I suggest you try 
the Amazon Cloud.



  If you have not already solved your problem with this and have 
not yet tried these three steps, I suggest you try this.  It may take 
more of your time, but you will likely learn much that will help you in 
the future as well as help you make a better choice of a new computer if 
you ultimately decide to do that.



  Hope this helps.
  Spencer


On 5/15/2011 8:28 AM, Gabor Grothendieck wrote:

On Fri, May 13, 2011 at 6:38 AM, Michael Haenlein
haenl...@escpeurope.eu  wrote:

I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
calculations run for several days sometimes even weeks (mainly simulations
over a large parameter space). Depending on the external conditions, my
laptop sometimes shuts down due to overheating.

If you are on Windows press the Windows key and type in Power Options.
  When the associated dialog pops up choose Power Saver.  Now your PC
will use less power so it won't heat up so much although your
performance could suffer a bit.

Also ensure that there is sufficient air circulation around the machine.



--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Again on Data Mining

2011-05-15 Thread Lorenzo Isella

Dear All,
I have already posted before on the list about data mining and it has 
proved very useful.
I have now a training dataset consisting of N objects of M << N different 
kinds (actually, M is usually 3 to 5, whereas N is of the order of 1000).

Every object has its own label L_i, i=1...N, that is known.
For each of these objects I measure some property in time (let's say I 
measure it Q times in a given time interval), i.e. the i-th object has 
an associated file {t, y}, where t=(t_1,t_2,...,t_Q) and y=(y_1,y_2,...,y_Q).
My problem is then to come up with an algorithm that after learning on 
the training dataset, can guess the labels of a testing dataset.
The difference with respect to the datamining I have done so far is that 
I do not have a set of properties for every object (e.g. age, sex, 
income, etc...) but rather an associated function y=f(t).
Any suggestion (either conceptual or about which R package I should turn 
to) is greatly appreciated.

Many thanks

Lorenzo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changing the day of the week in dates format

2011-05-15 Thread Adrian Duffner
Dear Dave,

please always answer to the whole list.

To answer your question: A quick check showed that your proposed code 
will not work as you expected:
  tt <- as.POSIXlt(strptime("2011-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", 
tz="GMT"))
  tt
[1] 2011-01-01 GMT
  tt+ 365.25*24*60*60
[1] 2012-01-01 06:00:00 GMT
  tt+ 365.25*24*60*60*2
[1] 2012-12-31 12:00:00 GMT

I am not aware of an 'addtime' function in the base package (similar to 
difftime()); maybe there is one in one of the packages on CRAN.
A quick google search provided the following link for adding years to a 
date http://tolstoy.newcastle.edu.au/R/help/05/10/13700.html, where 
seq.Date() was proposed.
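
A small sketch of that route (my assumption is that seq() handles calendar 
arithmetic, e.g. leap years, the way one would hope):

tt <- as.POSIXct("2011-01-01 00:00:00", tz = "GMT")
seq(tt, by = "1 year", length.out = 2)[2]   # one calendar year later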

Regards
Adrian

On 15.05.2011 16:25, Dave Evens wrote:
 Hi Adrian,

 Many thanks for your reply.

 Suppose I wanted to increment the date by a year - how would I account 
 for things like leap years?

 Would I just do
  mydaysx[select] <- mydaysx[select] + 365.25*24*60*60

 Regards,
 Dave
 
 *From:* Adrian Duffner duffn...@googlemail.com
 *To:* Dave Evens daveeve...@yahoo.co.uk
 *Cc:* r-help@r-project.org r-help@r-project.org
 *Sent:* Sunday, 15 May 2011, 14:21
 *Subject:* Re: [R] changing the day of the week in dates format

 Hi Dave,

 your problem is that you are working with a S3 class, what is mainly a
 list with naming convention. Hence it is possible to change just one
 entry of the list, but it is nearly never recommendable.

 So a slight change to your code should provide you the required output:
  mydaysx[select] <- mydaysx[select] + 2*24*60*60
  select <- mydaysx$wday==6
  sum(select)
 [1] 0

 In this case not only the entry $mday of the list is changed, but the
 whole object is updated.

 Cheers
 Adrian

 On 14.05.2011 20:44, Dave Evens wrote:
  Dear all,
 
  I have a question related to the POSIXlt function in R.
 
  I have a set of dates and times, for exmaple:
 
  startx <- as.POSIXct("2011-01-01 00:00:00")
  finx <- as.POSIXct("2011-12-31 00:00:00")
 
  daysx <- seq(startx, finx, by="24 hours")
 
  I want to change the dates of all the days falling on a Saturday to the
  next working day (i.e. Monday). So I convert dates to POSIXlt
 
  mydaysx <- as.POSIXlt(daysx)
 
  Then I select all the Saturdays and move them on to Monday
 
  select <- mydaysx$wday==6
  mydaysx$mday[select] <- mydaysx$mday[select] + 2
 
  However, although all the new dates (i.e. mydaysx) are actual days of the
  year, the $wday entries have not been updated and the $mday entries have
  not all been corrected (i.e. those falling into the next month). So if I do
 
  select <- mydaysx$wday==6
 
  I still get the same set of days as before.
 
  Is there a way to do this?
 
  Thanks,
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailto:R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R function that returns an object's search path position XXXX

2011-05-15 Thread Duncan Murdoch

On 14/05/2011 9:41 PM, Dan Abner wrote:

Hello everyone,

Is there an R function that returns an object's search path position?


Does find() do what you want?  It doesn't give the position in the 
search path, but you could get that from something like


name <- "plot"

which( search() %in% find(name) )

Be aware that a name can appear in more than one place in the path, and 
not all available objects are on the search path.  So I'm not sure the 
above solves your real problem.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behaviour as.data.frame

2011-05-15 Thread Jan van der Laan
Thanks. I also noticed myself minutes after sending my message to the 
list. My 'please ignore my question it was just a stupid typo' message 
was sent with the wrong account and is now awaiting moderation.


However, my other question still stands: what is the 
preferred/fastest/simplest way to create a data.frame with given column 
types and dimensions?
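
For reference, the kind of helper I have in mind looks roughly like this (a 
sketch; the column names are just placeholders):

make_empty_df <- function(types, n) {
  cols <- lapply(types, vector, length = n)          # one 'empty' column per type
  names(cols) <- paste("V", seq_along(cols), sep = "")
  as.data.frame(cols, stringsAsFactors = FALSE)
}
str(make_empty_df(c("integer", "character", "double"), 10))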


Regards,
Jan


On 05/15/2011 04:43 PM, Bert Gunter wrote:

In your post, you're missing the final s on the stringsAsFactors
argument in the d1 assignment. When I typed it correctly, it works as
expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laanrh...@eoos.dds.nl  wrote:

I use the following code to create two data.frames d1 and d2 from a list:
types <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second column is
a factor while in d2 it is a character (which I would expect):


str(d1)

'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: Factor w/ 1 level : 1 1 1 1
1 1 1 1 1 1
  $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0

str(d2)

'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: chr  ...
  $ c.0..0..0..0..0..0..0..0..0..0.  : num  0 0 0 0 0 0 0 0 0 0


As different but related question: I use the commands above to create an
'empty' data.frame with specified column types and dimensions. I need this
data.frame to pass on to my c++ routines. Is there a more simple/elegant way
of creating this data.frame?

Regards,

Jan


PS:
I am running R on 64 bit Ubuntu 11.04:


sessionInfo()

R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-15 Thread Henrik Bengtsson
On Sun, May 15, 2011 at 9:31 AM, Spencer Graves
spencer.gra...@structuremonitoring.com wrote:
 Also:  A previous post in this tread suggested Rprof [sec. 3.2 in Writing
 R Extensions, available via help.start()]. This should identify the
 functions that consume the most time.  The standard procedure to improve
 speed is as follows:


      1.  Experiment with different ways of computing the same thing in R.
  In many cases, this can help you reduce the compute time by a factor of 10
 or even 1,000 or more.  Try this, perhaps using proc.time and system.time
 with portions of your code, then rerun Rprof.

I second this one; if you have things running for weeks, and you
haven't done any serious optimization already, you most likely can
bring that down to days or hours by investigating where the
bottlenecks are.  Here is a good illustration how a simple piece of R
code is made 12,000 times faster:

  http://rwiki.sciviews.org/doku.php?id=tips:programming:code_optim2
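
As a rough sketch of that workflow (my_sim() just stands in for whatever
long-running code is being timed):

system.time(my_sim())                      # coarse timing of a single run

Rprof("sim_profile.out")                   # start the sampling profiler
my_sim()
Rprof(NULL)                                # stop profiling
summaryRprof("sim_profile.out")$by.self    # functions ranked by own time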



      2.  After you feel you have done the best you can with R, you might try
 coding the most compute intensive portion of the algorithm in a compiled
 language like C, C++ or Fortran.  Then rerun Rprof, etc.


      3.  After trying (or not) compiled code, it may be appropriate to
 consider CRAN Task View: High-Performance and Parallel Computing with R.
  (From a CRAN mirror, select Task Views - HighPerformanceComputing:
  High-Performance and Parallel Computing with R.)  You may also want to try
 the foreach package from Revolution Computing (revolutionanalytics.com).
  These capabilities can help you get the most out of a multi-core computer.
  NOTE:  While your code is running, you can check the Performance tab in
 Windows Task Manager to see what percent of your CPUs and physical memory
 you are using.  I mention this, because without foreach you might get at
 most 1 of your 4 CPUs running R.  With foreach, you might be able to get
 all of them working for you.  Then after you have done this and satisfied
 yourself that you've done the best you can with all of this, I suggest you
 try the Amazon Cloud.


      If you have not already solved your problem with this and have not yet
 tried these three steps, I suggest you try this.  It may take more of your
 time, but you will likely learn much that will help you in the future as
 well as help you make a better choice of a new computer if you ultimately
 decide to do that.


      Hope this helps.
      Spencer


 On 5/15/2011 8:28 AM, Gabor Grothendieck wrote:

 On Fri, May 13, 2011 at 6:38 AM, Michael Haenlein
 haenl...@escpeurope.eu  wrote:

 I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel
 Core
 i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
 calculations run for several days sometimes even weeks (mainly
 simulations
 over a large parameter space). Depending on the external conditions, my
 laptop sometimes shuts down due to overheating.

 If you are on Windows press the Windows key and type in Power Options.
  When the associated dialog pops up choose Power Saver.  Now your PC
 will use less power so it won't heat up so much although your
 performance could suffer a bit.

 Also ensure that there is sufficient air circulation around the machine.

To move this hardware-specific discussion off the R-help list, I
strongly recommend the 'Thinkpad.com Support Community' (open
community/non-Lenovo) with lots of experts and users:

  http://forum.thinkpads.com/

I've seen discussions on overheating/emergency shutdowns there.

/Henrik



 --
 Spencer Graves, PE, PhD
 President and Chief Operating Officer
 Structure Inspection and Monitoring, Inc.
 751 Emerson Ct.
 San José, CA 95126
 ph:  408-655-4567

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] integrate

2011-05-15 Thread David Winsemius


On May 15, 2011, at 6:51 AM, meltem gölgeli wrote:


Dear R-users,
I'm really new to R. That's why I probably have a basic question. I have a
function like f(x,y) = \int_{y}^{0} (2*x)*exp(y-t) dt or
f(x,y) = \int_{y}^{0} (2*x)*exp(\int_{t}^{0} (x*k) dk) dt, and I can also define
some basic loops for x and y, like x in 1:3 and y in 1:2. Could anybody please
help me?


You should take one of the following paths:

--- Stumble around using the help pages starting with ?Control

--- Buy an introductory text
 http://www.r-project.org/doc/bib/R-books.html
--- Read the Introduction to R
 http://cran.r-project.org/doc/manuals/R-intro.pdf

You should also ---

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html




--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
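
For what it's worth, the first form can be evaluated numerically with
integrate(); a minimal sketch (the x/y grid is the one from the question):

f <- function(x, y) {
  ## f(x, y) = integral from y to 0 of 2*x*exp(y - t) dt,
  ## written as minus the integral from 0 to y so the limits increase
  -integrate(function(t) 2 * x * exp(y - t), lower = 0, upper = y)$value
}
outer(1:3, 1:2, Vectorize(f))   # evaluate f over x in 1:3 and y in 1:2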

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Find String Between Characters

2011-05-15 Thread William Dunlap
It looks like you can get the text of the document
with
  as(mmm[[1]], "character")
and you can use grep, strsplit, gsub, etc. on that text.

Look at the functions in the XML package for ways
to use the XML structure of the data instead of pattern
matching to extract meaningful parts of the document.

class?HTMLInternalDocument
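
A minimal sketch of that route (assuming mmm is the scrape() result from the
original post, and that the coercion gives back one long character string):

txt <- as(mmm[[1]], "character")                 # whole page as text
m   <- regexpr("CIK=[0-9]+", txt)                # first occurrence of CIK=digits
cik <- sub("CIK=", "", substr(txt, m, m + attr(m, "match.length") - 1))
cik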

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Sparks, John James
 Sent: Saturday, May 14, 2011 7:14 PM
 To: jim holtman
 Cc: r-help@r-project.org
 Subject: Re: [R] Find String Between Characters
 
 Hi Jim,
 
 Thanks for your note.
 
 Unfortunately, when I attempt your solution in my exact 
 setting, I get a
 weird and slightly different answer.
 
 First, let me be more clear.  What I am attempting to do is 
 pull the CIK
 number out of the information from the web page itself after 
 it has loaded
 to R (this may not be optimal, but I am new at this), not from the web
 page reference (as you have done).
 
 So, when I execute the following as per your suggestion:
 
 require(scrapeR)
 mmm-scrape(url=http://www.sec.gov/cgi-bin/browse-edgar?actio
n=getcompanyCIK=320193owner=excludecount=40)
 
 num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
 
 I get
 [1] pointer: 0x001265c0
 
 Is this just a hex representation of the same number, or is 
 something else
 going on here?
 
 Comments from any and all would be much appreciated.
 
 --John J. Sparks, Ph.D.
 
 On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
  Is this what you want:
 
  
 mmm-http://www.sec.gov/cgi-bin/browse-edgar?action=getcompan
yCIK=320193owner=excludecount=40
  num - sub(^.*CIK=([0-9]+).*, \\1, mmm)
  num
  [1] 320193
 
 
 
  On Sat, May 14, 2011 at 8:20 PM, Sparks, John James 
 jspa...@uic.edu
  wrote:
  Dear R Helpers,
 
  I am trying to isolate a set of characters between two 
 other characters
  in
  a long string file.  I tried some of the examples on the R 
 help pages
  and
  elsewhere, but I am not able to get it.  Your help would be much
  appreciated.
 
  require(scrapeR)
  
 mmm-scrape(url=http://www.sec.gov/cgi-bin/browse-edgar?actio
n=getcompanyCIK=320193owner=excludecount=40)
  str(mmm)
 
  I want to get the number 320193 that is between the 
 CIK= and the .
   I
  have tried
 
  g - grep( CIK=|, mmm )
  and
  temp-grep(mmm,\CIK=\)
 
  and variations on these themes, but all won't run or come 
 bask as an
  empty
  object.  How can I grab this number?
 
  Best wishes,
  --John J. Sparks, Ph.D.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
  --
  Jim Holtman
  Data Munger Guru
 
  What is the problem that you are trying to solve?
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Summary.Formula: prmsd and test statistic

2011-05-15 Thread Eli Kamara
I tried the modification but no luck. Here is exactly what I'm seeing.
The command works fine, but when I add prmsd=TRUE the numbers
disappear.

 print(summary.formula(S~Kyph+Vert, data=radio, method="reverse", overall=T,
 continuous=5, add=TRUE, test=T))


Descriptive Statistics by S

++---+--++++-+
||N  |Guru Teg Bahadur Hospital |St. Stephens Hospital   |VIMHANS
Hospital|Combined|  Test   |
||   |(N=37)|(N=1)   |(N=75)
   |(N=113) |Statistic|
++---+--++++-+
|Kyph|109| 13.6625/20.7100/29.1325  | 11.6400/11.6400/11.6400|
0./ 9.6200/17.1650|  2.1600/12.0100/21.1900|F=9.23 d.f.=2,106
P0.001|
++---+--++++-+
|Vert|113|  2/3/3   |  2/2/2 |
 2/2/3 |  2/2/3 |F=2.65 d.f.=2,110 P=0.075|
++---+--++++-+
 print(summary.formula(S~Kyph+Vert, data=radio, method="reverse", overall=T,
 continuous=5, add=TRUE, test=T), prmsd=TRUE)


Descriptive Statistics by S

++---+--+--+-+-+-+
||N  |Guru Teg Bahadur Hospital |St. Stephens Hospital |VIMHANS
Hospital |Combined |  Test   |
||   |(N=37)|(N=1) |(N=75)
  |(N=113)  |Statistic|
++---+--+--+-+-+-+
|Kyph|109|  |  |
  | |F=9.23 d.f.=2,106 P0.001|
++---+--+--+-+-+-+
|Vert|113|  |  |
  | |F=2.65 d.f.=2,110 P=0.075|
++---+--+--+-+-+-+


On Sun, May 15, 2011 at 10:03 AM, David Winsemius
dwinsem...@comcast.net wrote:

 On May 14, 2011, at 11:23 AM, Eli Kamara wrote:

 Hello,

 I'm a new user to R so apologies if this is a basic question, but after
 scouring the web on information for summary.formula, I still am searching
 for an answer.

 I made a function to analyze my data - I have a categorical variable and
 three continuous variables. I am analyzing my continuous variables on the
 basis of my categorical variables.

 radioanal - function(a)
 {

 #Educational status first - pulling variables from my database.
 categorical is 13 = Edu. numerical is  48=Kyph, 50=Vert, 53=HL.
 a1= a[,c(13,48,50,53)]

 #make sure they are in numeric form
 a2= transform(a1, Kyph=as.numeric(as.character(Kyph)),
 Vert=as.numeric(as.character(Vert)), HL=as.numeric(as.character(HL)))

 #see boxplots of the individual variables
 boxplot(a2$Kyph~a2$Edu, main=Education vs Kyphosis angle,
  xlab=Education, ylab=Kyphosis angle)
 boxplot(a2$Vert~a2$Edu, main=Education vs # of vertebrae affected,
  xlab=Education, ylab=#of vertebrae affected)
 boxplot(a2$HL~a2$Edu, main=Education vs %HL,
  xlab=Education, ylab=%HL)

 #see distribution of data
 d=summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, method="reverse",
 overall=T, continuous=5, add=TRUE, test=T)

 I noticed that you were addressing the columns individually. That rather
 defeats the strategy of passing a data argument to a function and using only
 the column names in the formula. It often causes strange errors in model
 calls and I wouldn't be surprised if you got better results with something
 like:

 d=summary.formula( Edu ~ Kyph + HL + Vert, data=a2, method="reverse",
 overall=T, continuous=5, add=TRUE, test=T)
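
 If the blank cells with prmsd=TRUE really come from missing values, another
 thing worth trying (a sketch, with Hmisc loaded as before and not tested
 against your data) is dropping the incomplete rows first:

 a2c <- na.omit(a2[, c("Edu", "Kyph", "HL", "Vert")])
 d   <- summary.formula(Edu ~ Kyph + HL + Vert, data=a2c, method="reverse",
                        overall=T, continuous=5, test=T)
 print(d, prmsd=TRUE)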

 --
 David

 #perform MANOVA
 a3=manova(cbind(Kyph, Vert, HL)~as.factor(Edu), data=a2)

 #return results
 a4=list(Results of Educational Status MANOVA,
 print(d),
 summary(a3, test=Hotelling-Lawley),
 summary(a3, test=Roy) ,
 summary(a3, test=Pillai),
 summary(a3, test=Wilks),
 summary.aov(a3)
 )

 print(a4)

 }

 This function works as is, but I want to add the mean and standard
 deviation to my table. When I add the following code to line 36 where I
 print d
 print(d, prmsd=TRUE)

 The numbers in my table disappear. When I use the same commands from the
 command line, the same thing happens. After reading the manual, I think the
 error might be due to the missing numbers in my database, so I tried adding
 na.action to my set of commands:

 print(summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, 

Re: [R] changing the day of the week in dates format

2011-05-15 Thread Dave Evens
Hi Adrian,

Many thanks for your reply.

Suppose I wanted to increment the date by a year - how would I account for 
things like leap years?

Would I just do 
 mydaysx[select] <- mydaysx[select] + 365.25*24*60*60

Regards,
Dave



From: Adrian Duffner duffn...@googlemail.com

Cc: r-help@r-project.org r-help@r-project.org
Sent: Sunday, 15 May 2011, 14:21
Subject: Re: [R] changing the day of the week in dates format

Hi Dave,

your problem is that you are working with an S3 class, which is essentially a
list with a naming convention. Hence it is possible to change just one
entry of the list, but it is rarely advisable.

So a slight change to your code should provide you the required output:
 mydaysx[select] <- mydaysx[select] + 2*24*60*60
 select <- mydaysx$wday==6
 sum(select)
[1] 0

In this case not only the entry $mday of the list is changed, but the 
whole object is updated.
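
A quick way to convince yourself that the whole object really is consistent
after the shift (just a sketch):

 table(weekdays(mydaysx))   # should no longer list any Saturdays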

Cheers
Adrian

Am 14.05.2011 20:44, schrieb Dave Evens:
 Dear all,

 I have a question related to the POSIXlt function in R.

 I have a set of dates and times, for example:

 startx <- as.POSIXct("2011-01-01 00:00:00")
 finx <- as.POSIXct("2011-12-31 00:00:00")

 daysx <- seq(startx, finx, by="24 hours")

 I
   want to change the dates of all the days falling on a Saturday to the
 next working day (i.e. Monday). So I convert dates to POSIXlt

 mydaysx <- as.POSIXlt(daysx)

 Then I select all the Saturdays and move them on to Monday

 select <- mydaysx$wday==6
 mydaysx$mday[select] <- mydaysx$mday[select] + 2

 However,
   although all the new dates (i.e. mydaysx) are actual days of the year -
   the $wday have not been updated and the $mdays have not all been
 corrected (i.e. those falling into the next month). So if I do

 select <- mydaysx$wday==6

 I still get the same set of days as before.

 Is there a way to do this?

 Thanks,

     [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Row names and matrixs

2011-05-15 Thread nielsen4897
Thank you - it is refreshing to have a helpful answer.  I am glad some
people remember the days when they were first learning too.



On Thu, May 12, 2011 at 4:58 PM, jlemaitre [via R] 
ml-node+3518836-766936252-236...@n4.nabble.com wrote:

 Nielsen,
 The numbers in the brackets reference a component of a matrix/data
 frame/vector. So if you have:
  x <- c(1:10) # a vector of integers in sequence from 1-10
  x[3] # the third component of x
 [1] 3

 For 2-way matrices or data frames, the formatting is [row,column]. So, for
 a 10 x 10 matrix x:
  x <- matrix(1:100, ncol = 10, byrow = T)
  x
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]    1    2    3    4    5    6    7    8    9    10
   [2,]   11   12   13   14   15   16   17   18   19    20
   [3,]   21   22   23   24   25   26   27   28   29    30
   [4,]   31   32   33   34   35   36   37   38   39    40
   [5,]   41   42   43   44   45   46   47   48   49    50
   [6,]   51   52   53   54   55   56   57   58   59    60
   [7,]   61   62   63   64   65   66   67   68   69    70
   [8,]   71   72   73   74   75   76   77   78   79    80
   [9,]   81   82   83   84   85   86   87   88   89    90
  [10,]   91   92   93   94   95   96   97   98   99   100

  x[,1] # return the first column of x
  [1]  1 11 21 31 41 51 61 71 81 91

  x[1,] # return the first row of x
  [1]  1  2  3  4  5  6  7  8  9 10

 when there's a minus, it just means that component is omitted

  x[-1,] # return x less the first row
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
   [1,]   11   12   13   14   15   16   17   18   19    20
   [2,]   21   22   23   24   25   26   27   28   29    30
   [3,]   31   32   33   34   35   36   37   38   39    40
   [4,]   41   42   43   44   45   46   47   48   49    50
   [5,]   51   52   53   54   55   56   57   58   59    60
   [6,]   61   62   63   64   65   66   67   68   69    70
   [7,]   71   72   73   74   75   76   77   78   79    80
   [8,]   81   82   83   84   85   86   87   88   89    90
   [9,]   91   92   93   94   95   96   97   98   99   100

 Given this context, I would double check the contents of test vs. test1.
 And don't let arrogant posts on this help forum discourage you.
 I hope this helps.


 --
  If you reply to this email, your message will be added to the discussion
 below:
 http://r.789695.n4.nabble.com/Row-names-and-matrixs-tp3516372p3518836.html
 To unsubscribe from Row names and matrixs, click 
 herehttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=3516372code=bGluZHNleW5pZWxzZW5wY0BnbWFpbC5jb218MzUxNjM3MnwtMTcxNzE2OTY3OA==.





-- 
Lindsey Nielsen, Ph.D.
Los Alamos National Lab
(505) 667-2835


--
View this message in context: 
http://r.789695.n4.nabble.com/Row-names-and-matrixs-tp3516372p3524671.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pls help: lattice graph with both negative and positive value, x and y cross at 0 and negative value bars are plotted just oppositive direction in contrast to positive

2011-05-15 Thread Ram H. Sharma
Dear R experts:

Here is my problem:


#Data 1

Y <- c(0.5, 0.1, 0.5, 1.3, 1.4, 1.6, 1.65, 2.4, 2.6, 3.4, 3.6, 4.3, 4.42,
4.8, 4.7, 3.4, 3.3, 2.8, 2.8, 1.2, 1.1, 0.5, 0.2, 0.1, -0.2, -1.5, -2.5,
-1.3, -0.5, -0.1)

X <- seq(1:30)

X1 <- c(rep("T1", 24), rep("T2", 6))

dat1 <- data.frame(Y, X, X1)



require(lattice)

mcol <- c("green", "red")

barchart(Y ~ factor(X), group = X1, data = dat1, col = mcol, ylab = "y var",
xlab = "x var", ylim = c(-3.0, 5.0), pos = 0, scales =
list(x = list(rot = 90, font = 1, cex = 1), y = list(rot = 90, font = 1, cex =
1)))

The output is not what I want. I want the orientation of graph like the
following in base R but axis label are in Y axis line and other parameters
as in lattice:

barplot(Y, names.arg = X)

I know this is simple question, but I could not find a true solution.

-- 
Thanks in advance.

Ram H

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adding dates to time series

2011-05-15 Thread Bazman76
Hi there,

I have a spreadsheet in excel which consists of first column  of dates and
then subsequent columns that refer to prices of different securities on
those dates. (the first row contains each series name)

I saved the excel file as type csv and then imported it into R using

prices=read.csv(file="C:/Documents and Settings/Hugh/My Documents/PhD/Option
prices.csv", header = TRUE, sep = ",")

This creates the correct time series data

x <- ts(prices[,2])

but does not have the dates attached.

However the dates refer to working days. So although in general they
represent Monday-Friday this is not always the case because of holidays etc.

How then can I create a time series where the dates are read in from the
first column of the csv file? I can not find an example in R documentation
where this is done?

Thanks


--
View this message in context: 
http://r.789695.n4.nabble.com/Adding-dates-to-time-series-tp3524679p3524679.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need help with text processing / string split

2011-05-15 Thread eric
I used screen scraping to extract some information and put it into a table
called tbl. Now I want to modify the table a bit so the data can be more
useful. Here's the code I used:

library(XML)
rm(list=ls())
url <- "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=05/25/2011"
tbl <- data.frame(readHTMLTable(url))[2:405, c(3,5,6,8,9)]
names(tbl) <- c("Address", "Township", "Parcel", "Sale Date", "Costs")

tbl is attached as txt for your convenience. Entries in the last column of
the dataframe (tbl$Cost) appear as follows: $173,933.60$2,410.28  . 
http://r.789695.n4.nabble.com/file/n3524793/tbl.txt tbl.txt 

How do I:

1. Split the string
2. Have the two values show up as actual numbers that can be used
3. Put the numbers in two separate columns of the dataframe.

In other words $173,933.60$2,410.28 would show up as 173933.60 in one column
and 2410.28 would show up in a second column of tbl

I tried using strsplit but I could not get it working properly. 

 

--
View this message in context: 
http://r.789695.n4.nabble.com/Need-help-with-text-processing-string-split-tp3524793p3524793.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-15 Thread Aram Fingal

On May 13, 2011, at 6:38 AM, Michael Haenlein wrote:

 Dear all,
 
 I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
 i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
 calculations run for several days sometimes even weeks (mainly simulations
 over a large parameter space). Depending on the external conditions, my
 laptop sometimes shuts down due to overheating.


You didn't mention whether you are using a 64-bit OS or not.  A single 32-bit 
process can not use more than 2 GB RAM.  If your calculations would benefit 
from the full 8 GB RAM on your machine, you need to be able to run 64-bit R.   
My understanding is that, on Windows, you either have to install the OS as 
32-bit and use all 32-bit software or install 64-bit Windows and run all 64-bit 
software.  A Mac can run 32-bit and 64-bit software simultaneously and I'm not 
sure about Linux.  In the case of Linux, it probably doesn't matter so much 
because most Linux software is available as open source and you can compile it 
yourself either way.  

 
 I'm now thinking about buying a more powerful desktop PC or laptop. Can
 anybody advise me on the best configuration to run R as fast as possible? I
 will use this PC exclusively for R so any other factors are of limited
 importance.

You need to evaluate whether RAM or raw processor speed is most critical for 
what you're doing.  In my case, I upgraded my Mac Pro to 16 GB RAM and was able 
to do hierarchical clustering heatmaps overnight which previously took more 
than a week to compute.  Using the Activity Monitor utility, it looks like some 
of the, even larger, heatmap computations would benefit from 32 GB RAM or more. 
 

Linux runs on the widest range of hardware and that allows you the greatest 
ability to shop around.  If RAM is the deciding factor, then you can look 
around for a machine which can hold as much RAM as possible.  If processor 
speed is the factor, then you can optimize for that.  Windows runs on a 
reasonable array of hardware but has the disadvantage that the OS, itself, uses 
a lot of resources.  

The Mac has the advantage of flexibility.  When you download the precompiled R 
package, it comes with both a 32-bit and a 64-bit executable. This is because 
32-bit processes run a little faster if you don't need large amounts of RAM.  
If you do need the RAM, then you run the 64-bit version.  

Aram Fingal
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behaviour as.data.frame

2011-05-15 Thread Bert Gunter
Inline below.

On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan rh...@eoos.dds.nl wrote:
 Thanks. I also noticed myself minutes after sending my message to the list.
 My 'please ignore my question it was just a stupid typo' message was sent
 with the wrong account and is now awaiting moderation.

 However, my other question still stands: what is the
 preferred/fastest/simplest way to create a data.fame with given column types
 and dimensions?

I do not know, but  why is simply

data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)

not acceptable? Note that if you had, say, 500, numeric (= double) and
100 character columns to add, you might do something like:

 z <- matrix(numeric(5000), nr=10)
 u <- matrix(character(1000), nr=10)
 frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns

While this might save some typing, it may not be much more efficient
than typing it all out -- maybe just some parsing time is saved. You
can experiment and see.

However, since a data.frame **is** a list with added attributes and a
great deal of the work of the constructor is in constructing and
checking these attributes (e.g. row and column names), I see nothing
terribly inefficient with what you did. It's just a bit obscure.  But
maybe someone with greater expertise will set us both straight.
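
For instance, a minimal sketch along those lines (the column names and the
length are only illustrative):

types <- c(a = "integer", b = "character", c = "double")
n     <- 10
cols  <- lapply(types, function(tp) vector(mode = tp, length = n))
d     <- as.data.frame(cols, stringsAsFactors = FALSE)
str(d)   # 10 obs. of one integer, one character and one numeric column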

Cheers,
Bert



 Regards,
 Jan


 On 05/15/2011 04:43 PM, Bert Gunter wrote:

 In your post, you're missing the final s on the stringsAsFactors
 argument in the d1 assignment. When I typed it correctly, it works as
 expected.

 -- Bert

 On Sun, May 15, 2011 at 4:25 AM, Jan van der Laanrh...@eoos.dds.nl
  wrote:

 I use the following code to create two data.frames d1 and d2 from a list:
 types <- c("integer", "character", "double")
 nlines <- 10
 d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
 stringsAsFactor=FALSE)
 l2 <- lapply(types, do.call, list(nlines))
 d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

 I would expect d1 and d2 to be the same, however, in d1 the second column
 is
 a factor while in d2 it is a character (which I would expect):

 str(d1)

 'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: Factor w/ 1 level : 1 1 1
 1
 1 1 1 1 1 1
  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0

 str(d2)

 'data.frame':   10 obs. of  3 variables:
  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
  $ c: chr  ...
  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0


 As different but related question: I use the commands above to create an
 'empty' data.frame with specified column types and dimensions. I need
 this
 data.frame to pass on to my c++ routines. Is there a more simple/elegant
 way
 of creating this data.frame?

 Regards,

 Jan


 PS:
 I am running R on 64 bit Ubuntu 11.04:

 sessionInfo()

 R version 2.12.1 (2010-12-16)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.








-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-15 Thread Duncan Murdoch

On 15/05/2011 3:02 PM, Aram Fingal wrote:


On May 13, 2011, at 6:38 AM, Michael Haenlein wrote:


Dear all,

I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
calculations run for several days sometimes even weeks (mainly simulations
over a large parameter space). Depending on the external conditions, my
laptop sometimes shuts down due to overheating.



You didn't mention whether you are using a 64-bit OS or not.  A single 32-bit 
process can not use more than 2 GB RAM.  If your calculations would benefit 
from the full 8 GB RAM on your machine, you need to be able to run 64-bit R.   
My understanding is that, on Windows, you either have to install the OS as 
32-bit and use all 32-bit software or install 64-bit Windows and run all 64-bit 
software.  A Mac can run 32-bit and 64-bit software simultaneously and I'm not 
sure about Linux.  In the case of Linux, it probably doesn't matter so much 
because most Linux software is available as open source and you can compile it 
yourself either way.


No, 64 bit Windows can run either 32 or 64 bit Windows programs.




I'm now thinking about buying a more powerful desktop PC or laptop. Can
anybody advise me on the best configuration to run R as fast as possible? I
will use this PC exclusively for R so any other factors are of limited
importance.


You need to evaluate whether RAM or raw processor speed is most critical for 
what you're doing.  In my case, I upgraded my Mac Pro to 16 GB RAM and was able 
to do hierarchical clustering heatmaps overnight which previously took more 
than a week to compute.  Using the Activity Monitor utility, it looks like some 
of the, even larger, heatmap computations would benefit from 32 GB RAM or more.

Linux runs on the widest range of hardware and that allows you the greatest 
ability to shop around.  If RAM is the deciding factor, then you can look 
around for a machine which can hold as much RAM as possible.  If processor 
speed is the factor, then you can optimize for that.  Windows runs on a 
reasonable array of hardware but has the disadvantage that the OS, itself, uses 
a lot of resources.

The Mac has the advantage of flexibility.  When you download the precompiled R 
package, it comes with both a 32-bit and a 64-bit executable. This is because 
32-bit processes run a little faster if you don't need large amounts of RAM.  
If you do need the RAM, then you run the 64-bit version.



The same is true for Windows binaries on CRAN.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] pls help: lattice graph with both negative and positive value, x and y cross at 0 and negative value bars are plotted just oppositive direction in contrast to positive

2011-05-15 Thread Dennis Murphy
Hi:

Try this:

barchart(Y ~ factor(X), group = X1, data = dat1,  col = mcol, origin = 0,
  ylab = "y var", xlab = "x var", ylim = c(-3.0, 5.0),
  scales = list(x=list(rot= 90, font = 1, cex = 1) ,
y = list(rot= 90, font = 1, cex = 1) ))

The origin = argument comes from panel.barchart(); see its help page
for more details.

HTH,
Dennis

On Sun, May 15, 2011 at 8:17 AM, Ram H. Sharma sharma.ra...@gmail.com wrote:
 Dear R experts:

 Here is my problem:


 #Data 1

 Y - c(0.5, 0.1, 0.5, 1.3, 1.4, 1.6, 1.65, 2.4, 2.6, 3.4, 3.6, 4.3, 4.42,
 4.8, 4.7, 3.4, 3.3, 2.8, 2.8, 1.2, 1.1, 0.5, 0.2, 0.1, -0.2, -1.5, -2.5,
 -1.3, -0.5, -0.1)

 X - seq(1:30)

 X1 - c(rep(T1, 24), rep(T2, 6))

 dat1 - data.frame(Y, X, X1)



 require(lattice)

 mcol - c(green, red)

 barchart(Y ~ factor (X), group = X1, data = dat1,  col = mcol , ylab= y
 var, xlab = x var, ylim = c(-3.0, 5.0), pos = 0,  scales =
 list(x=list(rot= 90, font = 1, cex = 1) , y = list(rot= 90, font = 1, cex =
 1) ))

 The output is not what I want. I want the orientation of graph like the
 following in base R but axis label are in Y axis line and other parameters
 as in lattice:

 barplot(Y, names.arg = X)

 I know this is simple question, but I could not find a true solution.

 --
 Thanks in advance.

 Ram H

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding dates to time series

2011-05-15 Thread Dennis Murphy
Hi:

I'd suggest using the zoo package; it allows you to use an index
vector such as dates to map to the series. It is well documented and
well maintained, with vignettes and an FAQ that can be found on its
package help page (among other places). Here is a small example:

dd <- data.frame(time = seq(as.Date('2003-01-01'), by = 'months', length = 200),
 s = rnorm(200))
head(dd, 3)
time  s
1 2003-01-01  1.4292491
2 2003-02-01 -1.0713998
3 2003-03-01 -0.4738791

library(zoo)
ser <- with(dd, zoo(s, time))   # s is the series, time is the index vector
str(ser)   # ser is of class zoo
plot(ser)  # apply the plot method

For finance applications, other possibilities include the xts and
quantmod packages, both of which are built on zoo.

HTH,
Dennis

On Sun, May 15, 2011 at 11:42 AM, Bazman76 h_a_patie...@hotmail.com wrote:
 Hi there,

 I have a spreadsheet in excel which consists of first column  of dates and
 then subsequent columns that refer to prices of different securities on
 those dates. (the first row contains each series name)

 I saved the excel file as type csv and then imported to excel using

 prices=read.csv(file=C:/Documents and Settings/Hugh/My Documents/PhD/Option
 prices.csv,header = TRUE, sep = ,)

 This creates the correct time series data

 x-ts(prices[,2])

 but does not have the dates attached.

 However the dates refer to working days. So although in general they
 represent Monday-Friday this is not always the case because of holidays etc.

 How then can I create a time series where the dates are read in from the
 first column of the csv file? I can not find an example in R documentation
 where this is done?

 Thanks


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Adding-dates-to-time-series-tp3524679p3524679.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help with text processing / string split

2011-05-15 Thread jim holtman
try this:

 x <- read.table('/temp/tbl.txt', sep = ',', header = TRUE, as.is = TRUE)
 # remove commas from the Cost column
 x$Cost <- gsub(',', '', x$Cost)
 # split the Cost on the dollar signs
 temp <- strsplit(x$Cost, '\\$')  # $ is special in a regex, so it is escaped
 temp <- do.call(rbind, temp)     # create a character matrix
 mode(temp) <- 'numeric'          # convert to numeric
 x$Cost1 <- temp[, 2]             # element 1 is empty because the string starts with $
 x$Cost2 <- temp[, 3]
 head(x)
   Address  Township   Parcel
  Sale.Date
2  10 PACER LN East Norriton 330006712005
Bnkrptcy-PP to6/29/2011
3   6 BALA AVE  Lower Merion 43292007
STAYED5/25/2011
4 109 STONY WAY, Condo 109 East Norriton 330008575662
Bnkrptcy-PP to6/29/2011
5   613 NORTHAMPTON RD East Norriton 330006103002
Postponed to5/25/2011
6  67 HIGH GATE LN  Whitpain 660002716764
Pstpnd by CO to5/25/2011
7 236 Arundel Ave aka 236 Arundel Road   Horsham 36136008
  For Sale5/25/2011
 Costs   Cost Cost1   Cost2
2 $173,933.60$2,410.28 $173933.60$2410.28 173933.60 2410.28
3   $264,640.36$168.00  $264640.36$168.00 264640.36  168.00
4  $70,029.04$1,483.59  $70029.04$1483.59  70029.04 1483.59
5 $254,873.19$1,772.62 $254873.19$1772.62 254873.19 1772.62
6 $404,507.59$1,947.90 $404507.59$1947.90 404507.59 1947.90
7 $252,472.27$1,034.51 $252472.27$1034.51 252472.27 1034.51



On Sun, May 15, 2011 at 3:50 PM, eric ericst...@aol.com wrote:
 I used screen scraping to extract some information and put it into a table
 called tbl. Now I want to modify the table a bit so the data can be more
 useful. Here's the code I used:

 library(XML)
 rm(list=ls())
 url -
 http://webapp.montcopa.org/sherreal/salelist.asp?saledate=05/25/2011;
 tbl -data.frame(readHTMLTable(url))[2:405, c(3,5,6,8,9)]
 names(tbl) - c(Address, Township, Parcel, Sale Date, Costs)

 tbl is attached as txt for your convenience. Entries in the last column of
 the dataframe (tbl$Cost) appear as follows: $173,933.60$2,410.28  .
 http://r.789695.n4.nabble.com/file/n3524793/tbl.txt tbl.txt

 How do I:

 1. Split the string
 2. Have the two values show up as actual numbers that can be used
 3. Put the numbers in two separate columns of the dataframe.

 In other words $173,933.60$2,410.28 would show up as 173933.60 in one column
 and 2410.28 would show up in a second column of tbl

 I tried using strsplit but I could not get it working properly.



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Need-help-with-text-processing-string-split-tp3524793p3524793.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Find String Between Characters

2011-05-15 Thread jim holtman
I would assume that you have lines of text that do not include 'CIK='
and therefore  the 'sub' fails and you get the original string.  If
you only want the lines with CIK, then use 'grepl' to just extract
those lines before processing.

On Sat, May 14, 2011 at 10:14 PM, Sparks, John James jspa...@uic.edu wrote:
 Hi Jim,

 Thanks for your note.

 Unfortunately, when I attempt your solution in my exact setting, I get a
 weird and slightly different answer.

 First, let me be more clear.  What I am attempting to do is pull the CIK
 number out of the information from the web page itself after it has loaded
 to R (this may not be optimal, but I am new at this), not from the web
 page reference (as you have done).

 So, when I execute the following as per your suggestion:

 require(scrapeR)
 mmm-scrape(url=http://www.sec.gov/cgi-bin/browse-edgar?action=getcompanyCIK=320193owner=excludecount=40;)

 num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)

 I get
 [1] pointer: 0x001265c0

 Is this just a hex representation of the same number, or is something else
 going on here?

 Comments from any and all would be much appreciated.

 --John J. Sparks, Ph.D.

 On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
 Is this what you want:

 mmm-http://www.sec.gov/cgi-bin/browse-edgar?action=getcompanyCIK=320193owner=excludecount=40;
 num - sub(^.*CIK=([0-9]+).*, \\1, mmm)
 num
 [1] 320193



 On Sat, May 14, 2011 at 8:20 PM, Sparks, John James jspa...@uic.edu
 wrote:
 Dear R Helpers,

 I am trying to isolate a set of characters between two other characters
 in
 a long string file.  I tried some of the examples on the R help pages
 and
 elsewhere, but I am not able to get it.  Your help would be much
 appreciated.

 require(scrapeR)
 mmm-scrape(url=http://www.sec.gov/cgi-bin/browse-edgar?action=getcompanyCIK=320193owner=excludecount=40;)
 str(mmm)

 I want to get the number 320193 that is between the CIK= and the .
  I
 have tried

 g - grep( CIK=|, mmm )
 and
 temp-grep(mmm,\CIK=\)

 and variations on these themes, but all won't run or come bask as an
 empty
 object.  How can I grab this number?

 Best wishes,
 --John J. Sparks, Ph.D.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?








-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to merge within range?

2011-05-15 Thread Kenn Konstabel
I'd've first said it's simply

sapply(df1$time, function(x) if(any(foo <- (x >= df2$from &
x <= df2$to)) > 0) df2$value[which(foo)] else NA )

but the following are much nicer (except that instead of NA you'll
have 0 but that's easy to change if necessary):

 colSums(sapply(df1$time, function(x) (x >= df2$from & x <= df2$to) * df2$value) )
 rowSums(outer(df1$time, df2$from, ">=") * outer(df1$time, df2$to,
"<=") * df2$value)



On Sat, May 14, 2011 at 10:08 PM, René Mayer
ma...@psychologie.tu-dresden.de wrote:
 sqldf is impressive - compiled it now;
 the trick with findInterval is nice, too.
 thanks guys!!




 Zitat von David Winsemius dwinsem...@comcast.net:


 On May 14, 2011, at 2:27 PM, William Dunlap wrote:

 You could use findInterval() along with a trick with c(rbind(...)):

 i <- findInterval(x=df.1$time, vec=c(rbind(df.2$from, df.2$to)))
 i

 [1] 1 1 1 2 3 3 3 5 5 6

 That's nice. I was working on a slightly different trick

 findInterval( df.1[,1],t(df.2[,1:2]))
  [1] 1 1 1 2 3 3 3 5 5 6

 I was then trying to get the right indices with (.)'%%' 2 and (.) '%/%' 2


 The even-valued outputs would map to NA's, the odds
 to value[(i+1)/2], but you can use the c(rbind(...)) trick again:

 c(rbind(df.2$value, NA))[i]

 [1]  1  1  1 NA  3  3  3  5  5 NA

 I'd like to understand that. Maybe, maybe... ah, got it. At first I didn't
 realize those were the final answers since they looked like indices. My t(.)
 trick doesn't generalize as well.


 My earlier suggestion tht two merges woul do it was based on my erroneous
 interpretation of the example, since  I thought the task was to match on the
 end points of the intervals.


 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of René Mayer
 Sent: Saturday, May 14, 2011 11:06 AM
 To: David Winsemius
 Cc: r-help@r-project.org
 Subject: Re: [R] how to merge within range?

 thanks David and Ian,
 let me make a better example as the first one was flawed

 df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA)
 names(df.1) = c(time, value)
 df.1
  time value
 1   101    NA
 2   199    NA
 3   301    NA
 4   401    NA
 5   501    NA
 6   601    NA
 7   700    NA
 8   800    NA
 9   900    NA
 10 1000    NA

 # from and to define ranges within time,
 # note that from and to may not match the numbers given in time
 df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5))
 df.2
  from  to value
 1   99 303     1
 2  500 702     3
 3  799 950     5

 what I want is:
  time value
 1   101    1
 2   199    1
 3   301    1
 4   401    NA
 5   501    3
 6   601    3
 7   700    3
 8   800    5
 9   900    5
 10 1000    NA

 @David I don't know what you mean by 2 merges,
 René





 Zitat von David Winsemius dwinsem...@comcast.net:


 On May 14, 2011, at 9:16 AM, Ian Gow wrote:

 If I assume that the third column in data.frame.2 is named

 val then in

 SQL terms it _seems_ you want

 SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN

 data.frame.2 AS b ON

 a.time BETWEEN b.start AND b.end;

 Not sure how to do that elegantly using R subsetting/merge,

 Huh? It's just two merge()'s (... once you fix the error in

 the example.)

 --
 David

 but you might
 try a package that allows you to use SQL, such as sqldf.


 On 5/14/11 8:03 AM, David Winsemius

 dwinsem...@comcast.net wrote:


 On May 14, 2011, at 8:12 AM, René Mayer wrote:

 Hello,
 how can one merge

 And what happened when you typed:

 ?merge

 two data frames when in the second data frame one column

 defines the

 start values
 and another defines the end value of the to be merged range.
 data.frame.1
 time ...
 13
 24
 35
 46
 55
 ...
 data.frame.2
 start end
 24 37 ?h? ?
 ...

 should result in this
 13 NA
 24 ?h?
 35 ?h?
 46 NA
 55
 ?

 And _why_ would that be?


 thanks,
 René

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,

 reproducible code.

 David Winsemius, MD
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 David Winsemius, MD
 West Hartford, CT



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 David Winsemius, MD
 West Hartford, CT



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 

Re: [R] changing the day of the week in dates format

2011-05-15 Thread jim holtman
What is it that you want to do?  If you move the dates forward a year,
then what does it mean to add one year to 2/29/2008?  You did mention
accounting for leap year.  It goes the other way with 2/28/2007 and
3/1/2007; what is your expectation in these cases?  You can always
convert everything to characters and then substring out the year and
put the new one in, and then check for the leap year condition and do
the appropriate action.

The equation that you used would add 6 hours to each succeeding
year's date.

So I ask my favorite question:  what is the problem that you are
trying to solve?
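
If the intended meaning is "the same calendar date one year later", one sketch
that lets the POSIXlt-to-POSIXct conversion sort out the leap-year case is
(the date below is only illustrative):

d <- as.POSIXlt("2008-02-29 00:00:00")
d$year <- d$year + 1      # bump the year in the POSIXlt representation
as.POSIXct(d)             # the non-existent 2009-02-29 is normalised to 2009-03-01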

On Sun, May 15, 2011 at 11:13 AM, Dave Evens daveeve...@yahoo.co.uk wrote:
 Hi Adrian,

 Many thanks for your reply.

 Suppose I wanted to increment the date by a year - how would I account for 
 things like leap years?

 Would I just do
 mydaysx[select] <- mydaysx[select] + 365.25*24*60*60

 Regards,Dave


 
 From: Adrian Duffner duffn...@googlemail.com

 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Sunday, 15 May 2011, 14:21
 Subject: Re: [R] changing the day of the week in dates format

 Hi Dave,

 your problem is that you are working with a S3 class, what is mainly a
 list with naming convention. Hence it is possible to change just one
 entry of the list, but it is nearly never recommendable.

 So a slight change to your code should provide you the required output:
 mydaysx[select] <- mydaysx[select] + 2*24*60*60
 select <- mydaysx$wday==6
 sum(select)
 [1] 0

 In this case not only the entry $mday of the list is changed, but the
 whole object is updated.

 Cheers
 Adrian

 Am 14.05.2011 20:44, schrieb Dave Evens:
 Dear all,

 I have a question related to the POSIXlt function in R.

 I have a set of dates and times, for example:

 startx <- as.POSIXct("2011-01-01 00:00:00")
 finx <- as.POSIXct("2011-12-31 00:00:00")

 daysx <- seq(startx, finx, by="24 hours")

 I
   want to change the dates of all the days falling on a Saturday to the
 next working day (i.e. Monday). So I convert dates to POSIXlt

 mydaysx <- as.POSIXlt(daysx)

 Then I select all the Saturdays and move them on to Monday

 select <- mydaysx$wday==6
 mydaysx$mday[select] <- mydaysx$mday[select] + 2

 However,
   although all the new dates (i.e. mydaysx) are actual days of the year -
   the $wday have not been updated and the $mdays have not all been
 corrected (i.e. those falling into the next month). So if I do

 select <- mydaysx$wday==6

 I still get the same set of days as before.

 Is there a way to do this?

 Thanks,

     [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R function that returns an object's search path position XXXX

2011-05-15 Thread Rolf Turner

On 15/05/11 13:41, Dan Abner wrote:

Hello everyone,

Is there an R function that returns an object's search path position?


?find

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem converting character to dates

2011-05-15 Thread Assu
I have imported the data fram Excel and it comes like this:

Calendar Year/Month 01.2008 01.2008 01.2008 01.2008 01.2008 02.2008 02.2008
Calendar year / week 01.2008 02.2008 03.2008 04.2008 05.2008 05.2008 06.2008

There are repeated weeks, each belonging to two months. It's the same
at the end of the year.
So I think it probably has to do with the data acquisition and how the data
are saved in the database.

Thanks for your help and interest.

--
View this message in context: 
http://r.789695.n4.nabble.com/problem-converting-character-to-dates-tp3517918p3524837.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] High standard error

2011-05-15 Thread superduperfly
Significant or not, you should look at the p-value of your coefficient
estimate. If the coefficient is significantly different from zero, you can
start to interpret it from there.

It might also help to look up the definition of the standard error: it is the
standard deviation of the estimate's sampling distribution. Good luck.

--
View this message in context: 
http://r.789695.n4.nabble.com/High-standard-error-tp3329903p3524958.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Snow/Snowfall hangs on windows 7

2011-05-15 Thread David Anisman
Same problem as Anna here.

Windows 7 64-bit. Running R 2.13.0. snow + snowfall installed.

Testing:

library(snow)
library(snowfall)

sfInit(parallel=TRUE, cpus=2, type=SOCK)

Then R spins forever (yes, I disabled the Windows firewall).

On the same box, tried the same on Ubuntu under Virtualbox. No problem. Runs
well.

Any suggestions/ideas appreciated.


David



--
View this message in context: 
http://r.789695.n4.nabble.com/Snow-Snowfall-hangs-on-windows-7-tp3436724p3524990.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding dates to time series

2011-05-15 Thread Bazman76
OK I got it to work thanks to your example

plot(ser)

however, ultimately I need a ts-type object.

So I used

 xts <- as.ts(ser)
 xts
Time Series:
Start = 1 
End = 732 
Frequency = 1 

which just gets me back to where I started with the correct raw data but no
attached dates?

Is it possible to have a time series ts() object with irregular dates?

--
View this message in context: 
http://r.789695.n4.nabble.com/Adding-dates-to-time-series-tp3524679p3525001.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding dates to time series

2011-05-15 Thread Bazman76
A follow-up:

I now realise that the time series created below is in the wrong order!

Clearly the column of dates is not being interpreted as dates by R. Is
it possible for R to read column one as dates? How can I do this?

dd <- data.frame(prices[,1],prices[,2])
 head(dd,3)
   prices...1.  prices...2.
1 16/12/2004 0.13675654
2 17/12/2004 0.22967560
3 20/12/2004 0.01841611
 ser <- with(dd, zoo(prices[,2], prices[,1]))
 plot(ser)
 xts <- as.ts(xzoo)
Error in as.ts(xzoo) : object 'xzoo' not found
 xts <- as.ts(ser)
 xts
Time Series:
Start = 1 
End = 732 
Frequency = 1 
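
One way to get the first column parsed as real Date values before building the
series (a sketch; the format string assumes day/month/year as in head(dd, 3)):

library(zoo)
dates <- as.Date(prices[, 1], format = "%d/%m/%Y")   # "16/12/2004" etc.
ser   <- zoo(prices[, 2], order.by = dates)          # indexed and ordered by date
head(ser)
plot(ser)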


--
View this message in context: 
http://r.789695.n4.nabble.com/Adding-dates-to-time-series-tp3524679p3525011.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rbind with partially overlapping column names

2011-05-15 Thread Jonathan Flowers
Hello,

I would like to merge two data frames with partially overlapping column
names with an rbind-like operation.

For the following data frames,

df1 <- data.frame(a=c("A","A"), b=c("B","B"))
df2 <- data.frame(b=c("b","b"), c=c("c","c"))

I would like the output frame to be (with NAs where the frames don't
overlap)

a    b    c
A    B    NA
A    B    NA
NA   b    c
NA   b    c

I am familiar with ?merge and ?rbind, but neither seems to offer a means to
accomplish this.

Thanks in advance.

Jonathan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind with partially overlapping column names

2011-05-15 Thread Ian Gow
Hi:

This is a bit of a kluge, but works for your test case:

> df2[, setdiff(names(df1), names(df2))] <- NA
> df1[, setdiff(names(df2), names(df1))] <- NA
> df3 <- rbind(df1, df2)
> df3
   a b  c
1  A B NA
2  A B NA
3 NA b  c
4 NA b  c
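
For reference, the plyr package has rbind.fill, which does this NA-padding in
one call -- a sketch, assuming plyr is installed:

library(plyr)
df1 <- data.frame(a = c("A", "A"), b = c("B", "B"))
df2 <- data.frame(b = c("b", "b"), c = c("c", "c"))
rbind.fill(df1, df2)   # columns absent from either frame are filled with NA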

-Ian


On 5/15/11 7:41 PM, Jonathan Flowers jonathanmflow...@gmail.com wrote:

Hello,

I would like to merge two data frames with partially overlapping column
names with an rbind-like operation.

For the follow data frames,

df1 <- data.frame(a=c("A","A"), b=c("B","B"))
df2 <- data.frame(b=c("b","b"), c=c("c","c"))

I would like the output frame to be (with NAs where the frames don't
overlap)

a  b c
A B NA
A B NA
NA   b c
NA   b c

I am familiar with ?merge and ?rbind, but neither seem to offer a means to
accomplish this.

Thanks in advance.

Jonathan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding dates to time series

2011-05-15 Thread Gabor Grothendieck
On Sun, May 15, 2011 at 2:42 PM, Bazman76 h_a_patie...@hotmail.com wrote:
 Hi there,

 I have a spreadsheet in excel which consists of first column  of dates and
 then subsequent columns that refer to prices of different securities on
 those dates. (the first row contains each series name)

 I saved the Excel file as type csv and then imported it into R using

 prices = read.csv(file="C:/Documents and Settings/Hugh/My Documents/PhD/Option
 prices.csv", header = TRUE, sep = ",")

 This creates the correct time series data

 x <- ts(prices[,2])

 but does not have the dates attached.

 However the dates refer to working days. So although in general they
 represent Monday-Friday this is not always the case because of holidays etc.

 How then can I create a time series where the dates are read in from the
 first column of the csv file? I can not find an example in R documentation
 where this is done?


Lines <- "time,s
2003-01-01,1.4292491
2003-02-01,-1.0713998
2003-03-01,-0.4738791"

library(zoo)

# F <- "C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv"
# z <- read.zoo(F, header = TRUE, sep = ",")

# in reality we would read from the file as shown in the comments above
# but here we do it this way so we can just copy it and paste it
# verbatim into the R session

z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",")

If you want xts then:

library(xts)
x <- as.xts(z)

Note that ts is not good for dates so use zoo or xts.

See ?read.zoo in the zoo package.
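
For the dd/mm/yyyy dates in the original file, the format argument of read.zoo
(passed on to as.Date) should handle the conversion -- a sketch, using the file
path from the original post:

z <- read.zoo("C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv",
              header = TRUE, sep = ",", format = "%d/%m/%Y")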

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind with partially overlapping column names

2011-05-15 Thread Dennis Murphy
Hi:

Another way, with a little less typing but using the same principle, is

df1$c <- df2$a <- NA
rbind(df1, df2)

Dennis

On Sun, May 15, 2011 at 5:50 PM, Ian Gow iand...@gmail.com wrote:
 Hi:

 This is a bit of a kluge, but works for your test case:

 df2[, setdiff(names(df1), names(df2))] <- NA
 df1[, setdiff(names(df2), names(df1))] <- NA
 df3 <- rbind(df1, df2)
 df3
 a b c
 1 A B NA
 2 A B NA
 3 NA b c
 4 NA b c

 -Ian


 On 5/15/11 7:41 PM, Jonathan Flowers jonathanmflow...@gmail.com wrote:

Hello,

I would like to merge two data frames with partially overlapping column
names with an rbind-like operation.

For the follow data frames,

df1 - data.frame(a=c(A,A),b=c(B,B))
df2 - data.frame(b=c(b,b),c=c(c,c))

I would like the output frame to be (with NAs where the frames don't
overlap)

a      b     c
A     B     NA
A     B     NA
NA   b     c
NA   b     c

I am familiar with ?merge and ?rbind, but neither seem to offer a means to
accomplish this.

Thanks in advance.

Jonathan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Snow/Snowfall hangs on windows 7

2011-05-15 Thread David Anisman
btw, I installed R.10.1 on the same box (Windows 7, 64bit, 4 cores). 

snow/snowfall work fine.

here is my sessionInfo()

R version 2.10.1 (2009-12-14) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] snowfall_1.84 snow_0.3-3   

loaded via a namespace (and not attached):
[1] tools_2.10.1


David

--
View this message in context: 
http://r.789695.n4.nabble.com/Snow-Snowfall-hangs-on-windows-7-tp3436724p3525182.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on approximations of full logistic regression model

2011-05-15 Thread Frank Harrell
I think you are doing this correctly except for one thing.  The validation
and other inferential calculations should be done on the full model.  Use
the approximate model to get a simpler nomogram but not to get standard
errors.  With only dropping one variable you might consider just running the
nomogram on the entire model.
Frank
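
For concreteness, a sketch of that workflow using the object names from the
original post (validate and nomogram are rms functions; the approximation is
used only for the nomogram):

validate(full.model, B = 1000)                  # inference/validation on the full fit
nom <- nomogram(full.approx.lrm, fun = plogis,  # simpler nomogram from the approximation
                fun.at = c(0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95))
plot(nom)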


細田弘吉 wrote:
 
 Hi,
 I am trying to construct a logistic regression model from my data (104
 patients and 25 events). I build a full model consisting of five
 predictors with the use of penalization by rms package (lrm, pentrace
 etc) because of events per variable issue. Then, I tried to approximate
 the full model by step-down technique predicting L from all of the
 componet variables using ordinary least squares (ols in rms package) as
 the followings. I would like to know whether I am doing right or not.
 
 library(rms)
 plogit <- predict(full.model)
 full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
 fastbw(full.ols, aics=1e10)
 
  Deleted   Chi-Sq d.f. P  Residual d.f. P  AICR2
  stenosis   1.41  10.2354   1.41   10.2354  -0.59 0.991
  x216.78  10.  18.19   20.0001  14.19 0.882
  procedure 26.12  10.  44.31   30.  38.31 0.711
  ClinicalScore 25.75  10.  70.06   40.  62.06 0.544
  x183.42  10. 153.49   50. 143.49 0.000
 
 Then I fitted an approximation to the full model using the most important
 variables (dropping variables until R^2 for predictions from the reduced
 model against the original Y would fall below 0.95), that is, dropping
 stenosis.
 
 full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
 full.ols.approx$stats
   n  Model L.R.d.f.  R2   g   Sigma
 104.000 487.9006640   4.000   0.9908257   1.3341718   0.1192622
 
 This approximate model had R^2 against the full model of 0.99.
 Therefore, I updated the original full logistic model dropping
 stenosis as predictor.
 
 full.approx.lrm <- update(full.model, ~ . -stenosis)
 
 validate(full.model, bw=F, B=1000)
           index.orig training   test optimism index.corrected    n
 Dxy   0.6425   0.7017  0.6131   0.0887  0.5539 1000
 R20.3270   0.3716  0.3335   0.0382  0.2888 1000
 Intercept 0.   0.  0.0821  -0.0821  0.0821 1000
 Slope 1.   1.  1.0548  -0.0548  1.0548 1000
 Emax  0.   0.  0.0263   0.0263  0.0263 1000
 
 validate(full.approx.lrm, bw=F, B=1000)
           index.orig training   test optimism index.corrected    n
 Dxy   0.6446   0.6891  0.6265   0.0626  0.5820 1000
 R20.3245   0.3592  0.3428   0.0164  0.3081 1000
 Intercept 0.   0.  0.1281  -0.1281  0.1281 1000
 Slope 1.   1.  1.1104  -0.1104  1.1104 1000
 Emax  0.   0.  0.0444   0.0444  0.0444 1000
 
 Validation revealed this approximation was not bad.
 Then, I made a nomogram.
 
 full.approx.lrm.nom <- nomogram(full.approx.lrm,
 fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
 plot(full.approx.lrm.nom)
 
 Another nomogram using ols model,
 
 full.ols.approx.nom <- nomogram(full.ols.approx,
 fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
 plot(full.ols.approx.nom)
 
 These two nomograms are very similar but a little bit different.
 
 My questions are;
 
 1. Am I doing right?
 
 2. Which nomogram is correct?
 
 I would appreciate your help in advance.
 
 -- 
 KH
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Question-on-approximations-of-full-logistic-regression-model-tp3524294p3525372.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind with partially overlapping column names

2011-05-15 Thread William Dunlap

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Flowers
 Sent: Sunday, May 15, 2011 5:41 PM
 To: r-help@r-project.org
 Subject: [R] rbind with partially overlapping column names
 
 Hello,
 
 I would like to merge two data frames with partially 
 overlapping column
 names with an rbind-like operation.
 
 For the follow data frames,
 
 df1 <- data.frame(a=c("A","A"), b=c("B","B"))
 df2 <- data.frame(b=c("b","b"), c=c("c","c"))
 
 I would like the output frame to be (with NAs where the frames don't
 overlap)
 
 a  b c
 A B NA
 A B NA
 NA   b c
 NA   b c
 
 I am familiar with ?merge and ?rbind, but neither seem to 
 offer a means to
 accomplish this.

What is wrong with merge(all=TRUE, ...)?
   > merge(df1, df2, all=TRUE)
     b  a  c
   1 B  A NA
   2 B  A NA
   3 b NA  c
   4 b NA  c
Rearrange the columns if that is necessary:
   > merge(df1, df2, all=TRUE)[c("a","b","c")]
      a b  c
   1  A B NA
   2  A B NA
   3 NA b  c
   4 NA b  c

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
 
 Thanks in advance.
 
 Jonathan
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Extracting the dimnames of an array with variable dimensions

2011-05-15 Thread Pierre Roudier
Hi list,

In a function I am writing, I need to extract the dimension names of
an array. I know this can be acheived easily using dimnames() but my
problem is that I want my function to be robust when the number of
dimensions varies. Consider the following case:

foo <- array(data = rnorm(32), dim = c(4,4,2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))

# What I want is to extract the *names of the dimensions* for which
# foo has positive values:
ind <- which(foo > 0, arr.ind = TRUE)

# A first solution is:
t(apply(ind, 1, function(x) unlist(dimnames(foo[x[1], x[2], x[3],
                                                drop=FALSE]))))
# But it does require knowing the dimensions of foo

I would like to do something like:

ind <- which(foo > 0, arr.ind = TRUE)
t(apply(ind, 1, function(x) unlist(dimnames(foo[x, drop=FALSE]))))

but in that case the dimnames are dropped.

Any suggestion?

Cheers,

Pierre

-- 
Scientist
Landcare Research, New Zealand

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rbind with partially overlapping column names

2011-05-15 Thread Ian Gow
That approach relies on df1 and df2 not having overlapping values in b.
Slight variation in df2 gives different results:

> df1 <- data.frame(a=c("A","A"), b=c("B","B"))
> df2 <- data.frame(b=c("B","B"), c=c("c","c"))
> merge(df1, df2, all=TRUE)
  b a c
1 B A c
2 B A c
3 B A c
4 B A c


On 5/15/11 11:19 PM, William Dunlap wdun...@tibco.com wrote:


 -Original Message-
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Flowers
 Sent: Sunday, May 15, 2011 5:41 PM
 To: r-help@r-project.org
 Subject: [R] rbind with partially overlapping column names
 
 Hello,
 
 I would like to merge two data frames with partially
 overlapping column
 names with an rbind-like operation.
 
 For the follow data frames,
 
 df1 - data.frame(a=c(A,A),b=c(B,B))
 df2 - data.frame(b=c(b,b),c=c(c,c))
 
 I would like the output frame to be (with NAs where the frames don't
 overlap)
 
 a  b c
 A B NA
 A B NA
 NA   b c
 NA   b c
 
 I am familiar with ?merge and ?rbind, but neither seem to
 offer a means to
 accomplish this.

What is wrong with merge(all=TRUE, ...)?
   > merge(df1, df2, all=TRUE)
     b  a  c
   1 B  A NA
   2 B  A NA
   3 b NA  c
   4 b NA  c
Rearrange the columns if that is necessary:
   > merge(df1, df2, all=TRUE)[c("a","b","c")]
      a b  c
   1  A B NA
   2  A B NA
   3 NA b  c
   4 NA b  c

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
 
 Thanks in advance.
 
 Jonathan
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on approximations of full logistic regression model

2011-05-15 Thread khosoda

Thank you for your reply, Prof. Harrell.

I agree with you. Dropping only one variable does not actually help a lot.

I have one more question.
During analysis of this model I found that the confidence
intervals (CIs) of some coefficients provided by bootstrapping (the bootcov
function in the rms package) were narrower than the CIs provided by the usual
variance-covariance matrix, while the CIs of other coefficients were wider.
My data has no cluster structure. I am wondering which CIs are better.

I guess bootstrapping one, but is it right?
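
For reference, a sketch of the comparison in question, reusing the object name
from the earlier post (bootcov and vcov are rms/stats functions; the fit must
have been created with x=TRUE, y=TRUE for bootcov to work):

boot.fit <- bootcov(full.model, B = 1000)
sqrt(diag(vcov(full.model)))   # model-based standard errors
sqrt(diag(vcov(boot.fit)))     # bootstrap standard errors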

I would appreciate your help in advance.
--
KH



(11/05/16 12:25), Frank Harrell wrote:

I think you are doing this correctly except for one thing.  The validation
and other inferential calculations should be done on the full model.  Use
the approximate model to get a simpler nomogram but not to get standard
errors.  With only dropping one variable you might consider just running the
nomogram on the entire model.
Frank


KH wrote:


Hi,
I am trying to construct a logistic regression model from my data (104
patients and 25 events). I build a full model consisting of five
predictors with the use of penalization by rms package (lrm, pentrace
etc) because of events per variable issue. Then, I tried to approximate
the full model by step-down technique predicting L from all of the
componet variables using ordinary least squares (ols in rms package) as
the followings. I would like to know whether I am doing right or not.


library(rms)
plogit- predict(full.model)
full.ols- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
fastbw(full.ols, aics=1e10)


  Deleted   Chi-Sq d.f. P  Residual d.f. P  AICR2
  stenosis   1.41  10.2354   1.41   10.2354  -0.59 0.991
  x216.78  10.  18.19   20.0001  14.19 0.882
  procedure 26.12  10.  44.31   30.  38.31 0.711
  ClinicalScore 25.75  10.  70.06   40.  62.06 0.544
  x183.42  10. 153.49   50. 143.49 0.000

Then, fitted an approximation to the full model using most imprtant
variable (R^2 for predictions from the reduced model against the
original Y drops below 0.95), that is, dropping stenosis.


full.ols.approx- ols(plogit ~ x1+x2+ClinicalScore+procedure)
full.ols.approx$stats

   n  Model L.R.d.f.  R2   g   Sigma
104.000 487.9006640   4.000   0.9908257   1.3341718   0.1192622

This approximate model had R^2 against the full model of 0.99.
Therefore, I updated the original full logistic model dropping
stenosis as predictor.


full.approx.lrm- update(full.model, ~ . -stenosis)



validate(full.model, bw=F, B=1000)

   index.orig trainingtest optimism index.correctedn
Dxy   0.6425   0.7017  0.6131   0.0887  0.5539 1000
R20.3270   0.3716  0.3335   0.0382  0.2888 1000
Intercept 0.   0.  0.0821  -0.0821  0.0821 1000
Slope 1.   1.  1.0548  -0.0548  1.0548 1000
Emax  0.   0.  0.0263   0.0263  0.0263 1000


validate(full.approx.lrm, bw=F, B=1000)

   index.orig trainingtest optimism index.correctedn
Dxy   0.6446   0.6891  0.6265   0.0626  0.5820 1000
R20.3245   0.3592  0.3428   0.0164  0.3081 1000
Intercept 0.   0.  0.1281  -0.1281  0.1281 1000
Slope 1.   1.  1.1104  -0.1104  1.1104 1000
Emax  0.   0.  0.0444   0.0444  0.0444 1000

Validatin revealed this approximation was not bad.
Then, I made a nomogram.


full.approx.lrm.nom- nomogram(full.approx.lrm,

fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)

plot(full.approx.lrm.nom)


Another nomogram using ols model,


full.ols.approx.nom- nomogram(full.ols.approx,

fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)

plot(full.ols.approx.nom)


These two nomograms are very similar but a little bit different.

My questions are;

1. Am I doing right?

2. Which nomogram is correct

I would appreciate your help in advance.

--
KH

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Question-on-approximations-of-full-logistic-regression-model-tp3524294p3525372.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



E-mail address
Office: khos...@med.kobe-u.ac.jp

Re: [R] graphs of gamma, normal fit to a histogram are about half as large as they should be

2011-05-15 Thread Benjamin Caldwell
Hmm; still missing something - hist defaults to frequencies, not prob.
densities; and, I thought I'd scaled the fitted lines to the values in the
data frame. Just going with it, I specified freq=FALSE, and the prob density
was of course at a different order of magnitude than the lines.

What are you trying to hint at?


On Fri, May 13, 2011 at 6:05 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

 On 14/05/11 10:00, Benjamin Caldwell wrote:

 Hello,

 I'm trying to compare the fit of two distributions, normal and gamma, to a
 histogram of my response variable.


 rate <- mean(na.omit(rwb$post.f.crwn.length))/var(na.omit(rwb$post.f.crwn.length))
 shape <- rate*mean(na.omit(rwb$post.f.crwn.length))
 hist(rwb$post.f.crwn.length, main="rwb$post.f.crwn.length")

 lines(seq(0.01,70,0.01), length(rwb$post.f.crwn.length)*dgamma(seq(0.01,70,0.01), shape, rate))

 lines(seq(0,70,0.1), length(na.omit(rwb$post.f.crwn.length))*dnorm(seq(0,70,.1), mean(na.omit(rwb$post.f.crwn.length)), sqrt(var(na.omit(rwb$post.f.crwn.length)))))

 However, the height of the two curves are about 1/3 to 1/4 the height that
 they should be compared to the histogram. Any ideas?


 Yes.  Read the help on hist!  (Hint:  Pay particular attention to the
 freq and/or probability arguments.)

cheers,

Rolf Turner



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting the dimnames of an array with variable dimensions

2011-05-15 Thread Dennis Murphy
Hi:

Does it have to be an array?  If all you're interested in is the
dimnames, how about this?

library(plyr)
foo <- array(data = rnorm(32), dim = c(4,4,2),
   dimnames=list(letters[1:4], LETTERS[1:4], letters[5:6]))
 foo
, , e

   A  B  C  D
a -0.2183877 -0.8912908 -2.0175612 -0.8080548
b  0.4870784 -0.8626293 -0.5641368 -0.5219722
c  0.8821044  0.3187850  1.2203297 -0.3151186
d -0.9894656 -1.1779108  0.9853935  0.3560747

, , f

   A  B  C  D
a  0.7357773 -1.7591637  1.6320887  1.2248529
b  0.4662315  0.1131432 -0.9790887 -0.6575306
c -0.3564725 -0.9202688  0.1017894  0.7382683
d  0.2825117  0.9242299  0.3577063 -1.3297339

# flatten array into a data frame with dimnames as factors
# adply() converts an array to a data frame, applying a function
# along the stated dimensions
 u <- adply(foo, c(1, 2, 3), as.vector)
subset(u, V1 > 0)[, 1:3]
   X1 X2 X3
2   b  A  e
3   c  A  e
7   c  B  e
11  c  C  e
12  d  C  e
16  d  D  e
17  a  A  f
18  b  A  f
20  d  A  f
22  b  B  f
24  d  B  f
25  a  C  f
27  c  C  f
28  d  C  f
29  a  D  f
31  c  D  f
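
An alternative that works for any number of dimensions without leaving the
array -- a base-R sketch, assuming foo and ind as defined in the original post:

ind <- which(foo > 0, arr.ind = TRUE)
# for each hit, pick the k-th name from the k-th dimnames component
t(apply(ind, 1, function(i) mapply(function(nm, k) nm[k], dimnames(foo), i)))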

HTH,
Dennis

On Sun, May 15, 2011 at 9:20 PM, Pierre Roudier
pierre.roud...@gmail.com wrote:
 Hi list,

 In a function I am writing, I need to extract the dimension names of
 an array. I know this can be acheived easily using dimnames() but my
 problem is that I want my function to be robust when the number of
 dimensions varies. Consider the following case:

 foo <- array(data = rnorm(32), dim = c(4,4,2),
 dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))

 # What I want is to extract the *names of the dimensions* for which
 # foo has positive values:
 ind <- which(foo > 0, arr.ind = TRUE)

 # A first solution is:
 t(apply(ind, 1, function(x) unlist(dimnames(foo[x[1], x[2], x[3],
 drop=FALSE]))))
 # But it does require to know the dimensions of foo

 I would like to do something like:

 ind <- which(foo > 0, arr.ind = TRUE)
 t(apply(ind, 1, function(x) unlist(dimnames(foo[x, drop=FALSE]))))

 but in that case the dimnames are dropped.

 Any suggestion?

 Cheers,

 Pierre

 --
 Scientist
 Landcare Research, New Zealand

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Powerful PC to run R

2011-05-15 Thread Prof Brian Ripley

On Sun, 15 May 2011, Duncan Murdoch wrote:


On 15/05/2011 3:02 PM, Aram Fingal wrote:


On May 13, 2011, at 6:38 AM, Michael Haenlein wrote:


Dear all,

I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core
i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my
calculations run for several days sometimes even weeks (mainly simulations
over a large parameter space). Depending on the external conditions, my
laptop sometimes shuts down due to overheating.



You didn't mention whether you are using a 64-bit OS or not.  A single 
32-bit process can not use more than 2 GB RAM.


And that is also false.  For Windows, see the rw-FAQ.  It is 
address space (not RAM) that is limited, and it is limited to 4GB *by 
definition* in a 32-bit process.  Many OSes can give your process 4GB 
of address space, but may reserve some of it for the OS.


 If your calculations would 
benefit from the full 8 GB RAM on your machine, you need to be able to run 
64-bit R.   My understanding is that, on Windows, you either have to 
install the OS as 32-bit and use all 32-bit software or install 64-bit 
Windows and run all 64-bit software.  A Mac can run 32-bit and 64-bit 
software simultaneously and I'm not sure about Linux.  In the case of 
Linux, it probably doesn't matter so much because most Linux software is 
available as open source and you can compile it yourself either way.


For the record, all modern 64-bit OSes on x86_64 CPUs can do this
provided you install 32-bit versions of core dynamic libraries.  I run
32- and 64-bit R on 64-bit Linux, Solaris, FreeBSD, Darwin (the OS of
Mac OS X), and Windows.  As can AIX and IRIX on their CPUs.
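
As an aside, a quick way to check which build a given R session is running --
a sketch using base R only:

.Machine$sizeof.pointer   # 8 on a 64-bit build, 4 on a 32-bit build
R.version$arch            # e.g. "x86_64" vs "i386"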



No, 64 bit Windows can run either 32 or 64 bit Windows programs.




I'm now thinking about buying a more powerful desktop PC or laptop. Can
anybody advise me on the best configuration to run R as fast as possible? 
I

will use this PC exclusively for R so any other factors are of limited
importance.


You need to evaluate whether RAM or raw processor speed is most critical 
for what you're doing.  In my case, I upgraded my Mac Pro to 16 GB RAM and 
was able to do hierarchical clustering heatmaps overnight which previously 
took more than a week to compute.  Using the Activity Monitor utility, it 
looks like some of the, even larger, heatmap computations would benefit 
from 32 GB RAM or more.


Linux runs on the widest range of hardware and that allows you the greatest 
ability to shop around.  If RAM is the deciding factor, then you can look 
around for a machine which can hold as much RAM as possible.  If processor 
speed is the factor, then you can optimize for that.  Windows runs on a 
reasonable array of hardware but has the disadvantage that the OS, itself, 
uses a lot of resources.


Nothing like as much as Mac OS X, though.  (I would say the main 
disadvantage of Windows for R is the slowness of the file systems.)


The Mac has the advantage of flexibility.  When you download the 
precompiled R package, it comes with both a 32-bit and a 64-bit executable. 
This is because 32-bit processes run a little faster if you don't need 
large amounts of RAM.  If you do need the RAM, then you run the 64-bit 
version.




The same is true for Windows binaries on CRAN.


And of e.g. the Fedora binaries.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


Mr Fingal: please do!  You are clearly unfamiliar with the R manuals.

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] graphs of gamma, normal fit to a histogram are about half as large as they should be

2011-05-15 Thread Rolf Turner

In your example it appears that you are plotting a histogram (on the
frequency scale) and then superimposing scalar multiples of gamma and
Gaussian densities.

You should just plot a histogram (with freq=FALSE) and then superimpose
the densities --- without any scalar multipliers.

If that doesn't work, please provide a minimal *reproducible* example of
the problem that you are having (no one but you has the ``rwb'' data
object), as the posting guide requests.
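
To make the point concrete, a self-contained sketch with simulated data (not
the rwb data, which only the original poster has):

set.seed(42)
y <- rgamma(200, shape = 2, rate = 0.5)    # stand-in for the response
rate  <- mean(y) / var(y)                  # method-of-moments estimates
shape <- rate * mean(y)
m <- mean(y); s <- sd(y)
hist(y, freq = FALSE)                      # histogram on the density scale
curve(dgamma(x, shape, rate), add = TRUE, col = "red")    # gamma fit
curve(dnorm(x, m, s), add = TRUE, col = "blue", lty = 2)  # normal fit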

 cheers,

 Rolf Turner


On 16/05/11 17:01, Benjamin Caldwell wrote:
 Hmm; still missing something - hist defaults to frequencies, not prob. 
 densities; and, I thought I'd scaled the fitted lines to the values in 
 the data frame. Just going with it, I specified freq=FALSE, and the 
 prob density was of course at a different order of magnitude than the 
 lines.

 What are you trying to hint at?


 On Fri, May 13, 2011 at 6:05 PM, Rolf Turner rolf.tur...@xtra.co.nz 
 mailto:rolf.tur...@xtra.co.nz wrote:

 On 14/05/11 10:00, Benjamin Caldwell wrote:

 Hello,

 I'm trying to compare the fit of two distributions, normal and
 gamma, to a
 histogram of my response variable.

 
 rate-mean(na.omit(rwb$post.f.crwn.length))/var(na.omit(rwb$post.f.crwn.length))
 shape-rate*mean(na.omit(rwb$post.f.crwn.length))
 hist((rwb$post.f.crwn.length), main=rwb$post.f.crwn.length)
 
 lines(seq(0.01,70,0.01),length(rwb$post.f.crwn.length)*dgamma(seq(0.01,70,0.01),shape,rate))
 
 lines(seq(0,70,0.1),length(na.omit(rwb$post.f.crwn.length))*dnorm(seq(0,70,.1),mean(na.omit(rwb$post.f.crwn.length)),sqrt(var(na.omit(rwb$post.f.crwn.length

 However, the height of the two curves are about 1/3 to 1/4 the
 height that
 they should be compared to the histogram. Any ideas?


 Yes.  Read the help on hist!  (Hint:  Pay particular attention
 to the
 freq and/or probability arguments.)

cheers,

Rolf Turner





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting the dimnames of an array with variable dimensions

2011-05-15 Thread Pierre Roudier
Hi Dennis,

Thanks for your answer, it works very well - clever way to sort the problem!

Cheers,

Pierre

2011/5/16 Dennis Murphy djmu...@gmail.com:
 Hi:

 Does it have to be an array?  If all you're interested in is the
 dimnames, how about this?

 library(plyr)
 foo - array(data = rnorm(32), dim = c(4,4,2),
                   dimnames=list(letters[1:4], LETTERS[1:4], letters[5:6]))
 foo
 , , e

           A          B          C          D
 a -0.2183877 -0.8912908 -2.0175612 -0.8080548
 b  0.4870784 -0.8626293 -0.5641368 -0.5219722
 c  0.8821044  0.3187850  1.2203297 -0.3151186
 d -0.9894656 -1.1779108  0.9853935  0.3560747

 , , f

           A          B          C          D
 a  0.7357773 -1.7591637  1.6320887  1.2248529
 b  0.4662315  0.1131432 -0.9790887 -0.6575306
 c -0.3564725 -0.9202688  0.1017894  0.7382683
 d  0.2825117  0.9242299  0.3577063 -1.3297339

 # flatten array into a data frame with dimnames as factors
 # adply() converts an array to a data frame, applying a function
 # along the stated dimensions
  u - adply(foo, c(1, 2, 3), as.vector)
 subset(u, V1  0)[, 1:3]
   X1 X2 X3
 2   b  A  e
 3   c  A  e
 7   c  B  e
 11  c  C  e
 12  d  C  e
 16  d  D  e
 17  a  A  f
 18  b  A  f
 20  d  A  f
 22  b  B  f
 24  d  B  f
 25  a  C  f
 27  c  C  f
 28  d  C  f
 29  a  D  f
 31  c  D  f

 HTH,
 Dennis

 On Sun, May 15, 2011 at 9:20 PM, Pierre Roudier
 pierre.roud...@gmail.com wrote:
 Hi list,

 In a function I am writing, I need to extract the dimension names of
 an array. I know this can be acheived easily using dimnames() but my
 problem is that I want my function to be robust when the number of
 dimensions varies. Consider the following case:

 foo - array(data = rnorm(32), dim = c(4,4,2),
 dimnames=list(letters[1:4], LETTERS[1:4], letters[5:6]))

 # What I want is to extract the *names of the dimensions* for which
 foo have positive values:
 ind - which(foo  0, arr.ind = TRUE)

 # A first solution is:
 t(apply(ind, 1, function(x) unlist(dimnames(foo[x[1], x[2], x[3],
 drop=FALSE]
 # But it does require to know the dimensions of foo

 I would like to do something like:

 ind - which(foo  0, arr.ind = TRUE)
 t(apply(ind, 1, function(x) unlist(dimnames(foo[x, drop=FALSE]

 but in that case the dimnames are dropped.

 Any suggestion?

 Cheers,

 Pierre

 --
 Scientist
 Landcare Research, New Zealand

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Scientist
Landcare Research, New Zealand

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.