[R] Fwd: When modeling with negbin from the aod package...

2009-10-21 Thread alexander russell
-- Forwarded message --
From: alexander russell ssv...@gmail.com
Date: Tue, Oct 20, 2009 at 4:34 PM
Subject: Re: [R] When modeling with negbin from the aod package...
To: Matthieu Lesnoff matthieu.lesn...@gmail.com


Hello again,

It seems that, though we get a single estimate of the variance parameter, phi,
with negbin, some models instead build a formula for the variance.

For example, I think Bolker has modeled variance in the Lily_sum data set
as

nlikfun = function(a, c, d) {
    k = c * exp(d * flowers)
    -sum(dnbinom(seedlings, mu = a, size = k, log = TRUE))
}

(Bolker's book, p. 420)

if I'm correct?
Is it fair to say that the 'random' argument of negbin will model variance
this way only if the independent variable on which the variance depends in
the right-hand formula is categorical, or is a factor with only a few
levels?
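For concreteness, a minimal sketch of the kind of call in question (the data
frame dat, with count response y, covariate x and grouping factor grp, is a
hypothetical stand-in):

library(aod)
# the right-hand formula given to 'random' describes phi;
# here phi is allowed to differ between the levels of the factor grp
fm <- negbin(y ~ x, random = ~ grp, data = dat)
summary(fm)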
regards, s
  On Thu, Oct 15, 2009 at 5:53 PM, Matthieu Lesnoff 
matthieu.lesn...@gmail.com wrote:

 that is they are
  maximum likelihood estimates. Would someone tell me if I'm correct?

 yes, all parameters estimated with negbin (aod) are ML estimates

 Regards

 Matthieu





Re: [R] rbind with different columns

2009-10-21 Thread Antje

Karl Ove Hufthammer wrote:
In article 4addc1d0.2040...@yahoo.de, niederlein-rs...@yahoo.de 
says...
Every list entry is a data.frame, but the columns are only partially
the same. If I have exactly the same columns, I could do the following
command to combine my data:


do.call(rbind, myList)

but of course it does not work with different column names. Is there any
easy way to retrieve a combined table like this:


You're in luck. 'rbind.fill' in the 'plyr' package does exactly what you 
want.
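A minimal sketch of that approach, assuming myList is the list of data
frames:

library(plyr)
# rbind.fill() pads the columns missing from each data frame with NA
combined <- do.call(rbind.fill, myList)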





Thanks a lot for your hint. I'll give it a look :-)
So far, I have solved it with basic tools like this (it's less work than 
I thought):



availableColumns <- unique(unlist(sapply(myList, names)))

myList <- lapply(myList, FUN = function(entry) {
    missingColumns <- availableColumns[which(!availableColumns %in%
        names(entry))]
    entry[, missingColumns] <- NA
    entry
})

do.call(rbind, myList)



[R] referring to data of previous rows

2009-10-21 Thread clion

Dear Rlers,
in the following dataset I would like to insert a new column that refers to
the data of the previous row.
My question is whether the probability of a female (Id) giving birth changes
if she had given birth to a pup in the previous year. So my dataframe
consists of the columns Id, year (2003-2007 for each Id) and offspring
(= of; 1/0):
Id   year   of  
1   2003   1  
1   2004   0  
1   2005   1  
1   2006   1  
1   2007   0  with 1= female (Id) gave birth to offspring (same year)
2   2003   0  and 0 = female didn't pup
2   2004   1  
2   2005   1  
2   2006   0  
2   2007   1  
3   2003   1  
3   2004   1  
3   2005   1  
3   2006   0  
3   2007   0  
Now I want to add the column "offspring in the previous year" (1/0);
the output should look like this:

Id  year   of   of_inPreviousYear
1   2003   1   NA
1   2004   0   1
1   2005   1   0
1   2006   1   1
1   2007   0   1
2   2003   0   NA
2   2004   1   0
2   2005   1   1
2   2006   0   1
2   2007   1   0
3   2003   1   NA
3   2004   1   1
3   2005   1   1
3   2006   0   1
3   2007   0   0

any idea how I could program that?

thanks a lot,
clion


Re: [R] editors for R

2009-10-21 Thread Tobias Verbeke

Hi Mark,


After reviewing the IDE/Script Editors article at sciviews.org, I
wanted to pose a quick question here to see if anyone can offer an
opinion or commentary about GUI editors that can be installed in a
Windoze environment that allow editing/saving of remote .R files and
running R programs from within a shell that is housed in the editor
(although R itself is installed in a Linux environment).  Windoze
would essentially be providing a GUI-based tool to edit, save, and
execute without making the user copy files back and forth and switch
between various programs to execute their routines.

Thus far, BlueFish seems to come closest to this; but other
recommendations will be most appreciated.


Eclipse + StatET; see

http://lists.r-forge.r-project.org/pipermail/statet-user/2009-September/000208.html

for information on the latest release. For remote connections, you need
to install RJ-Server (on the Linux machine) and configure a Remote R
Console (on the client).

There is a dedicated mailing list at
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/statet-user

HTH,
Tobias



Re: [R] Problem using the source-function within R-functions

2009-10-21 Thread Johan Lassen
Hi Giovanni,

Thanks for your reply. I can make the function work after pasting the code
directly into R. The problem arises after compiling the function into a
package and then calling the function, because the files inside source()
seem to be missing.

I tried to include the sourced files in the argument code_files of
the function package.skeleton. The files are brought correctly into the
package, but when running the generated package the files produce an
error due to variables that are not defined.

There may be no way around it other than copying the content of the sourced
files into the file where the function is defined?
That way the definition of the function is all written in one file, but
the organization of the code sections is worse than when using the
source() function...

Best regards,
Johan


2009/10/20 Giovanni Petris gpet...@uark.edu


 The problem probably lies in the source-ing part: look at

 getwd()
 setwd()

 HTH,
 Giovanni

  Date: Tue, 20 Oct 2009 13:00:02 +0200
  From: Johan Lassen jle...@gmail.com
 
  Dear R community,
 
  You may have the solution to how to construct a function using the
 function
  source() to build the function; i.e.
 
  myfunction <- function(...){
  source('file1.r')
  source('file2.r')
  }
 
  After compiling and installing the myfunction in R, then calling the
  myfunction gives an error because the content of 'file1.r' and
 'file2.r'
  seems to be missing.
 
  Anyone has the trick to overcome this problem?
 
  Thanks in advance!
 
  best wishes, Johan
 
 
  PS: My function is:
 
 
  run_accumm_value <- function(ind_noder_0,
  ind_loc_val, ind_retention, downstream){
  ## Preprocessing of looping calculations:
  koersel_uden_ret <- length(unique(ind_noder_0$oplid))
  opsaml_b_0_2 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
  opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
  ## Preprocessing of topology and local values to be summed:
  source('preproces_topology.r', local = T)
  source('preproces_loc_val.r', local = T)
  # Loop for each grouping factor (column in ind_noder_0: oplid):
  for(j in 1:koersel_uden_ret){
  source('matrix_0.r', local = T)
  source('matrix.r', local = T)
  source('local_value.r', local = T)
  source('fordeling.r', local = T)
  source('fordeling_manuel.r', local = T)
  source('local_ret.r', local = T)
  source('Ax=b.r', local = T)
  source('opsamling_x_0_acc.r', local = T)
  }
  source('opsamling_b_1.r', local = T)
  opsaml_b_2
  }
 
 
 
 
 
  --
  Johan Lassen
  Environment Center Nykøbing F
  Denmark
 
 
 

 --

 Giovanni Petris  gpet...@uark.edu
 Associate Professor
 Department of Mathematical Sciences
 University of Arkansas - Fayetteville, AR 72701
 Ph: (479) 575-6324, 575-8630 (fax)
 http://definetti.uark.edu/~gpetris/




-- 
Johan Lassen
Environment Center Nykøbing F
Denmark



[R] slope calculation

2009-10-21 Thread ms.com

Dear all

I am a new R user, and trying to learn more.

I am doing linear regression analysis in R with my data. I am trying to find
a way to convert the slope value (coefficient of x) into degrees of slope.
Please give me an idea on this.

Thanking you in anticipation

Warm regards,
MS




[R] problems with randomSurvivalForest

2009-10-21 Thread giovanni parrinello
> summary(ma.dati2$death.status)   # censoring
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0     0.0     0.0 0.05332     0.0     1.0    39.0
##
> summary(ma.dati2$time.death)     # time
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1     370     370     356     370     370      39

The formula:
timi.out <- rsf(Survrsf(time.death, death.status) ~ sex,
    ma.dati2.rid,
    ntree = 1000, na.action = 'na.omit')
Errore in rsf.default(Survrsf(time.death, death.status) ~ sex,
ma.dati2.rid[,  :
  Outcome is not a random survival object.  Use 'Survrsf' for the formula.
But time.death is strictly positive and death.status assumes only two
values: 0, 1.
Where is the error?
TIA
Giovanni



[R] Solved problems with randomSurvivalForest

2009-10-21 Thread giovanni parrinello
It was only a confusion between two datasets with similar names...
Sorry
Giovanni



Re: [R] How to create a legend that automatically reads the values from two vectors?

2009-10-21 Thread Jim Lemon

On 10/21/2009 01:30 AM, jcano wrote:

betav <- c(0.78, 0.94, 0.88, 0.41, 0.59, 4.68)
etav <- c(235.6, 59.5, 31.2, 8.7, 3.2, 1174)

Hi Javier,
Maybe not exactly what you want, but try:

addtable2plot(2, 8, rbind(betav, etav), bty = "o",
    display.colnames = FALSE, display.rownames = TRUE)

using your own x and y coordinates, of course. addtable2plot is in the 
plotrix package.


Jim



[R] three related time series with different resolutions

2009-10-21 Thread William Simpson
I have three time series, x, y, and z, and I want to analyse the
relations between them. However, they have vastly different
resolutions. I am writing to ask for advice on how to handle this
situation in R.

x is a stimulus, and y and z are responses.

x is a rectangular pulse 4 sec long. Its onset and offset are known
with sub-millisecond precision. The onset varies irregularly -- it
doesn't fall on neat 1/2 sec or sec boundaries for example.

y is a sampled continuous waveform. It is highly noisy, and it is
actually well represented by samples 1/2 sec apart or so.

z is a very short pulse -- perhaps 5 ms long -- occurring at irregular times.

I have problems with how to represent these waveforms at the input
stage (during data collection) and at the analysis stage. If I want to
represent x and z with the sort of precision I'd like, I'd have to
sample every ms or so. But:
- that is massive overkill for y, because it is noisy. I will have
maybe 500 times as many pts as I require! That makes for large data
files, for no good reason.
- the representations for x and z are about 99% zeros. Again, wasteful
for storage
- the analysis will be awkward and slow because of the huge number of pts

If I sample at a rate of every 1/2 sec, z may be not detected at all,
and the edges of x are represented very poorly.

I could just save y as a separate file, with values sampled every 1/2
sec, save x as a file containing onset and offset times, and save z as
the times of each impulse. All three files have different lengths. If
I save this way it is awkward keeping track of the three files for
each run, and I'm still left with the problem of analysing them. In my
mind they are actually three continuous waveforms, and my
simple-minded way of analysing would be to create digitised waveform
versions of all three (on some fine grain). If I do that I still have
the problem of zillions of useless points clogging RAM.

What do I want from the analysis?
I want the average waveform for y that occurs for 30 sec after each x
pulse, and the average waveform for z that occurs for 30 sec after
each x pulse. The average waveform for y surrounding each z (maybe 30
sec before and 10 sec after) would also be useful.
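To make the storage idea concrete, a minimal sketch (all numbers
hypothetical): keep y on its own 1/2-sec grid, keep x as onset times only,
and interpolate y around each onset to get the average response:

# hypothetical data: y sampled every 0.5 s, x onsets as plain event times (s)
y.t <- seq(0, 600, by = 0.5)
y.val <- sin(y.t / 20) + rnorm(length(y.t), sd = 0.5)
x.onsets <- c(33.2, 151.7, 298.4, 402.9)

# average y waveform over the 30 s following each x onset
win <- seq(0, 30, by = 0.5)
segs <- sapply(x.onsets, function(t0) {
    approx(y.t, y.val, xout = t0 + win)$y  # y at t0, t0 + 0.5, ..., t0 + 30
})
avg.y <- rowMeans(segs)
plot(win, avg.y, type = "l", xlab = "time after x onset (s)")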

Thanks very much for any suggestions about representation for data
collection and analysis, and for data analysis methods (which will
depend on the data representation).

Bill



Re: [R] Putting names on a ggplot - fortune candidate?

2009-10-21 Thread Jim Lemon

On 10/21/2009 01:59 AM, hadley wickham wrote:


It currently works (because I can't figure out how to make it an
error) but you really should not do it.
   

Please pardon my nomination, Hadley, but that is just too good to pass up.

Jim



[R] linear regression: Is there a way to get the results of lm as variables

2009-10-21 Thread CE.KA

Hi R users

I used R to get the results of a linear regression

reg <- lm(y ~ x)

here are the results:

# Call:
# lm(formula = donnees$txi7098 ~ donnees$txs7098)
#
# Residuals:
#       Min        1Q    Median        3Q       Max
# -0.037971 -0.013373 -0.004947  0.010618  0.053235
#
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)      0.08547    0.03028   2.822 0.011738 *
# donnees$txs7098  0.62306    0.12847   4.850 0.000150 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.02199 on 17 degrees of freedom
# Multiple R-squared: 0.5805, Adjusted R-squared: 0.5558
# F-statistic: 23.52 on 1 and 17 DF,  p-value: 0.0001502

I know how to get the coefficients as variables,
for example:
reg$coefficients[1] = 0.08547 (Estimate)
reg$coefficients[2] = 0.62306 (Estimate)

My question is:
Is there a way to get the other results of lm as variables?
-Std. Errors?
-t value?
-Pr(>|t|)?
-Residual standard error?
-Multiple R-squared?
-F-statistic?

Best regards 

 


Re: [R] Transparent Bands in R

2009-10-21 Thread Jim Lemon

On 10/21/2009 09:40 AM, mnstn wrote:

Hello All,

My question is regarding the attached plot. I would like to have multiple
transparent green bands running the length (yaxis) of the plot the width of
which is determined by the green lines at y=0 in the plot. Can you suggest a
way to do it?
 

Hi mnstn,
Transparency is only supported by some graphic devices. It may be
simpler to plot the values, then use rect to draw green rectangles:


xylim <- par("usr")
rect(start1, xylim[3], stop2, xylim[4], col = "green")
rect(start2, ...

then redraw the lines over the rectangles using lines.
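Alternatively, on a device that does support semi-transparency (pdf(), for
example), rect can be given an alpha level through rgb(); a minimal sketch
with made-up band limits:

plot(1:10, rnorm(10), type = "l")
usr <- par("usr")
# 30%-opaque green band between x = 2 and x = 4 (hypothetical limits)
rect(2, usr[3], 4, usr[4], col = rgb(0, 1, 0, alpha = 0.3), border = NA)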

Jim



Re: [R] three related time series with different resolutions

2009-10-21 Thread William Simpson
PS I think one way to get the average waveforms I want from the
analysis is using cross-correlation, but again the multiple scale
problem is present.

On Wed, Oct 21, 2009 at 9:56 AM, William Simpson
william.a.simp...@gmail.com wrote:
 I have three time series, x, y, and z, and I want to analyse the
 relations between them. However, they have vastly different
 resolutions. I am writing to ask for advice on how to handle this
 situation in R.

 x is a stimulus, and y and z are responses.

 x is a rectangular pulse 4 sec long. Its onset and offset are known
 with sub-millisecond precision. The onset varies irregularly -- it
 doesn't fall on neat 1/2 sec or sec boundaries for example.

 y is a sampled continuous waveform. It is highly noisy, and it is
 actually well represented by samples 1/2 sec apart or so.

 z is a very short pulse -- perhaps 5 ms long -- occurring at irregular times.

 I have problems with how to represent these waveforms at the input
 stage (during data collection) and at the analysis stage. If I want to
 represent x and z with the sort of precision I'd like, I'd have to
 sample every ms or so. But:
 - that is massive overkill for y, because it is noisy. I will have
 maybe 500 times as many pts as I require! That makes for large data
 files, for no good reason.
 - the representations for x and z are about 99% zeros. Again, wasteful
 for storage
 - the analysis will be awkward and slow because of the huge number of pts

 If I sample at a rate of every 1/2 sec, z may be not detected at all,
 and the edges of x are represented very poorly.

 I could just save y as a separate file, with values sampled every 1/2
 sec, save x as a file containing onset and offset times, and save z as
 the times of each impulse. All three files have different lengths. If
 I save this way it is awkward keeping track of the three files for
 each run, and I'm still left with the problem of analysing them. In my
 mind they are actually three continuous waveforms, and my
 simple-minded way of analysing would be to create digitised waveform
 versions of all three (on some fine grain). If I do that I still have
 the problem of zillions of useless points clogging RAM.

 What do I want from the analysis?
 I want the average waveform for y that occurs for 30 sec after each x
 pulse, and the average waveform for z that occurs for 30 sec after
 each x pulse. The average waveform for y surrounding each z (maybe 30
 sec before and 10 sec after) would also be useful.

 Thanks very much for any suggestions about representation for data
 collection and analysis, and for data analysis methods (which will
 depend on the data representation).

 Bill




[R] Bootstrapping confidence intervals

2009-10-21 Thread Charlotta Rylander
Hello,
 
We are a group of PhD students working in the field of toxicology. Several
of us have small data sets with N=10-15. Our research is mainly about the
association between an exposure and an effect, so preferably we would like
to use linear regression models. However, most of the time our data do not
fulfill the model assumptions for linear models (no normality of the
y-variable achieved even after log transformation). We have been told that
we can use bootstrapping to derive a confidence interval for the original
parameter estimate (beta 1) from the linear regression model, and that if
the confidence interval does not include 0, we can trust the result from
the original linear model (of course only if a scatter plot of the variables
looks ok). What is your opinion about this method? Is that ok? I have
problems understanding how it is possible to resample several times from an
already poor distribution (one that does not fulfill the model assumptions
for linear models) to achieve a confidence interval that validates the use
of these linear models. I would really appreciate a simple explanation of
this!
 
Many thanks,
 
Charlotta Rylander



[R] ggplot2: Histogram with negative values on x-axis doesn't work

2009-10-21 Thread sebastian . rohrer
I have a dataset that contains numbers between -10 and 0. E.g. x =
c(-9.23, -9.56, -1.40, ...)
If I now do a
 qplot(x, geom = "histogram")
I get the error:
Error: position_stack requires non-overlapping x intervals
Strangely, the following both work:
 qplot(x * -1, geom = "histogram")
 qplot(x + 100, geom = "histogram")
Has anyone else encountered this? Is this a bug in ggplot2, or am I getting
something completely wrong here?
Versions: R: 2.9.1, ggplot2: 0.8.3 


[R] problem with pdf in batch mode

2009-10-21 Thread guillaume Le Ray
Hi all R Users,

I am using R in batch mode to do automatic reporting, saving all the
pictures in wmf format. When I launch the batch file (.bat) manually it
works, but when I launch it from the server I get the following error
message in the output file:

Error in device.call(...) : unable to start device pdf
Calls: total ... trellis.par.get -> trellis.device -> device.call -> .External
Execution halted

I know that by default in batch mode R prints graphs to a pdf file, and I
think the error comes from that, but I don't know how to disable this
option.
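In case a sketch helps: one workaround is to open the device explicitly
instead of relying on the default; the output path below is hypothetical,
and lattice plots must be print()ed in batch mode.

library(lattice)
pdf(file = "/tmp/report.pdf")    # a path the batch account can write to
print(xyplot(rnorm(10) ~ 1:10))  # explicit print() is required in batch mode
dev.off()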

Thanks


Guillaume Le Ray

Delta, SenseLab



[R] eliminate characters of one data.frame col, using another column

2009-10-21 Thread Luis F
Dear Mailing list,

I have a data.frame with three columns. I want to produce a fourth
column which is the result of eliminating from the first column the
characters present in the second and third columns.

Example:
a b   c
1 f f
2  j h
3  gg

I want the result:
ff
hhh


I can get what I want using the code below. But it's slow for big files
(which I have) and most likely there's a better way to do this.

1) Is there a function that would do the same as FUN.remove2ndCol?
2) Is there a way to avoid the apply over every row?

Thanks,
Tiago

###

dfIn <- data.frame(a = c('', '', ''), b = c('f', 'j', 'g'),
    c = c('f', 'h', 'g'))

FUN.remove2ndCol <- function(vec){

    vec.sub <- sub(vec['b'], '', vec['a'])
    vec.sub <- sub(vec['c'], '', vec.sub)

    return(vec.sub)
}

dfIn$Output <- apply(dfIn, 1, FUN.remove2ndCol)
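One possible alternative, sketched below: mapply over the three columns
directly, which avoids apply() coercing every row to a character matrix,
and passes fixed = TRUE so the patterns are taken literally rather than as
regular expressions:

dfIn$Output <- mapply(function(a, b, c) {
    sub(c, '', sub(b, '', a, fixed = TRUE), fixed = TRUE)
}, as.character(dfIn$a), as.character(dfIn$b), as.character(dfIn$c))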



[R] Keeping package sources for recompilation with new R version?

2009-10-21 Thread Rainer M Krug
Hi

I am using Ubuntu Hardy, and I install many packages from source. I keep
my R packages fairly up to date.

My question is: is there a way of keeping the source packages, so that when
I install a new version of R, an update.packages(checkBuilt=TRUE) will
only fetch packages when newer ones are available, and otherwise will
use the local copies?

To rephrase it: is it possible to have some kind of local repository for the
installed source packages, so that packages are only downloaded when newer
ones are available? I could use destdir to specify the directory in which to
save the source files of the packages, but how can I use this directory as a
first choice, and only download if a newer version exists?
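A sketch of one possible approach (paths hypothetical): keep the downloaded
tarballs under <repo>/src/contrib, index them with tools::write_PACKAGES,
and list that directory as the first repository:

library(tools)
repo <- "/home/user/Rrepo"   # hypothetical local repository root
write_PACKAGES(file.path(repo, "src", "contrib"), type = "source")
options(repos = c(local = paste("file://", repo, sep = ""),
                  CRAN = "http://cran.r-project.org"))
update.packages(checkBuilt = TRUE, type = "source")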

Rainer

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology,
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Cell:   +27 - (0)83 9479 042
Fax:+27 - (0)86 516 2782
Fax:+49 - (0)721 151 334 888
email:  rai...@krugs.de

Skype:  RMkrug
Google: r.m.k...@gmail.com



[R] Help with time series

2009-10-21 Thread ehxpieterse

Hi there,

I am having trouble getting the plotting of multiple time series to work.

I have used RBloomberg to download data, which I then convert to a data
frame. After I have calculated my new index values, I would like to plot the
new index.

My problem is that I can't get the plot feature to show the dates properly.
The resultant data frame DATA has row.names in the first column, which
contains the dates from the RBloomberg download. 

Could one of the R wizards please point me in the right direction?

Thanks in advance

Summary of my code:

conn <- blpConnect(iface = "COM", timeout = 12000,
    show.days = "week", na.action = "previous.days", periodicity = "daily")
data <- blpGetData(conn, tickers, "PX_LAST", start = chron("01/01/00"),
    end = NULL, barsize = NULL, barfields = NULL, retval = NULL)
blpDisconnect(conn)
DATA <- data.frame(data)
.
.
.
A few calculations
.
.
.
Strategy <- as.ts(DATA[, 5])
Index <- as.ts(DATA[, 6])

ts.plot(Strategy, Index, col = c("tomato", "black"))
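A minimal sketch of plotting against the actual dates instead of coercing
to ts, assuming the row names of DATA parse as dates:

dates <- as.Date(row.names(DATA))
plot(dates, DATA[, 5], type = "l", col = "tomato",
    xlab = "", ylab = "Level")   # a Date x-axis is labelled with real dates
lines(dates, DATA[, 6], col = "black")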




[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Tony Breyal
Dear all,

Lets say I have the following data frame:

set.seed(1)
col1 <- c(rep('happy', 9), rep('sad', 9))
col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)), 2)
dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'), 6))
score <- rnorm(18, 10, 3)
df1 <- data.frame(col1 = col1, col2 = col2, Date = dates, score = score)

col1  col2   Date score
1  happy alpha 2009-10-13  8.120639
2  happy alpha 2009-10-14 10.550930
3  happy alpha 2009-10-15  7.493114
4  happy  beta 2009-10-13 14.785842
5  happy  beta 2009-10-14 10.988523
6  happy  beta 2009-10-15  7.538595
7  happy gamma 2009-10-13 11.462287
8  happy gamma 2009-10-14 12.214974
9  happy gamma 2009-10-15 11.727344
10   sad alpha 2009-10-13  9.083835
11   sad alpha 2009-10-14 14.535344
12   sad alpha 2009-10-15 11.169530
13   sad  beta 2009-10-13  8.136278
14   sad  beta 2009-10-14  3.355900
15   sad  beta 2009-10-15 13.374793
16   sad gamma 2009-10-13  9.865199
17   sad gamma 2009-10-14  9.951429
18   sad gamma 2009-10-15 12.831509


Is it possible to get the following, whereby I am averaging the values
within each group of values in col2:

col1  col2   Date score   Average
1  happy alpha 13/10/2009  8.120639  8.721561
2  happy alpha 14/10/2009 10.550930  8.721561
3  happy alpha 15/10/2009  7.493114  8.721561
4  happy  beta 13/10/2009 14.785842 11.104320
5  happy  beta 14/10/2009 10.988523 11.104320
6  happy  beta 15/10/2009  7.538595 11.104320
7  happy gamma 13/10/2009 11.462287 11.801535
8  happy gamma 14/10/2009 12.214974 11.801535
9  happy gamma 15/10/2009 11.727344 11.801535
10   sad alpha 13/10/2009  9.083835 11.596236
11   sad alpha 14/10/2009 14.535344 11.596236
12   sad alpha 15/10/2009 11.169530 11.596236
13   sad  beta 13/10/2009  8.136278  8.288990
14   sad  beta 14/10/2009  3.355900  8.288990
15   sad  beta 15/10/2009 13.374793  8.288990
16   sad gamma 13/10/2009  9.865199 10.882712
17   sad gamma 14/10/2009  9.951429 10.882712
18   sad gamma 15/10/2009 12.831509 10.882712


My feeling is that I should be using ?aggregate in some fashion
but can't see how to do it. Or possibly there's another function I
should be using?

Thanks in advance,
Tony

O/S: Windows Vista Ultimate
 sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.
1252;LC_MONETARY=English_United Kingdom.
1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base



Re: [R] ggplot2: Histogram with negative values on x-axis doesn't work

2009-10-21 Thread Karl Ove Hufthammer
In article OF1D427087.01614A71-ONC1257656.003BFAB8-C1257656.003C88F8
@basf-c-s.be, sebastian.roh...@basf.com says...
 I have a dataset that contains numbers between -10 and 0. E.g. x = 
 c(-9.23, -9.56, -1.40, ...)
 If I now do a
  qplot(x, geom = "histogram")
 I get the error:
 Error: position_stack requires non-overlapping x intervals
 Strangely, the following both work:
  qplot(x * -1, geom = "histogram")
  qplot(x + 100, geom = "histogram")

I can reproduce it with, for example,
x = c(-9.23, -9.56, -1.40)

But adding a single positive number, even .001, fixes it, while 
adding a similar negative number introduces a new error message, so it 
really looks like a bug in ggplot2 when all the values are negative.

Report it to the ggplot2 author or the ggplot2 mailing list.

-- 
Karl Ove Hufthammer



Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Benilton Carvalho

aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
results = merge(df1, aves)

b

On Oct 21, 2009, at 9:03 AM, Tony Breyal wrote:


Dear all,

Lets say I have the following data frame:


set.seed(1)
col1 <- c(rep('happy', 9), rep('sad', 9))
col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)), 2)
dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'), 6))
score <- rnorm(18, 10, 3)
df1 <- data.frame(col1 = col1, col2 = col2, Date = dates, score = score)


   col1  col2   Date score
1  happy alpha 2009-10-13  8.120639
2  happy alpha 2009-10-14 10.550930
3  happy alpha 2009-10-15  7.493114
4  happy  beta 2009-10-13 14.785842
5  happy  beta 2009-10-14 10.988523
6  happy  beta 2009-10-15  7.538595
7  happy gamma 2009-10-13 11.462287
8  happy gamma 2009-10-14 12.214974
9  happy gamma 2009-10-15 11.727344
10   sad alpha 2009-10-13  9.083835
11   sad alpha 2009-10-14 14.535344
12   sad alpha 2009-10-15 11.169530
13   sad  beta 2009-10-13  8.136278
14   sad  beta 2009-10-14  3.355900
15   sad  beta 2009-10-15 13.374793
16   sad gamma 2009-10-13  9.865199
17   sad gamma 2009-10-14  9.951429
18   sad gamma 2009-10-15 12.831509


Is it possible to get the following, whereby I am averaging the values
within each group of values in col2:

   col1  col2   Date score   Average
1  happy alpha 13/10/2009  8.120639  8.721561
2  happy alpha 14/10/2009 10.550930  8.721561
3  happy alpha 15/10/2009  7.493114  8.721561
4  happy  beta 13/10/2009 14.785842 11.104320
5  happy  beta 14/10/2009 10.988523 11.104320
6  happy  beta 15/10/2009  7.538595 11.104320
7  happy gamma 13/10/2009 11.462287 11.801535
8  happy gamma 14/10/2009 12.214974 11.801535
9  happy gamma 15/10/2009 11.727344 11.801535
10   sad alpha 13/10/2009  9.083835 11.596236
11   sad alpha 14/10/2009 14.535344 11.596236
12   sad alpha 15/10/2009 11.169530 11.596236
13   sad  beta 13/10/2009  8.136278  8.288990
14   sad  beta 14/10/2009  3.355900  8.288990
15   sad  beta 15/10/2009 13.374793  8.288990
16   sad gamma 13/10/2009  9.865199 10.882712
17   sad gamma 14/10/2009  9.951429 10.882712
18   sad gamma 15/10/2009 12.831509 10.882712


My feeling is that I should be using ?aggregate in some fashion
but can't see how to do it. Or possibly there's another function I
should be using?

Thanks in advance,
Tony

O/S: Windows Vista Ultimate

sessionInfo()

R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United  
Kingdom.

1252;LC_MONETARY=English_United Kingdom.
1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods
base



Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Chuck Cleland
On 10/21/2009 7:03 AM, Tony Breyal wrote:
 Dear all,
 
 Lets say I have the following data frame:
 
 set.seed(1)
 col1 <- c(rep('happy', 9), rep('sad', 9))
 col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)), 2)
 dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'), 6))
 score <- rnorm(18, 10, 3)
 df1 <- data.frame(col1 = col1, col2 = col2, Date = dates, score = score)
 
 col1  col2   Date score
 1  happy alpha 2009-10-13  8.120639
 2  happy alpha 2009-10-14 10.550930
 3  happy alpha 2009-10-15  7.493114
 4  happy  beta 2009-10-13 14.785842
 5  happy  beta 2009-10-14 10.988523
 6  happy  beta 2009-10-15  7.538595
 7  happy gamma 2009-10-13 11.462287
 8  happy gamma 2009-10-14 12.214974
 9  happy gamma 2009-10-15 11.727344
 10   sad alpha 2009-10-13  9.083835
 11   sad alpha 2009-10-14 14.535344
 12   sad alpha 2009-10-15 11.169530
 13   sad  beta 2009-10-13  8.136278
 14   sad  beta 2009-10-14  3.355900
 15   sad  beta 2009-10-15 13.374793
 16   sad gamma 2009-10-13  9.865199
 17   sad gamma 2009-10-14  9.951429
 18   sad gamma 2009-10-15 12.831509
 
 
 Is it possible to get the following, whereby I am averaging the values
 within each group of values in col2:
 
 col1  col2   Date score   Average
 1  happy alpha 13/10/2009  8.120639  8.721561
 2  happy alpha 14/10/2009 10.550930  8.721561
 3  happy alpha 15/10/2009  7.493114  8.721561
 4  happy  beta 13/10/2009 14.785842 11.104320
 5  happy  beta 14/10/2009 10.988523 11.104320
 6  happy  beta 15/10/2009  7.538595 11.104320
 7  happy gamma 13/10/2009 11.462287 11.801535
 8  happy gamma 14/10/2009 12.214974 11.801535
 9  happy gamma 15/10/2009 11.727344 11.801535
 10   sad alpha 13/10/2009  9.083835 11.596236
 11   sad alpha 14/10/2009 14.535344 11.596236
 12   sad alpha 15/10/2009 11.169530 11.596236
 13   sad  beta 13/10/2009  8.136278  8.288990
 14   sad  beta 14/10/2009  3.355900  8.288990
 15   sad  beta 15/10/2009 13.374793  8.288990
 16   sad gamma 13/10/2009  9.865199 10.882712
 17   sad gamma 14/10/2009  9.951429 10.882712
 18   sad gamma 15/10/2009 12.831509 10.882712
 
 
 My feeling is that I should be using ?aggregate in some fashion
 but can't see how to do it. Or possibly there's another function I
 should be using?

?ave

  For example, try something like this:

transform(df1, Average = ave(score, col1, col2))

 Thanks in advance,
 Tony
 
 O/S: Windows Vista Ultimate
 sessionInfo()
 R version 2.9.2 (2009-08-24)
 i386-pc-mingw32
 
 locale:
 LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.
 1252;LC_MONETARY=English_United Kingdom.
 1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 base
 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Karl Ove Hufthammer
In article 800acfc0-2c3c-41f1-af18-3b52f7e43...@jhsph.edu, 
bcarv...@jhsph.edu says...
 aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
 results = merge(df1, aves)

Or, with the 'plyr' package, which has a very nice syntax:

library(plyr)
ddply(df1, .(col1, col2), transform, Average=mean(score))

It may be a bit slow for very large datasets, though.

Here's an alternative, which will be as fast as the aggregate solution.

within(df1, { Average=ave(score, col1, col2, FUN=mean) } )

Which one you use is a matter of taste.

And of course, the 'within' function is not the important part here; 
'ave' is. For example, if you have attached your data frame, you just 
have to type

Average=ave(score, col1, col2, FUN=mean)

-- 
Karl Ove Hufthammer



Re: [R] Bootstrapping confidence intervals

2009-10-21 Thread Ista Zahn
John Fox has a nice explanation here:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
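For a concrete starting point, a minimal sketch with the boot package,
case-resampling the slope; mydata and its columns x and y are hypothetical
stand-ins:

library(boot)
# statistic: refit the regression on a case-resampled data set, keep the slope
slope <- function(dat, idx) coef(lm(y ~ x, data = dat[idx, ]))[2]
set.seed(1)
b <- boot(mydata, slope, R = 2000)
boot.ci(b, type = "perc")   # percentile bootstrap CI for beta 1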

-Ista

On Wed, Oct 21, 2009 at 6:38 AM, Charlotta Rylander z...@nilu.no wrote:
 Hello,

 We are a group of PhD students working in the field of toxicology. Several
 of us have small data sets with N=10-15. Our research is mainly about the
 association between an exposure and an effect, so preferably we would like
 to use linear regression models. However, most of the time our data do not
 fulfill the model assumptions for linear models (no normality of the
 y-variable achieved even after log transformation). We have been told that
 we can use bootstrapping to derive a confidence interval for the original
 parameter estimate (beta 1) from the linear regression model, and that if
 the confidence interval does not include 0, we can trust the result from
 the original linear model (of course only if a scatter plot of the
 variables looks ok). What is your opinion about this method? Is that ok? I
 have problems understanding how it is possible to resample several times
 from an already poor distribution (one that does not fulfill the model
 assumptions for linear models) to achieve a confidence interval that
 validates the use of these linear models. I would really appreciate a
 simple explanation of this!

 Many thanks,

 Charlotta Rylander





-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org



[R] SVM probability output variation

2009-10-21 Thread Anders Carlsson

Dear R:ers,

I'm using svm from the e1071 package to train a model with the
option probability = TRUE. I then use predict with probability = TRUE
and get the probabilities for a data point belonging to either
class. So far all is well.


My question is why I get different results each time I train the model,
although I use exactly the same data. The prediction seems to be
reproducible, but if I re-train the model, the probabilities vary somewhat.


Here, I have trained a model on exactly the same data five times. When 
predicting using the different models, this is how the probabilities vary:


probabilities:
     Grp.0     Grp.1
 0.7077155 0.2922845
 0.7938782 0.2061218
 0.8178833 0.1821167
 0.7122203 0.2877797

How can the predictions using the same training and test data vary so much?
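A likely cause: with probability = TRUE, svm() fits the class-probability
model by an internal cross-validation whose fold assignment is random, so
fixing the seed before training should make it reproducible. A minimal
sketch on a built-in data set:

library(e1071)
set.seed(1)   # fixes the random CV folds used to fit the probability model
model <- svm(Species ~ ., data = iris, probability = TRUE)
pred <- predict(model, iris, probability = TRUE)
head(attr(pred, "probabilities"))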

Thanks,
Anders



[R] combining multiple 3D graphs

2009-10-21 Thread Michael Ralph M. Abrigo
Hi all!
I am a grad stat student and fairly new to R. I am doing a
regression tree in one of my problem sets. I have already identified the
cut-points (using Stata) for the regression. My problem is how to graph the
fitted values of z against the independent variables x and y. Basically, I
want to graph the following:
z = 87.67 if 0 < x <= 29  and 0 < y <= 48
z = 75    if 0 < x <= 29  and 48 < y <= 62
z = 67    if 29 < x <= 36 and 0 < y <= 62
z = 56.71 if 36 < x <= 45 and 0 < y <= 62
z = 40.8  if 36 < x <= 55 and 0 < y <= 62
I have some idea (still to be implemented) of how to graph them piece-wise
using persp. My question is this: how does one combine multiple 3D graphs
(using persp) in R? Better yet, is there a way to graph them all in just
one go?
Any help would be welcome. Many thanks in advance.

Cheers,
Mykel

-- 
I am most anxious for liberties for our country... but I place as a prior
condition the education of the people so that our country may have an
individuality of its own and make itself worthy of liberties...  Jose
Rizal,1896



Re: [R] referring to data of previous rows

2009-10-21 Thread Gabor Grothendieck
If dd is your data frame, then:

dd$prev <- ave(dd$of, dd$Id, FUN = function(x) c(NA, head(x, -1)))


On Wed, Oct 21, 2009 at 2:55 AM, clion birt...@hotmail.com wrote:

 Dear Rlers,
 in the following dataset I would like to insert a new column that refers to
 the data of the previous row.
 My question is whether the probability of a female (Id) giving birth changes
 if she had given birth to a pup in the previous year. So my dataframe
 consists of the columns Id, year (2003-2007 for each Id) and offspring
 (= of; 1/0):
 Id   year   of
 1   2003   1
 1   2004   0
 1   2005   1
 1   2006   1
 1   2007   0      with 1= female (Id) gave birth to offspring (same year)
 2   2003   0      and 0 = female didn't pup
 2   2004   1
 2   2005   1
 2   2006   0
 2   2007   1
 3   2003   1
 3   2004   1
 3   2005   1
 3   2006   0
 3   2007   0
 Now I want to add the column "offspring in the previous year" (1/0);
 the output should then look like this:

 Id  year   of   of_inPreviousYear
 1   2003   1   NA
 1   2004   0   1
 1   2005   1   0
 1   2006   1   1
 1   2007   0   1
 2   2003   0   NA
 2   2004   1   0
 2   2005   1   1
 2   2006   0   1
 2   2007   1   0
 3   2003   1   NA
 3   2004   1   1
 3   2005   1   1
 3   2006   0   1
 3   2007   0   0

 any idea how I could program that?

 thanks a lot,
 clion




Re: [R] slope calculation

2009-10-21 Thread Eik Vettorazzi

Hi MS,
I think it is simple trigonometry:

atan(beta)

or in degrees instead of radians

atan(beta)*360/(2*pi)

hth.
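A worked sketch on a built-in data set:

fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)[2]     # slope: change in dist per unit speed
atan(b) * 180 / pi    # angle in degrees (meaningful only if x and y share units)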

ms.com schrieb:

Dear all

I am a new R user, and trying to learn more.

I am doing linear regression analysis in R with my data. I am trying to find
a way to convert the slope value (coefficient of x) into degrees of slope.
Please give me an idea on this.

Thanking you in anticipation

Warm regards,
MS





--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790





Re: [R] three related time series with different resolutions

2009-10-21 Thread William Simpson
 I wasn't clear: x and z are pulse *trains* with irregular gaps between pulses.

 x is a rectangular pulse 4 sec long. Its onset and offset are known
 with sub-millisecond precision. The onset varies irregularly -- it
 doesn't fall on neat 1/2 sec or sec boundaries for example.

 y is a sampled continuous waveform. It is highly noisy, and it is
 actually well represented by samples 1/2 sec apart or so.

 z is a very short pulse -- perhaps 5 ms long -- occurring at irregular times.



Re: [R] Problem using the source-function within R-functions

2009-10-21 Thread Duncan Murdoch

On 10/21/2009 3:53 AM, Johan Lassen wrote:

Hi Giovanni,

Thanks for your reply. I can make the function work after pasting the code
directly into R. The problem arises after compiling the function into a
package and then calling the function, because the files inside source()
seem to be missing.

I tried to include the sourced files in the argument code_files of
the function package.skeleton. The files are brought correctly into the
package, but when running the generated package the files produce an
error due to variables that are not defined.

There may be no way around it other than copying the content of the sourced
files into the file where the function is defined?
That way the definition of the function is all written in one file, but
the organization of the code sections is worse than when using the
source() function...


R is not a macro language, and doesn't work well as one (though Thomas 
Lumley has written some code to make it act that way; see R-news 
http://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf).


I don't know the details of what you are trying to do, but the best way 
to handle the generic sort of problem you are describing is to take 
those source'd files, and rewrite their content as functions to be 
called from your other functions.  Then their source can still be in 
separate files, with one line in each location where you need to call it.


The difficulty in doing this is that you need to think carefully about 
what arguments those functions should have, and that may need some 
reorganization.  (Macro languages let you refer to every local variable; 
called functions can only see what you pass them.)  However, in general 
this will make your code easier to read and maintain, so the extra work 
is a good investment.
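A toy sketch of that rewrite (all names invented for illustration):

# file1.r once held:  z <- x + y   # relying on x and y in the calling frame
# rewritten as a function, the dependencies become explicit arguments:
addxy <- function(x, y) x + y

myfunction <- function(x, y) {
    z <- addxy(x, y)   # one call where source('file1.r', local = TRUE) stood
    z * 2
}
myfunction(1, 2)   # 6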


Duncan Murdoch




Best regards,
Johan


2009/10/20 Giovanni Petris gpet...@uark.edu



The problem probably lies in the source-ing part: look at

getwd()
setwd()

HTH,
Giovanni

 Date: Tue, 20 Oct 2009 13:00:02 +0200
 From: Johan Lassen jle...@gmail.com

 Dear R community,

 You may have the solution to how to construct a function using the
function
 source() to build the function; i.e.

 myfunction <- function(...){
 source('file1.r')
 source('file2.r')
 }

 After compiling and installing the myfunction in R, then calling the
 myfunction gives an error because the content of 'file1.r' and
'file2.r'
 seems to be missing.

 Anyone has the trick to overcome this problem?

 Thanks in advance!

 best wishes, Johan


 PS: My function is:


 run_accumm_value <- function(ind_noder_0,
 ind_loc_val, ind_retention, downstream){
 ## Preprocessing of looping calculations:
 koersel_uden_ret <- length(unique(ind_noder_0$oplid))
 opsaml_b_0_2 <- numeric(koersel_uden_ret)
 opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
 opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
 ## Preprocessing of topology and local values to be summed:
 source('preproces_topology.r', local = T)
 source('preproces_loc_val.r', local = T)
 # Loop for each grouping factor (column in ind_noder_0: oplid):
 for(j in 1:koersel_uden_ret){
 source('matrix_0.r', local = T)
 source('matrix.r', local = T)
 source('local_value.r', local = T)
 source('fordeling.r', local = T)
 source('fordeling_manuel.r', local = T)
 source('local_ret.r', local = T)
 source('Ax=b.r', local = T)
 source('opsamling_x_0_acc.r', local = T)
 }
 source('opsamling_b_1.r', local = T)
 opsaml_b_2
 }





 --
 Johan Lassen
 Environment Center Nykøbing F
 Denmark




--

Giovanni Petris  gpet...@uark.edu
Associate Professor
Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
Ph: (479) 575-6324, 575-8630 (fax)
http://definetti.uark.edu/~gpetris/











Re: [R] linear regression: Is there a way to get the results of lm as variables

2009-10-21 Thread Jorge Ivan Velez
Hi CE.KA,

Take a look at the following:

# Data
set.seed(123)
x - rnorm(100)
y - 2 + 1.5*x + rnorm(100)

# Regression model
reg - lm(y ~ x)

# The summary
summary(reg)

# Objects present in the summary()
names(summary(reg))

# Extracting the coefficients
summary(reg)$coeff
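The other quantities are components of the same summary object; a short
sketch (component names as reported by names(summary(reg))):

s <- summary(reg)
s$coefficients[, "Std. Error"]   # standard errors
s$coefficients[, "t value"]      # t values
s$coefficients[, "Pr(>|t|)"]     # p-values
s$sigma                          # residual standard error
s$r.squared                      # multiple R-squared
s$fstatistic                     # F statistic and its degrees of freedom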

HTH,
Jorge


On Wed, Oct 21, 2009 at 5:07 AM, CE.KA  wrote:


 Hi R users

 I used R to get the results of a linear regression

 reg <- lm(y ~ x)

 here are the results:

 # Call:
 # lm(formula = donnees$txi7098 ~ donnees$txs7098)
 #
 # Residuals:
 #       Min        1Q    Median        3Q       Max
 # -0.037971 -0.013373 -0.004947  0.010618  0.053235
 #
 # Coefficients:
 #                 Estimate Std. Error t value Pr(>|t|)
 # (Intercept)      0.08547    0.03028   2.822 0.011738 *
 # donnees$txs7098  0.62306    0.12847   4.850 0.000150 ***
 # ---
 # Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 #
 # Residual standard error: 0.02199 on 17 degrees of freedom
 # Multiple R-squared: 0.5805, Adjusted R-squared: 0.5558
 # F-statistic: 23.52 on 1 and 17 DF,  p-value: 0.0001502

 I know how to get the coefficients as variables,
 for example:
 reg$coefficients[1] = 0.08547 (Estimate)
 reg$coefficients[2] = 0.62306 (Estimate)

 My question is:
 Is there a way to get the other results of lm as variables?
 -Std. Errors?
 -t value?
 -Pr(>|t|)?
 -Residual standard error?
 -Multiple R-squared?
 -F-statistic?

 Best regards






Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))

2009-10-21 Thread Tony Breyal
Thank you all for your responses; I have now achieved the desired
output for my own real data using your suggestions. I will also have
to look into this 'plyr' package, as I have noticed that it gets
mentioned a lot.


On 21 Oct, 13:33, Karl Ove Hufthammer k...@huftis.org wrote:
 In article 800acfc0-2c3c-41f1-af18-3b52f7e43...@jhsph.edu,
 bcarv...@jhsph.edu says...

  aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
  results = merge(df1, aves)

 Or, with the 'plyr' package, which has a very nice syntax:

 library(plyr)
 ddply(df1, .(col1, col2), transform, Average=mean(score))

 It may be a bit slow for very large datasets, though.

 Here's an alternative, which will be as fast as the aggregate solution.

 within(df1, { Average=ave(score, col1, col2, FUN=mean) } )

 Which one you use is a matter of taste.

 And of course, the 'within' function is not the important part here;
 'ave' is. For example, if you have attached your data frame, you just
 have to type

 Average=ave(score, col1, col2, FUN=mean)

 --
 Karl Ove Hufthammer

 __
  r-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] combining multiple 3D graphs

2009-10-21 Thread Duncan Murdoch

On 10/21/2009 9:03 AM, Michael Ralph M. Abrigo wrote:

Hi all!
I am a grad stat student and am fairly new to using R. I am doing a
regression tree in one of my problem sets. I have already identified the
cut-points (using Stata) for the regression. My problem is how to graph the
fitted values of z against the independent variables x and y. Basically, I
want to graph the following:
z=87.67 if 0<x<=29 & 0<y<=48
z=75    if 0<x<=29 & 48<y<=62
z=67    if 29<x<=36 & 0<y<=62
z=56.71 if 36<x<=45 & 0<y<=62
z=40.8  if 36<x<=55 & 0<y<=62
I have some idea (still to be implemented) on how to graph them piece-wise
using persp. My question is this: how does one combine multiple 3D graphs
(using persp) in R? Better yet, is there a way to graph them all in just
one go?


I would define a function of x and y which evaluates to your fitted 
values, evaluate it on a grid, and then use persp.  You might already 
have code to do this, or you might have to write it something like this:


f <- function(x,y) {
   ifelse( 0 < x & x <= 29 & 0 < y & y <= 48, 87.67,
   ifelse( 0 < x & x <= 29 & y > 48 & y <= 62, 75,

}

etc.  I leave the difficult problem of properly closing all the 
parentheses to you!
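
For completeness, a minimal sketch of one way to finish this off (band
boundaries copied verbatim from the original post; where the 36-45 and
36-55 bands overlap, the first match wins):

f <- function(x, y) {
  ifelse(0 < x & x <= 29 & 0 < y & y <= 48, 87.67,
  ifelse(0 < x & x <= 29 & y > 48 & y <= 62, 75,
  ifelse(29 < x & x <= 36 & 0 < y & y <= 62, 67,
  ifelse(36 < x & x <= 45 & 0 < y & y <= 62, 56.71,
  ifelse(36 < x & x <= 55 & 0 < y & y <= 62, 40.8, NA)))))
}
xg <- seq(0, 55, length.out = 50)
yg <- seq(0, 62, length.out = 50)
zg <- outer(xg, yg, f)                  # evaluate f on the grid
persp(xg, yg, zg, theta = 30, phi = 30) # NAs outside the bands are allowed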


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subsetting/modifying a symbolic formula

2009-10-21 Thread Bryan Hanson
Hello All..

Please consider the following:

y <- rnorm(20, mean = 10)
f1 <- as.factor(rep(c("A", "B", "B", "A"), 5))
f2 <- as.factor(rep(c("C", "D"), 10))
testdata <- data.frame(y, f1, f2)

testFunc <- function(formula, data, ...) {
#mf <- model.frame(formula, data)
kw.res <- kruskal.test(formula, data)
}

res <- testFunc(y ~ f1 * f2, testdata)

Which works perfectly well.  Now, I would like to do some other tests which
allow only one factor, for instance wilcox.test.  So, I would like to modify
my formula so that it reads y ~ f1 or y ~ f2, then conduct more tests.  What
is the smart way to subset or modify an existing symbolic formula?  I have
been staring at the attributes of res and mf (commented out currently) but
these are nested in a way I find difficult to extract.  ?terms,
?terms.formula, ?formula etc all discuss these attributes but as I said, it
seems a bit impenetrable.  RSiteSearch("symbolic formula") returns too many
answers.  No doubt I am missing the obvious, as this is my first foray into
using formulas in my own functions.

TIA, Bryan
*
Bryan Hanson
Acting Chair
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA


 sessionInfo()
R version 2.10.0 RC (2009-10-19 r50172)
i386-apple-darwin8.11.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] datasets  tools grid  utils stats graphics  grDevices
methods  
[9] base 

other attached packages:
 [1] ggplot2_0.8.3  reshape_0.8.3  proto_0.3-8mvbutils_2.2.0
 [5] ChemoSpec_1.2  lattice_0.17-25mvoutlier_1.4  plyr_0.1.8
 [9] RColorBrewer_1.0-2 chemometrics_0.4   som_0.3-4
robustbase_0.4-5  
[13] rpart_3.1-45   pls_2.1-0  pcaPP_1.7  mvtnorm_0.9-7
[17] nnet_7.2-48mclust_3.2 MASS_7.2-48lars_0.9-7
[21] e1071_1.5-19   class_7.2-48

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] formula and model.frame

2009-10-21 Thread Doran, Harold
Suppose I have the following function

myFun <- function(formula, data){
f <- formula(formula)
dat <- model.frame(f, data)
dat
}

Applying it with this sample data yields a new dataframe:

qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = 
c(1:8))

dat <- myFun(score ~ grade, qqq)

However, what I would like is for the resulting dataframe (dat) to include as a 
column idVar. Naively, I could do

dat <- myFun(score ~ grade + idVar, qqq)

This gives what I'm after in terms of the resulting data, but doesn't make 
sense within the context of the model I am working on since idVar is not one of 
the conditioning variables used, it has a different purpose altogether. So, I 
was thinking another way is to attach it somehow afterwards. Something like:

myFun <- function(formula, data, id, na.action){
f <- formula(formula)
idVar <- data[, id]
dat <- model.frame(f, data, na.action = na.action)
dat[, id] <- idVar
dat
}

myFun(score ~ grade, qqq, id = 'idVar', na.action = NULL)

Of course, I intentionally use na.action = NULL here because the following 
occurs, of course

 myFun(score ~ grade, qqq, id = 'idVar', na.action = na.omit)
Error in `[<-.data.frame`(`*tmp*`, , id, value = 1:8) : 
  replacement has 8 rows, data has 7

I see a few workarounds, but I am certain there is a cleaner way to accomplish 
this. 

Harold


 sessionInfo()
R version 2.9.0 (2009-04-17) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods   base  
   

other attached packages:
[1] lme4_0.999375-28   Matrix_0.999375-25 lattice_0.17-22xtable_1.5-5   
adapt_1.0-4MiscPsycho_1.4
[7] statmod_1.3.8 

loaded via a namespace (and not attached):
[1] grid_2.9.0  tools_2.9.0

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting/modifying a symbolic formula

2009-10-21 Thread Gabor Grothendieck
Here is one way:

 fo <- y ~ f1 * f2
 one.x <- lapply(all.vars(fo[[3]]), function(x) { fo[[3]] <- as.name(x); fo })
 one.x
[[1]]
y ~ f1

[[2]]
y ~ f2
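
These one-factor formulas can then be fed straight to the single-factor
tests; a minimal sketch (wilcox.test wants a two-level factor, which
both f1 and f2 are):

lapply(one.x, function(f) wilcox.test(f, data = testdata))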


On Wed, Oct 21, 2009 at 11:29 AM, Bryan Hanson han...@depauw.edu wrote:
 Hello All..

 Please consider the following:

 y <- rnorm(20, mean = 10)
 f1 <- as.factor(rep(c("A", "B", "B", "A"), 5))
 f2 <- as.factor(rep(c("C", "D"), 10))
 testdata <- data.frame(y, f1, f2)

 testFunc <- function(formula, data, ...) {
 #    mf <- model.frame(formula, data)
    kw.res <- kruskal.test(formula, data)
    }

 res <- testFunc(y ~ f1 * f2, testdata)

 Which works perfectly well.  Now, I would like to do some other tests which
 allow only one factor, for instance wilcox.test.  So, I would like to modify
 my formula so that it reads y ~ f1 or y ~ f2, then conduct more tests.  What
 is the smart way to subset or modify an existing symbolic formula?  I have
 been staring at the attributes of res and mf (commented out currently) but
 these are nested in a way I find difficult to extract.  ?terms,
 ?terms.formula, ?formula etc all discuss these attributes but as I said, it
 seems a bit impenetrable.  RSiteSearch("symbolic formula") returns too many
 answers.  No doubt I am missing the obvious, as this is my first foray into
 using formulas in my own functions.

 TIA, Bryan
 *
 Bryan Hanson
 Acting Chair
 Professor of Chemistry & Biochemistry
 DePauw University, Greencastle IN USA


 sessionInfo()
 R version 2.10.0 RC (2009-10-19 r50172)
 i386-apple-darwin8.11.1

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] datasets  tools     grid      utils     stats     graphics  grDevices
 methods
 [9] base

 other attached packages:
  [1] ggplot2_0.8.3      reshape_0.8.3      proto_0.3-8        mvbutils_2.2.0
  [5] ChemoSpec_1.2      lattice_0.17-25    mvoutlier_1.4      plyr_0.1.8
  [9] RColorBrewer_1.0-2 chemometrics_0.4   som_0.3-4
 robustbase_0.4-5
 [13] rpart_3.1-45       pls_2.1-0          pcaPP_1.7          mvtnorm_0.9-7
 [17] nnet_7.2-48        mclust_3.2         MASS_7.2-48        lars_0.9-7
 [21] e1071_1.5-19       class_7.2-48

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Temperature Prediction Model

2009-10-21 Thread Aneeta

Greetings!

As part of my research project I am using R to study temperature data
collected by a network. Each node (observation point) records temperature of
its surroundings throughout the day and generates a dataset. Using the
recorded datasets for the past 7 days I need to build a prediction model for
each node that would enable it to check the observed data against the
predicted data. How can I derive an equation for temperature using the
datasets?
The following is a subset of one of the datasets:-

  Time  Temperature 

07:00:17.369668   17.509
07:03:17.465725   17.509 
07:04:17.597071   17.509 
07:05:17.330544   17.509 
07:10:47.838123   17.5482
07:14:16.680696   17.5874
07:16:46.67457 17.5972 
07:29:16.887654   17.7442
07:29:46.705759   17.754 
07:32:17.131713   17.7932
07:35:47.113953   17.8324
07:36:17.194981   17.8324
07:37:17.227013   17.852 
07:38:17.809174   17.8618
07:38:48.00011 17.852 
07:39:17.124362   17.8618 
07:41:17.130624   17.8912 
07:41:46.966421   17.901 
07:43:47.524823   17.95 
07:44:47.430977   17.95 
07:45:16.813396   17.95 

So far I have tried to use linear model fit but have not found it to be
useful. You may look at the attached graph for further reference. 
http://www.nabble.com/file/p25995874/Regression%2BModel%2Bfor%2BNode%2B1%2BDay%2B1.png
I would really appreciate if you could suggest the correct approach to
building such a prediction model.

Many thanks,
Aneeta



 
-- 
View this message in context: 
http://www.nabble.com/Temperature-Prediction-Model-tp25995874p25995874.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SVM probability output variation

2009-10-21 Thread Steve Lianoglou

Hi Anders,

On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote:


Dear R:ers,

I'm using the svm from the e1071 package to train a model with the  
option probabilities = TRUE. I then use predict with  
probabilities = TRUE and get the probabilities for the data point  
belonging to either class. So far all is well.


My question is why I get different results each time I train the  
model, although I use exactly the same data. The prediction seems to  
be reproducible, but if I re-train the model, the probabilities vary  
somewhat.


Here, I have trained a model on exactly the same data five times.  
When predicting using the different models, this is how the  
probabilities vary:


I'm not sure I'm following the example you're giving and the scenario
you are describing.



probabilities
     Grp.0     Grp.1
 0.7077155 0.2922845
 0.7938782 0.2061218
 0.8178833 0.1821167
 0.7122203 0.2877797


This seems fine to me: it looks like the probabilities of class  
membership for 4 examples (Note that Grp.0 + Grp.1 = 1).


How can the predictions using the same training and test data vary  
so much?


I'm trying the code below several times (taken from the example), and  
the probabilities calculated from the call to prediction don't change  
much at all:


R> data(iris)
R> attach(iris)

R> model <- svm(x, y, probability=TRUE)
R> predict(model, x, probability=TRUE)

To be fair, the probabilities aren't exactly the same, but the
difference between two runs is really small:


R> model <- svm(x, y, probability=TRUE)
R> a <- predict(model, x, probability=TRUE)

R> model <- svm(x, y, probability=TRUE)
R> b <- predict(model, x, probability=TRUE)

R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
[1] 0.003215959

Is this what you were talking about, or ... ?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] date conversion not as i would have expected

2009-10-21 Thread clair.crossup...@googlemail.com
Good day, i imported some data into R from Excel.  By using the edit()
function, this is what one of the dates looks like in R:

 x <- structure(1254351600, class = c("POSIXt", "POSIXct"), tzone = "")
[1] 2009-10-01 BST

However, when i do the following, the date changes:

 as.Date(x, format="%Y-%m-%d")
[1] 2009-09-30

I don't understand why this is happening. I realise that i can get
around this by doing as.Date(as.character(x)), but would be nice to
understand why it doesn't work directly.
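
A minimal sketch of the likely cause: as.Date() on a POSIXct value does
the conversion in UTC unless a tz argument is supplied, so a BST (UTC+1)
timestamp at midnight falls on the previous UTC day (the timezone name
"Europe/London" below is illustrative):

x <- as.POSIXct(1254351600, origin = "1970-01-01", tz = "Europe/London")
as.Date(x)                        # "2009-09-30" - conversion done in UTC
as.Date(x, tz = "Europe/London")  # "2009-10-01" - conversion in local time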

C xx

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] squared euclidean distance

2009-10-21 Thread Caro B.
Dear R-Help-Team,


I would like to cluster my data using the Ward method. In several papers I
read (e.g. Bahrenberg) that it is necessary to use the squared Euclidean
distance with the Ward method. Unfortunately I cannot find this term in R
as a method for measuring the distance.

Does anybody have an idea?

Thanks in advance,
Carolin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Transparent Bands in R

2009-10-21 Thread mnstn

Hello Megha and Jim,
Thanks for your comments. The green bands actually correspond to data that
looks like:
13 0
42 0
183 0
186 0
187 0
192 0
194 0
and so on. I plotted them using:
plot(v[,1], 3+v[,2], type="h", col="gray", lwd=5) and the rest of the data using
points. The result is here:
http://www.twitpic.com/md2li
Thanks!
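
As an aside on the transparency route Jim suggests below: a minimal
sketch of a semi-transparent band via an alpha colour, assuming a
device that supports it (e.g. pdf() or a cairo-based png()):

plot(1:10, rnorm(10), type = "n")
usr <- par("usr")
# one semi-transparent green band from x = 3 to x = 5, full plot height
rect(3, usr[3], 5, usr[4], col = rgb(0, 1, 0, alpha = 0.3), border = NA)
points(1:10, rnorm(10))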


Jim Lemon-2 wrote:
 
 On 10/21/2009 09:40 AM, mnstn wrote:
 Hello All,
 My question is regarding the attached plot. I would like to have
 multiple
 transparent green bands running the length (yaxis) of the plot the width
 of
 which is determined by the green lines at y=0 in the plot. Can you
 suggest a
 way to do it?
  
 Hi mnstn,
 Transparency is only supported by some graphic devices. It may be 
 simpler to plot the values, use rect to draw green rectangles:
 
 xylim <- par("usr")
 rect(start1, xylim[3], stop1, xylim[4], col="green")
 rect(start2,...
 
 then redraw the lines over the rectangles using lines.
 
 Jim
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Transparent-Bands-in-R-tp25983169p25995374.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Temperature Prediction Model

2009-10-21 Thread Steve Lianoglou

Hi,

On Oct 21, 2009, at 12:31 PM, Aneeta wrote:



Greetings!

As part of my research project I am using R to study temperature data
collected by a network. Each node (observation point) records  
temperature of

its surroundings throughout the day and generates a dataset. Using the
recorded datasets for the past 7 days I need to build a prediction  
model for

each node that would enable it to check the observed data against the
predicted data. How can I derive an equation for temperature using the
datasets?
The following is a subset of one of the datasets:-

 Time  Temperature

07:00:17.369668   17.509
07:03:17.465725   17.509
07:04:17.597071   17.509
07:05:17.330544   17.509
07:10:47.838123   17.5482
07:14:16.680696   17.5874
07:16:46.67457 17.5972
07:29:16.887654   17.7442
07:29:46.705759   17.754
07:32:17.131713   17.7932
07:35:47.113953   17.8324
07:36:17.194981   17.8324
07:37:17.227013   17.852
07:38:17.809174   17.8618
07:38:48.00011 17.852
07:39:17.124362   17.8618
07:41:17.130624   17.8912
07:41:46.966421   17.901
07:43:47.524823   17.95
07:44:47.430977   17.95
07:45:16.813396   17.95


I think you/we need much more information.

Are you really trying to build a model that predicts the temperature  
just given the time of day?


Given that you're in NY, I'd say 12pm in August sure feels much  
different than 12pm in February, no?


Or are you trying to predict what one sensor readout would be at a  
particular time given readings from other sensors at the same time?


Or ... ?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SVM probability output variation

2009-10-21 Thread Anders Carlsson
Hi again, and thank you Steve for your reply!


 Hi Anders,
 
 On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote:
 
  Dear R:ers,
 
  I'm using the svm from the e1071 package to train a model with the
  option probabilities = TRUE. I then use predict with
  probabilities = TRUE and get the probabilities for the data point
  belonging to either class. So far all is well.
 
  My question is why I get different results each time I train the
  model, although I use exactly the same data. The prediction seems to
  be reproducible, but if I re-train the model, the probabilities vary
  somewhat.
 
  Here, I have trained a model on exactly the same data five times.
  When predicting using the different models, this is how the
  probabilities vary:
 
 I'm not sure I'm following the example you're giving and the scenario
 you are describing.

I think you got it!

 
  probabilities
       Grp.0     Grp.1
   0.7077155 0.2922845
   0.7938782 0.2061218
   0.8178833 0.1821167
   0.7122203 0.2877797
 
 This seems fine to me: it looks like the probabilities of class
 membership for 4 examples (Note that Grp.0 + Grp.1 = 1).
 

Yes, within each run all was OK, but I was surprised that it varied to such a 
high degree.

 
  How can the predictions using the same training and test data vary
  so much?
 
 I'm trying the code below several times (taken from the example), and
 the probabilities calculated from the call to prediction don't change
 much at all:
 
 R> data(iris)
 R> attach(iris)
 
 R> model <- svm(x, y, probability=TRUE)
 R> predict(model, x, probability=TRUE)
 
 To be fair, the probabilities aren't exactly the same, but the
 difference between two runs is really small:
 
 R> model <- svm(x, y, probability=TRUE)
 R> a <- predict(model, x, probability=TRUE)
 
 R> model <- svm(x, y, probability=TRUE)
 R> b <- predict(model, x, probability=TRUE)
 
 R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
 [1] 0.003215959
 
 Is this what you were talking about, or ... ?


Yes, exactly that. In your example, though, the variation seems to be a lot 
smaller. I'm guessing that has to do with the data. 

If I instead output the decision values, the whole procedure is fully 
reproducible, i.e. the exact same values are returned when I retrain the model. 

I have no idea how the probabilities are calculated, but it seems to be in this 
step that the differences arise. In my case, I feel a bit hesitant to use them 
when they differ that much between runs (15% or so)... 

If important, I use a linear kernel and don't tune the model in any way.


Thanks again!

/Anders
 

 
 -steve
 
 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
|  Memorial Sloan-Kettering Cancer Center
|  Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question on mixed effect models with LME

2009-10-21 Thread Peter Flom
Good afternoon

Using R 2.9.2 on a machine running Windows XP

I have a longitudinal data set, with data on schools and their test scores over 
a four year period.  I have centered year, and run the following


m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
   data = long,
   random = ~I(year-2007.5)|schoolnum,
   na.action = na.omit)

where math_1 is a percentage of students in a given school that are at the 
lowest math achievement level, year is year, TFC_ is a categorical variable for 
a treatment I wish to evaluate, and schoolnum is an identifier.

When I run summary on this model, I get a strong negative correlation (-0.91) 
between the intercept and I(year-2007.5), despite the fact that the mean of 
year is 2007.5.  

I am puzzled, as I thought centering the time variable should eliminate, or at 
least strongly reduce, this correlation.

Any insights appreciated

thanks

Peter


Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter:   @peterflom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to find the interception point of two linear fitted model in R?

2009-10-21 Thread FMH
Dear All,

Let have 10 pair of observations, as shown below.

##
x <- 1:10
y <- c(1,3,2,4,5,10,13,15,19,22)
plot(x,y)
##


Two fitted  models, with ranges of [1,5] and [5,10], can be easily fitted 
separately by lm function as shown below:

###
mod1 <- lm(y[1:5] ~ x[1:5])
mod2 <- lm(y[5:10] ~ x[5:10])
###

I'm looking for the interception points between these two straight lines from 
these fitted models, mod1 and mod2. 

Could someone advice me the way to do it?

Thank you
Fir 




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Missing data and LME models and diagnostic plots

2009-10-21 Thread Peter Flom
Hello

Running R2.9.2 on Windows XP

I am puzzled by the performance of LME in situations where there are missing 
data.  As I understand it, one of the strengths of this sort of model is how 
well it deals with missing data, yet lme requires nonmissing data.  

Thus, 

m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
data = long,
random = ~I(year-2007.5)|schoolnum)

causes an error in na.fail.default, but adding na.action = na.omit makes a 
model with no errors.  However, if I create that model, i.e.

m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
   data = long, 
   random = ~I(year-2007.5)|schoolnum,
   na.action = na.omit)

then the diagnostic plots suggested in Pinheiro  Bates produce errors; e.g.

plot(m1.mod1, schoolnum~resid(.), abline = 0)

gives an error "could not find function NaAct".


Searching the archives showed a similar question from 2007, but did not show 
any responses.

Thanks for any help

Peter

)

Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter:   @peterflom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to find the interception point of two linear fitted model in R?

2009-10-21 Thread Paul Johnson
On Wed, Oct 21, 2009 at 12:09 PM, FMH kagba2...@yahoo.com wrote:
 Dear All,

 Let have 10 pair of observations, as shown below.

 ##
 x <- 1:10
 y <- c(1,3,2,4,5,10,13,15,19,22)
 plot(x,y)
 ##


 Two fitted  models, with ranges of [1,5] and [5,10], can be easily fitted 
 separately by lm function as shown below:

 ###
 mod1 <- lm(y[1:5] ~ x[1:5])
 mod2 <- lm(y[5:10] ~ x[5:10])
 ###

 I'm looking for the interception points between these two straight lines from 
 these fitted models, mod1 and mod2.

 Could someone advice me the way to do it?

 Thank you
 Fir


extract the coefficients, rearrange in a linear system, use solve.


> m1 <- as.matrix(rbind(coef(mod1), coef(mod2)))
> a <- cbind(c(1,1), -m1[, 2])
> a
     [,1]      [,2]
[1,]    1 -0.900000
[2,]    1 -3.257143
> b <- m1[,1]
> solve(a=a,b=b)
[1] 4.396364 4.551515

This seems to visually verify the result:

> plot(1:10, 1:10, type="n")
> abline(mod1)
> abline(mod2)

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subsetting/modifying a symbolic formula

2009-10-21 Thread Bryan Hanson
Thanks Gabor, you taught me two useful things: all.vars, and the fact that a
two-sided formula object has length 3.  Problem solved.  Bryan


On 10/21/09 11:57 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 Here is one way:
 
 fo <- y ~ f1 * f2
 one.x <- lapply(all.vars(fo[[3]]), function(x) { fo[[3]] <- as.name(x); fo })
 one.x
 [[1]]
 y ~ f1
 
 [[2]]
 y ~ f2
 
 
 On Wed, Oct 21, 2009 at 11:29 AM, Bryan Hanson han...@depauw.edu wrote:
 Hello All..
 
 Please consider the following:
 
 y <- rnorm(20, mean = 10)
 f1 <- as.factor(rep(c("A", "B", "B", "A"), 5))
 f2 <- as.factor(rep(c("C", "D"), 10))
 testdata <- data.frame(y, f1, f2)
 
 testFunc <- function(formula, data, ...) {
 #    mf <- model.frame(formula, data)
    kw.res <- kruskal.test(formula, data)
    }
 
 res <- testFunc(y ~ f1 * f2, testdata)
 
 Which works perfectly well.  Now, I would like to do some other tests which
 allow only one factor, for instance wilcox.test.  So, I would like to modify
 my formula so that it reads y ~ f1 or y ~ f2, then conduct more tests.  What
 is the smart way to subset or modify an existing symbolic formula?  I have
 been staring at the attributes of res and mf (commented out currently) but
 these are nested in a way I find difficult to extract.  ?terms,
 ?terms.formula, ?formula etc all discuss these attributes but as I said, it
 seems a bit impenetrable.  RSiteSearch("symbolic formula") returns too many
 answers.  No doubt I am missing the obvious, as this is my first foray into
 using formulas in my own functions.
 
 TIA, Bryan
 *
 Bryan Hanson
 Acting Chair
 Professor of Chemistry & Biochemistry
 DePauw University, Greencastle IN USA
 
 
 sessionInfo()
 R version 2.10.0 RC (2009-10-19 r50172)
 i386-apple-darwin8.11.1
 
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
 
 attached base packages:
 [1] datasets  tools     grid      utils     stats     graphics  grDevices
 methods
 [9] base
 
 other attached packages:
  [1] ggplot2_0.8.3      reshape_0.8.3      proto_0.3-8        mvbutils_2.2.0
  [5] ChemoSpec_1.2      lattice_0.17-25    mvoutlier_1.4      plyr_0.1.8
  [9] RColorBrewer_1.0-2 chemometrics_0.4   som_0.3-4
 robustbase_0.4-5
 [13] rpart_3.1-45       pls_2.1-0          pcaPP_1.7          mvtnorm_0.9-7
 [17] nnet_7.2-48        mclust_3.2         MASS_7.2-48        lars_0.9-7
 [21] e1071_1.5-19       class_7.2-48
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] squared euclidean distance

2009-10-21 Thread Sarah Goslee
You can calculate the Euclidean distance with dist() and then square it.

Sarah

On Wed, Oct 21, 2009 at 11:35 AM, Caro B. carolin.w...@googlemail.com wrote:
 Dear R-Help-Team,


 I would like to cluster my data using the Ward method. In several papers I
 read (e.g. Bahrenberg) that it is necessary to use the squared Euclidean
 distance with the Ward method. Unfortunately I cannot find this term in R
 as a method for measuring the distance.

 Does anybody have an idea?

 Thanks in advance,
 Carolin



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] formula and model.frame

2009-10-21 Thread Charles C. Berry



Emulate lm.

myFun <-
  function(formula, data, na.action, id, ...){
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "id", "na.action"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf$drop.unused.levels <- TRUE
  mf[[1L]] <- as.name("model.frame")
  mf <- eval(mf, parent.frame())
  mf
  }

myFun(score ~ grade, qqq, id=idVar)
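
A usage sketch: extra variables passed through ... come back from
model.frame() with parenthesised names, so the call above returns the
columns score, grade and "(id)", with NA rows dropped consistently:

qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = 1:8)
myFun(score ~ grade, qqq, id = idVar)  # 7 rows; "(id)" keeps the matching ids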


HTH,

Chuck

On Wed, 21 Oct 2009, Doran, Harold wrote:


Suppose I have the following function

myFun <- function(formula, data){
f <- formula(formula)
dat <- model.frame(f, data)
dat
}

Applying it with this sample data yields a new dataframe:

qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = 
c(1:8))

dat <- myFun(score ~ grade, qqq)

However, what I would like is for the resulting dataframe (dat) to include as a 
column idVar. Naively, I could do

dat <- myFun(score ~ grade + idVar, qqq)

This gives what I'm after in terms of the resulting data, but doesn't make 
sense within the context of the model I am working on since idVar is not one of 
the conditioning variables used, it has a different purpose altogether. So, I 
was thinking another way is to attach it somehow afterwards. Something like:

myFun <- function(formula, data, id, na.action){
f <- formula(formula)
idVar <- data[, id]
dat <- model.frame(f, data, na.action = na.action)
dat[, id] <- idVar
dat
}

myFun(score ~ grade, qqq, id = 'idVar', na.action = NULL)

Of course, I intentionally use na.action = NULL here because the following 
occurs, of course


myFun(score ~ grade, qqq, id = 'idVar', na.action = na.omit)

Error in `[<-.data.frame`(`*tmp*`, , id, value = 1:8) :
 replacement has 8 rows, data has 7

I see a few workarounds, but I am certain there is a cleaner way to accomplish 
this.

Harold



sessionInfo()

R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] lme4_0.999375-28   Matrix_0.999375-25 lattice_0.17-22xtable_1.5-5   
adapt_1.0-4MiscPsycho_1.4
[7] statmod_1.3.8

loaded via a namespace (and not attached):
[1] grid_2.9.0  tools_2.9.0

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] squared euclidean distance

2009-10-21 Thread Gavin Simpson
On Wed, 2009-10-21 at 17:35 +0200, Caro B. wrote:
 Dear R-Help-Team,
 
 
 I would like to cluster my data using the Ward method. In several papers I
 read (e.g. Bahrenberg) that it is necessary to use the squared Euclidean
 distance with the Ward method. Unfortunately I cannot find this term in R
 as a method for measuring the distance.
 
 Does anybody have an idea?

Compute the Euclidean distance then square the numbers to undo the sqrt
applied within dist.

set.seed(1)
dat <- data.frame(X1 = runif(10), X2 = runif(10))
d.euc <- dist(dat)
d.sqeuc <- d.euc^2
class(d.sqeuc) ## still a dist object
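
And, since the goal was Ward clustering, a minimal sketch of the next
step (the method name "ward" is the one in R of this vintage; newer
versions of hclust() call it "ward.D"):

hc <- hclust(d.sqeuc, method = "ward")
plot(hc)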

I have no idea what/who Bahrenberg is or whether/why it is advisable to
use this sq Euclidean with Ward's method; in my field I've seen Ward's
method used with all sorts of distance coefficients...

HTH

G

 
 Thanks in advance,
 Carolin
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Temperature Prediction Model

2009-10-21 Thread Steve Lianoglou

Hi,

In order to have more eyes on this, I'm CCing this back to the list
(please try to keep further correspondence here, since most mail to  
*this* address of mine probably gets lost if it's not coming in from a  
list to begin with) ...


I'm not really sure I have much to say about your problem ... it seems  
you're working in an open research area that I have 0 knowledge about.


I'd recommend you talk with your advisor a bit to see what models they  
have in mind ... I'd imagine questions directed to this list will be  
better responded to when they're more specific, eg. some problem about  
using a particular library/model/package to perform a specific task,  
and not more open ended ones like which equation you can use to fit  
your data (or even if that's a reasonable thing to do, at all).


Anyway, sorry .. don't have much valuable input, but good luck all the  
same.


-steve


On Oct 21, 2009, at 1:07 PM, Aneeta Bhattacharyya wrote:


Hi Steve,

Thanks for your email.
The data that I have has been collected by a sensor network deployed  
by Intel. You may take a look at the network at the following  
website http://db.csail.mit.edu/labdata/labdata.html


The main goal of my project is to simulate a physical layer attack  
on a sensor network and to detect such an attack. In order to detect  
an attack I need to have a model that would define the normal  
behaviour. So the actual variation of temperature throughout the  
year is not very important out here. I have a set of data for a  
period of 7 days which is assumed to be the correct behaviour and I  
need to build a model upon that data. I may refine the model later  
on to take into account temperature variations throughout the year.


Yes I am trying to build a model that will predict the temperature  
just on the given time of the day so that I am able to compare it  
with the observed temperature and determine if there is any  
abnormality. Each node should have its own expectation model (i.e.  
there will be no correlation between the readings of the different  
nodes).


Please let me know if you have any further questions.

Many Thanks,
Aneeta


On Wed, Oct 21, 2009 at 12:46 PM, Steve Lianoglou mailinglist.honey...@gmail.com 
 wrote:

Hi,

On Oct 21, 2009, at 12:31 PM, Aneeta wrote:


Greetings!

As part of my research project I am using R to study temperature data
collected by a network. Each node (observation point) records  
temperature of

its surroundings throughout the day and generates a dataset. Using the
recorded datasets for the past 7 days I need to build a prediction  
model for

each node that would enable it to check the observed data against the
predicted data. How can I derive an equation for temperature using the
datasets?
The following is a subset of one of the datasets:-

Time  Temperature

07:00:17.369668   17.509
07:03:17.465725   17.509
07:04:17.597071   17.509
07:05:17.330544   17.509
07:10:47.838123   17.5482
07:14:16.680696   17.5874
07:16:46.67457 17.5972
07:29:16.887654   17.7442
07:29:46.705759   17.754
07:32:17.131713   17.7932
07:35:47.113953   17.8324
07:36:17.194981   17.8324
07:37:17.227013   17.852
07:38:17.809174   17.8618
07:38:48.00011 17.852
07:39:17.124362   17.8618
07:41:17.130624   17.8912
07:41:46.966421   17.901
07:43:47.524823   17.95
07:44:47.430977   17.95
07:45:16.813396   17.95

I think you/we need much more information.

Are you really trying to build a model that predicts the temperature  
just given the time of day?


Given that you're in NY, I'd say 12pm in August sure feels much  
different than 12pm in February, no?


Or are you trying to predict what one sensor readout would be at a  
particular time given readings from other sensors at the same time?


Or ... ?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 |  Memorial Sloan-Kettering Cancer Center
 |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact




--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SVM probability output variation

2009-10-21 Thread Steve Lianoglou

Howdy,

On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote:
snip
Yes, exactly that. In your example, though, the variation seems to  
be a lot smaller. I'm guessing that has to with the data.


If I instead output the decision values, the whole procedure is  
fully reproducible, i.e. the exact same values are returned when I  
retrain the model.


By "the decision values", you mean the predicted labels, right?

I have no idea how the probabilities are calculated, but it seems to  
be in this step that the differences arise. In my case, I feel a bit  
hesitant to use them when they differ that much between runs (15% or  
so)...


I'd find that a bit disconcerting, too. Can you give a sample of your  
data + code your using that can reproduce this example?


Warning: Brainstorming Below

If I were to calculate probabilities for my class labels, I'd make the  
probability some function of the example's distance from the decision  
boundary.


Now, if your decision boundary isn't changing from run to run (and I  
guess it really shouldn't be, since the SVM returns the maximum margin  
classifier (which is, by definition, unique, right?)), it's hard to  
imagine why these probabilities would change, either ...


... unless you're holding out different subsets of your data during  
training, or perhaps have a different value for your penalty (cost)  
parameter when building the model. I believe you said that you're  
actually training the same exact model each time, though, right?


Anyway, I see the help page for ?svm says this, if it helps:

"The probability model for classification fits a logistic distribution
using maximum likelihood to the decision values of all binary
classifiers, and computes the a-posteriori class probabilities for the
multi-class problem using quadratic optimization."


-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  |  Memorial Sloan-Kettering Cancer Center
  |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] reshaping data

2009-10-21 Thread Lo, Ken
Hi all,

 

I have a matrix of correlation values between all pairwise comparisons in
an experiment.  For example, I have 2 time points (1,2) each in
triplicate. Thus, I have the following matrix

 

 

     1-1  1-2  1-3  2-1  2-2  2-3
1-1  NA   ...  ...  ...  ...  ...
1-2  ...  NA   ...  ...  ...  ...
1-3  ...  ...  NA   ...  ...  ...
2-1  ...  ...  ...  NA   ...  ...
2-2  ...  ...  ...  ...  NA   ...
2-3  ...  ...  ...  ...  ...  NA

 

What I'd like to do is to reshape the data so that it would be easy for
me to summarize all the correlation values across the triplicates (i.e.
mean) for time point 1 and 2.  It sounds as though the reshape package
should be able to do this, but the solution is not obvious to me.  Can
anybody help?

 

Best,

 

Ken

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO:

2009-10-21 Thread Charles Annis, P.E.
David:

Do you mean "inappropriate" or "embarrassing"?  

How would we R-ians know what has happened at REVolution were it not for
Ajay's note?  Were you planning a press release?  Something like, "47% of
Revolution summarily fired.  Nobody left with more than a year of
experience..."?



Charles Annis, P.E.

charles.an...@statisticalengineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of David M Smith
Sent: Wednesday, October 21, 2009 2:20 PM
To: Ajay ohri
Cc: r-help@r-project.org
Subject: Re: [R] News on R s largest corporate partner REVolution
Computingand SPSS CEO:

It is clearly inappropriate for me or anyone else from REvolution to
comment on matters pertaining to any employee or ex-employee. I would
further add that this is a highly inappropriate use of this list.
Apologies to all concerned.

# David Smith

On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote:
 Start the REvolution without me...

-- 
David M Smith da...@revolution-computing.com
VP of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Palo Alto, CA, USA)

Download REvolution R free:
http://revolution-computing.com/downloads/revolution-r.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO:

2009-10-21 Thread Rolf Turner


On 22/10/2009, at 7:20 AM, David M Smith wrote:


It is clearly inappropriate for me or anyone else from REvolution to
comment on matters pertaining to any employee or ex-employee. I would
further add that this is a highly inappropriate use of this list.


Oh, I dunno.  I have the impression that the REvolution
initiative has, or could have, a substantial impact upon R,
and thus what goes on at REvolution is of concern to the
R community.


Apologies to all concerned.


For what?  For the fact that a particular cat has been let
out of a particular bag?

cheers,

Rolf Turner

##
Attention: This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing data and LME models and diagnostic plots

2009-10-21 Thread Mark Difford

Peter Flom wrote:

 I am puzzled by the performance of LME in situations where there are
 missing data.  As I 
 understand it, one of the strengths of this sort of model is how well it
 deals with missing 
 data, yet lme requires nonmissing data. 

You are confusing missing data with an unbalanced design. A strength of LME
is that it copes with the latter, which aov() does not.

Regards, Mark.


Peter Flom-2 wrote:
 
 Hello
 
 Running R2.9.2 on Windows XP
 
 I am puzzled by the performance of LME in situations where there are
 missing data.  As I understand it, one of the strengths of this sort of
 model is how well it deals with missing data, yet lme requires nonmissing
 data.  
 
 Thus, 
 
 m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
 data = long,
 random = ~I(year-2007.5)|schoolnum)
 
 causes an error in na.fail.default, but adding na.action = na.omit makes
 a model with no errors.  However, if I create that model, i.e.
 
 m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
data = long, 
random = ~I(year-2007.5)|schoolnum,
na.action = na.omit)
 
 then the diagnostic plots suggested in Pinheiro  Bates produce errors;
 e.g.
 
 plot(m1.mod1, schoolnum~resid(.), abline = 0)
 
 gives an error "could not find function NaAct".
 
 
 Searching the archives showed a similar question from 2007, but did not
 show any responses.
 
 Thanks for any help
 
 Peter
 
 )
 
 Peter L. Flom, PhD
 Statistical Consultant
 Website: www DOT peterflomconsulting DOT com
 Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
 Twitter:   @peterflom
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Missing-data-and-LME-models-and-diagnostic-plots-tp25996584p25998546.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] random numbers between 0 and 1

2009-10-21 Thread carol white
Hi,
To generate random numbers between 0 and 1, do you use rnorm followed by dnorm? 
for ex, for 10 variables

a = rnorm(10)
> a
 [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393  0.80010865
 [7] -0.01026882 -0.23887516  2.29912600 -1.38352143
> dnorm(a)
 [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637
 [7] 0.39892125 0.38772103 0.02838403 0.15320103


Regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO:

2009-10-21 Thread Duncan Murdoch

On 10/21/2009 3:09 PM, Charles Annis, P.E. wrote:

David:

Do you mean inappropriate or embarrassing?  


How would we R-ians know what has happened at REVolution were it not for
Ajay's note?  Were you planning a press release?  


I think he already did post a note on his blog mentioning that Danese 
was no longer employed there.  There was also a press release, I don't 
know if it mentioned her.  The blog entry also said that David continues 
his employment, which contradicts the second part of the claim below.


So I suspect we're all still pretty much ignorant of what happened, and 
I wouldn't mind keeping it that way, but if I wanted more details, I'd 
look on the REvolution site for their side of the story, and on Danese 
Cooper's site for hers.


Not on R-Help.

Duncan Murdoch

Something like, "47% of

Revolution summarily fired.  Nobody left with more than a year of
experience..."?



Charles Annis, P.E.

charles.an...@statisticalengineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
 
-Original Message-

From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of David M Smith
Sent: Wednesday, October 21, 2009 2:20 PM
To: Ajay ohri
Cc: r-help@r-project.org
Subject: Re: [R] News on R s largest corporate partner REVolution
Computingand SPSS CEO:

It is clearly inappropriate for me or anyone else from REvolution to
comment on matters pertaining to any employee or ex-employee. I would
further add that this is a highly inappropriate use of this list.
Apologies to all concerned.

# David Smith

On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote:

Start the REvolution without me...




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO:

2009-10-21 Thread kulwinder banipal

Well, I do not think that this is a highly inappropriate use of this list. I am 
sure your new designation of VP makes you feel all the more powerful, but your 
authority does not extend to this list. It should continue to rest with the Core R 
team. It seems that had this list been under your control (i.e. REvolution Computing) 
you would have turned it off - typically "Corporate Strategy". Just an example of 
how money and designation influence those who claim otherwise.  

 Date: Wed, 21 Oct 2009 14:20:08 -0400
 From: da...@revolution-computing.com
 To: ohri2...@gmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] News on R s largest corporate partner REVolution Computing  
 and SPSS CEO:
 
 It is clearly inappropriate for me or anyone else from REvolution to
 comment on matters pertaining to any employee or ex-employee. I would
 further add that this is a highly inappropriate use of this list.
 Apologies to all concerned.
 
 # David Smith
 
 On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote:
  Start the REvolution without me...
 
 -- 
 David M Smith da...@revolution-computing.com
 VP of Community, REvolution Computing www.revolution-computing.com
 Tel: +1 (206) 577-4778 x3203 (Palo Alto, CA, USA)
 
 Download REvolution R free:
 http://revolution-computing.com/downloads/revolution-r.php
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
  
_
Hotmail: Trusted email with powerful SPAM protection.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random numbers between 0 and 1

2009-10-21 Thread Erik Iverson
Uniformly distributed random numbers between 0 and 1?  Try ?runif 
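
For example, a minimal sketch:

set.seed(1)
runif(10)  # ten independent draws from Uniform(0, 1)
# pnorm(rnorm(10)) also lands in (0, 1), via the probability integral
# transform, but runif() is the direct route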

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of carol white
 Sent: Wednesday, October 21, 2009 2:26 PM
 To: r-h...@stat.math.ethz.ch
 Subject: [R] random numbers between 0 and 1
 
 Hi,
 To generate random numbers between 0 and 1, do you use rnorm followed by
 dnorm? for ex, for 10 variables
 
 a = rnorm(10)
  a
  [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393
 0.80010865
  [7] -0.01026882 -0.23887516  2.29912600 -1.38352143
  dnorm(a)
  [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637
  [7] 0.39892125 0.38772103 0.02838403 0.15320103
 
 
 Regards,
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO:

2009-10-21 Thread Rolf Turner


On 22/10/2009, at 8:27 AM, kulwinder banipal wrote:



Well I do not think that this is highly inappropriate use of this  
list. I am sure you new designation of VP makes you feel all more  
powerful but your authority does not extend to this list. It should  
continue to rest with Core R team. It seems had this list been  
under your control (i.e revolution computing) you would have turned  
it off-typically Corporate Strategy. Just an example how money  
and designation influences those who claim otherwise.


Right on, Red Freak!!!

cheers,

Rolf Turner



Date: Wed, 21 Oct 2009 14:20:08 -0400
From: da...@revolution-computing.com
To: ohri2...@gmail.com
CC: r-help@r-project.org
Subject: Re: [R] News on R s largest corporate partner REVolution  
Computing	and SPSS CEO:


It is clearly inappropriate for me or anyone else from REvolution to
comment on matters pertaining to any employee or ex-employee. I would
further add that this is a highly inappropriate use of this list.
Apologies to all concerned.

# David Smith

On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com  
wrote:

Start the REvolution without me...


--
David M Smith da...@revolution-computing.com
VP of Community, REvolution Computing www.revolution-computing.com
Tel: +1 (206) 577-4778 x3203 (Palo Alto, CA, USA)

Download REvolution R free:
http://revolution-computing.com/downloads/revolution-r.php

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- 
guide.html

and provide commented, minimal, self-contained, reproducible code.


_
Hotmail: Trusted email with powerful SPAM protection.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting- 
guide.html

and provide commented, minimal, self-contained, reproducible code.



##
Attention: This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random numbers between 0 and 1

2009-10-21 Thread Gábor Csárdi
I would suggest using the generator at
http://submoon.freeshell.org/pix/valium/dilbert_rng.jpg
and subtract 8.5.

Best,
Gabor


On Wed, Oct 21, 2009 at 9:25 PM, carol white wht_...@yahoo.com wrote:
 Hi,
 To generate random numbers between 0 and 1, do you use rnorm followed by 
 dnorm? For example, for 10 variables

 a = rnorm(10)
 a
  [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393  0.80010865
  [7] -0.01026882 -0.23887516  2.29912600 -1.38352143
 dnorm(a)
  [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637
  [7] 0.39892125 0.38772103 0.02838403 0.15320103


 Regards,

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Gabor Csardi gabor.csa...@unil.ch UNIL DGM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R's largest corporate partner REVolution Computing and SPSS CEO:

2009-10-21 Thread Barry Rowlingson
On Wed, Oct 21, 2009 at 8:09 PM, Charles Annis, P.E.
charles.an...@statisticalengineering.com wrote:
 David:

 Do you mean inappropriate or embarrassing?

 How would we R-ians know what has happened at REVolution were it not for
 Ajay's note?  Were you planning a press release?  Something like, 47% of
 Revolution summarily fired.  Nobody left with more than a year of
 experience...?

 Ajay's note was a verbatim cut n paste from Danese Cooper's blog,
with no comment, editorial, opinion, or anything. Although Danese may
well be an open-source evangelist, that doesn't mean her blog is
public domain. This was essentially MLP[1] which I don't think we want
on the R-help list. Anyone who cares about REvolution will already be
following the relevant blogs, reading the New York Times, and the
words of the prophet that were written on the subway walls. Oops I
went all Simon and Garfunkel there.

 If people want to add value to relevant blog posts onto R-news by
commenting and inviting discussion then that's fine by me. But no more
cut n paste jobs.

Barry

[1] Mindless Link Propagation - when someone just emails "Hey guyz
look at this: http://example.com/lulz"

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] FW: reshaping data

2009-10-21 Thread Lo, Ken
I apologize for the previous post using HTML.  Haven't posted for a while and 
e-mail client default.

Best,

Ken


Hi all,

I have a matrix of correlation values between all pairwise comparison in an 
experiment.  For example, I have 2 time points (1,2) each in triplicate. Thus, 
I have the following matrix

 1-1 1-2 1-3 2-1 2-2 2-3
 1-1NA  ... ... ... ... ...
 1-2... NA  ... ... ... ...
 1-3... ... NA  ... ... ...
 2-1... ... ... NA  ... ...
 2-2... ... ... ... NA  ...
 2-3... ... ... ... ... NA

What I'd like to do is to reshape the data so that it would be easy for me to 
summarize all the correlation values across the triplicates (i.e. mean) for 
time point 1 and 2.  It sounds as though the reshape package should be able to 
do this, but the solution is not obvious to me.  Can anybody help?
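
A minimal sketch of one base-R route, assuming the matrix carries dimnames
like "1-1", "1-2", ... (the time-replicate labels above; cm below is an
illustrative stand-in for the real correlation matrix):

labs <- paste(rep(1:2, each = 3), 1:3, sep = "-")
cm <- matrix(runif(36), 6, 6, dimnames = list(labs, labs))
diag(cm) <- NA
long <- as.data.frame(as.table(cm))   # long form: Var1, Var2, Freq
long$t1 <- sub("-.*", "", long$Var1)  # time point of the row
long$t2 <- sub("-.*", "", long$Var2)  # time point of the column
aggregate(long$Freq, list(t1 = long$t1, t2 = long$t2), mean, na.rm = TRUE)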

Best,

Ken
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random numbers between 0 and 1

2009-10-21 Thread Barry Rowlingson
On Wed, Oct 21, 2009 at 8:37 PM, Gábor Csárdi csa...@rmki.kfki.hu wrote:
 I would suggest to use the generator at
 http://submoon.freeshell.org/pix/valium/dilbert_rng.jpg
 and subtract 8.5.

 You may laugh (indeed I did) but some medical trials have used (and
possibly still do) telephone-a-human random numbers. When deciding to give
medicine or placebo the doctor calls the phone number and asks for a
random number, which is read out, and this decides what the patient
gets.

 I suppose this could be implemented in R with an interface to a
speech recognition engine and a telephone... but runif(100) is easier.

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random numbers between 0 and 1

2009-10-21 Thread Rolf Turner


On 22/10/2009, at 8:25 AM, carol white wrote:


Hi,
To generate random numbers between 0 and 1, do you use rnorm  
followed by dnorm? For example, for 10 variables


a = rnorm(10)

a
 [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393   
0.80010865

 [7] -0.01026882 -0.23887516  2.29912600 -1.38352143

dnorm(a)

 [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637
 [7] 0.39892125 0.38772103 0.02838403 0.15320103


Well, this will give you random (in some sense) numbers between 0 and 1.
(Actually they will be between 0 and dnorm(0) = approx. 0.3989.)

Just what the *distribution* of these numbers would be is obscure to me.

You should also be aware that the values of dnorm() could be *larger*
than 1, if the standard deviation were specified to be something smaller
than the default value of 1.  (Note that dnorm() is a ***density***,
not a probability function.)

Anyway this is a very convoluted way of going at the problem.  Why
not just generate random numbers between 0 and 1 ``directly'', using
runif()?
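
For instance (ten draws from the uniform distribution on (0, 1), which is
the default):

runif(10)
# runif(n, min = 0, max = 1) are the defaults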

cheers,

Rolf Turner



##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R's largest corporate partner REVolution Computing and SPSS CEO:

2009-10-21 Thread Rolf Turner


On 22/10/2009, at 8:37 AM, Barry Rowlingson wrote:


On Wed, Oct 21, 2009 at 8:09 PM, Charles Annis, P.E.
charles.an...@statisticalengineering.com wrote:

David:

Do you mean inappropriate or embarrassing?

How would we R-ians know what has happened at REVolution were it  
not for
Ajay's note?  Were you planning a press release?  Something like,  
47% of

Revolution summarily fired.  Nobody left with more than a year of
experience...?


 Ajay's note was a verbatim cut n paste from Danese Cooper's blog,
with no comment, editorial, opinion, or anything. Although Danese may
well be an open-source evangelist, that doesn't mean her blog is
public domain. This was essentially MLP[1] which I don't think we want
on the R-help list. Anyone who cares about REvolution will already be
following the relevant blogs, reading the New York Times, and the
words of the prophet that were written on the subway walls.


Well I don't know how much I ***care*** about REvolution,
but I'm at least vaguely interested.  And I was toadally
unaware of what was going on until I read Ajay's post.
(I don't read the New York Times, or blogs, or subway walls.)
So from my pov Ajay did the community a service.


Oops I went all Simon and Garfunkel there.

 If people want to add value to relevant blog posts onto R-news by
commenting and inviting discussion then that's fine by me. But no more
cut n paste jobs.

Barry

[1] Mindless Link Propagation - when someone just emails "Hey guyz
look at this: http://example.com/lulz"


Hey Baz; sometimes things just speak for themselves.  No
commentary is necessary.

cheers,

Rolf

##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SVM probability output variation

2009-10-21 Thread Anders Carlsson
Hi,

 snip
  If I instead output the decision values, the whole procedure is
  fully reproducible, i.e. the exact same values are returned when I
  retrain the model.
 
 By the decision values, you mean the predict labels, right?

The output of decision values can be turned on in predict.svm, and is, as I 
understand it, the distance from the data point to the hyperplane. (I should 
say that my knowledge here is limited to concepts; I know nothing about the 
details of how this works...). I use these to create ROC curves etc.
 
  I have no idea how the probabilities are calculated, but it seems to
  be in this step that the differences arise. In my case, I feel a bit
  hesitant to use them when they differ that much between runs (15% or
  so)...
 
 I'd find that a bit disconcerting, too. Can you give a sample of your
 data + code you're using that can reproduce this example?
 

I have the data at the office, so I can't do that now (at home).

 Warning: Brainstorming Below
 
 If I were to calculate probabilities for my class labels, I'd make the
 probability some function of the example's distance from the decision
 boundary.
 
 Now, if your decision boundary isn't changing from run to run (and I
 guess it really shouldn't be, since the SVM returns the maximum margin
 classifier (which is, by definition, unique, right?)), it's hard to
 imagine why these probabilities would change, either ...
 
 ... unless you're holding out different subsets of your data during
 training, or perhaps have a different value for your penalty (cost)
 parameter when building the model. I believe you said that you're
 actually training the same exact model each time, though, right?

Yes, I'm using the exact same data to train each time. I thought this would 
generate identical models, but that doesn't appear to be the case.

 
 Anyway, I see the help page for ?svm says this, if it helps:
 
 The probability model for classification fits a logistic distribution
 using maximum likelihood to the decision values of all binary
 classifiers, and computes the a-posteriori class probabilities for the
 multi-class problem using quadratic optimization

This is where I realise I'm a bit in over my head on the theory side - this 
means nothing to me...
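
For what it's worth, the passage quoted above describes Platt-style scaling.
libsvm fits that logistic model by internal cross-validation with randomly
drawn folds, which is a plausible source of the run-to-run differences. A toy
sketch of the idea (illustrative data and names, not e1071 internals):

dec <- c(-2.1, -1.3, -0.4, 0.2, -0.1, 0.9, 1.8, 0.3)  # decision values
lab <- c(0, 0, 1, 0, 1, 1, 1, 0)                      # observed classes
fit <- glm(lab ~ dec, family = binomial)   # logistic fit by maximum likelihood
predict(fit, data.frame(dec = 0.5), type = "response")  # P(class 1 | dec)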
 
 -steve

Thanks again,
Anders

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Temperature Prediction Model

2009-10-21 Thread Aneeta

The data that I use has been collected by a sensor network deployed by Intel.
You may take a look at the network at the following website
http://db.csail.mit.edu/labdata/labdata.html

The main goal of my project is to simulate a physical layer attack on a
sensor network and to detect such an attack. In order to detect an attack I
need to have a model that would define the normal behaviour. So the actual
variation of temperature throughout the year is not very important here.
I have a set of data for a period of 7 days which is assumed to be the
correct behaviour and I need to build a model upon that data. I may refine
the model later on to take into account temperature variations throughout
the year.

Yes I am trying to build a model that will predict the temperature just on
the given time of the day so that I am able to compare it with the observed
temperature and determine if there is any abnormality. Each node should have
its own expectation model (i.e. there will be no correlation between the
readings of the different nodes).
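
A minimal sketch of such a per-node expectation model, assuming readings are
indexed by seconds since midnight (the data below are simulated; loess is one
possible smoother, not a recommendation):

secs <- seq(25200, 27900, by = 100)             # roughly 07:00 to 07:45
temp <- 17.5 + 0.0002*(secs - 25200) + rnorm(length(secs), 0, 0.01)
fit  <- loess(temp ~ secs)                      # smooth "normal" curve
res  <- residuals(fit)
flag <- abs(res) > 3*sd(res)                    # crude anomaly rule
which(flag)                                     # indices of suspect readings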


Steve Lianoglou-6 wrote:
 
 Hi,
 
 On Oct 21, 2009, at 12:31 PM, Aneeta wrote:
 

 Greetings!

 As part of my research project I am using R to study temperature data
 collected by a network. Each node (observation point) records  
 temperature of
 its surroundings throughout the day and generates a dataset. Using the
 recorded datasets for the past 7 days I need to build a prediction  
 model for
 each node that would enable it to check the observed data against the
 predicted data. How can I derive an equation for temperature using the
 datasets?
 The following is a subset of one of the datasets:-

  Time  Temperature

 07:00:17.369668   17.509
 07:03:17.465725   17.509
 07:04:17.597071   17.509
 07:05:17.330544   17.509
 07:10:47.838123   17.5482
 07:14:16.680696   17.5874
 07:16:46.67457 17.5972
 07:29:16.887654   17.7442
 07:29:46.705759   17.754
 07:32:17.131713   17.7932
 07:35:47.113953   17.8324
 07:36:17.194981   17.8324
 07:37:17.227013   17.852
 07:38:17.809174   17.8618
 07:38:48.00011 17.852
 07:39:17.124362   17.8618
 07:41:17.130624   17.8912
 07:41:46.966421   17.901
 07:43:47.524823   17.95
 07:44:47.430977   17.95
 07:45:16.813396   17.95
 
 I think you/we need much more information.
 
 Are you really trying to build a model that predicts the temperature  
 just given the time of day?
 
 Given that you're in NY, I'd say 12pm in August sure feels much  
 different than 12pm in February, no?
 
 Or are you trying to predict what one sensor readout would be at a  
 particular time given readings from other sensors at the same time?
 
 Or ... ?
 
 -steve
 
 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
|  Memorial Sloan-Kettering Cancer Center
|  Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Temperature-Prediction-Model-tp25995874p25997650.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing data and LME models and diagnostic plots

2009-10-21 Thread Peter Flom
I wrote

 I am puzzled by the performance of LME in situations where there are
 missing data.  As I 
 understand it, one of the strengths of this sort of model is how well it
 deals with missing 
 data, yet lme requires nonmissing data. 


Mark Difford replied

You are confusing missing data with an unbalanced design. A strength of LME
is that it copes with the latter, which aov() does not.



Thanks for your reply, but I don't believe I am confused with respect to 
missing data in mixed models.  See e.g. Hedeker and Gibbons, Longitudinal Data 
Analysis, which repeatedly stresses that mixed models provide good estimates if 
the data are missing at random.

Regards

Peter



Peter L. Flom, PhD
Statistical Consultant
Website: www DOT peterflomconsulting DOT com
Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
Twitter:   @peterflom

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] News on R's largest corporate partner REVolution Computing and SPSS CEO:

2009-10-21 Thread Ajay ohri
I am sorry.

I will refrain from this in the future (using the list which is a technical
resource and not a forum inappropriately).

My blog has my views on it (http://decisionstats.com), so I won't cut and
paste them here.

I would like to applaud David's team at REvolution for finally releasing an
Ubuntu version of REvolution R, though the Windows version 64 bit was
developed much earlier. I would like to say that Dave's credentials in open
source or his personal technical authority has not been questioned by me (
and I don't think by anyone on this list).

The new management is led by an ex-founder of SPSS, so I guess that things
are looking up already.

A new CEO can only mean more hunger to get R up and running in corporate
circles, so that smart people who write excellent code start making more
money than ...?

Commerce demands that REvolution gets a bigger share than it has been
getting, and that's what most of the people on this list want (except that
there is no forum section for discussion on R-help, while other communities
have forums).

Ajay
Knoxville

Student

Go Vols!


On Wed, Oct 21, 2009 at 2:24 PM, Erik Iverson eiver...@nmdp.org wrote:

 Nothing to do with what R-help is about.  Please refrain from posting
 things like this in the future.

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
  On Behalf Of Ajay ohri
  Sent: Wednesday, October 21, 2009 12:52 PM
  To: r-help@r-project.org; sas-l; spssx-l
  Subject: [R] News on R's largest corporate partner REVolution Computing
  and SPSS CEO:
 
  Start the REvolution without me...
  http://danesecooper.blogs.com/divablog/2009/10/start-the-revolution-
  without-me.html
 
  *From Danese Cooper's Blog*
 
 
  Some of you may have become aware of REvolution
  Computinghttp://revolution-computing.com/,
  a commercial open source company organized around the R
  Languagehttp://www.r-project.org/,
  when I joined in March
  http://news.cnet.com/8301-1001_3-10202355-92.htmlof this year.  For
  the past few months we have been working on a B-Round of
  funding.  It was an interesting process and I was happy to be working in
  my
  first startup company after so many years in very large corporations.
 
  We built a small team to work on Community Engineering, by which we
  meant
  developing assets both to benefit the R Language community as well as to
  entice and inform the Alpha-Geek community to learn and use R.  We set
  up
  an Advisory Board designed to advise REvolution management about
 decisions
  relating to REvo and Open Source, and we helped put REvolution R into the
  Karmic Koala release of Ubuntu.  It was really fun to work in a small,
  agile
  team and I felt like I was getting a great education in startups and we
  were
  rapidly moving the company forward...Why didn't I join a startup years
  ago?
 
  The funding deal closed on Wednesday last week...
 
  Late the next afternoon I received a call from the new COO notifying me
  that
  my services would no longer be required at REvolution., effective
  immediately and with no severance.  Apparently, the company is moving in
 a
  different direction.
  http://blog.revolution-computing.com/2009/10/revolution-computing-gets-
  major-funding-new-ceo.html
 
  I was surprised that the new CEO,  wasn't personally handling this
  unpleasant task...but I guess that might have been distasteful after the
  many assurances he gave me and my team last July at OSCON that we were
  absolutely critical to the company's success and that he would be
  making
  no changes for at least three months after he assumed control.  Personal
  courage in difficult situations is rare.
 
  What I find most interesting about today's REvolution announcements is
 the
  space they spent thanking the previous management team, given nearly all
  of
  us, including the Founders and the Board, were just fired.  47% of the
  company wiped out and nobody left with more than a year of
  experience...Shit
  happens...
 
  And so we begin to pick up the pieces and move on.  I've spent much of
 the
  past few days consoling coworkers, personally breaking the news to the
  many
  kind friends who had agreed to help us increase interest in R and
  Revolution, and working out what I might be doing next.  I have some
  interesting possibilities already, although I'm still open to
  suggestions...so stay tuned.
 
  Meanwhile I can honestly say that the new REvolution Computing will
 little
  resemble the company I was proud to join and represent.  I still think
 the
  R
  Language is really interesting, but I'm no longer sure REvo is the one to
  watch in this space anymore...For the sake of my friends among the
  remaining
  employees and shareholders I hope I'm wrong.
 
[[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 

Re: [R] date conversion not as i would have expected

2009-10-21 Thread Duncan Mackay

Hi

This is on WinXP with regional settings as EST (we are now on DST but I run 
EST) R2.9.2


x <- structure(1254351600, class = c("POSIXt", "POSIXct"), tzone = "")
 x
[1] 2009-10-01 09:00:00 EST
 as.POSIXlt(x)
[1] 2009-10-01 09:00:00 EST
 as.Date(x, format = "%Y-%m-%d")
[1] 2009-09-30

I had a similar problem last week. I am not sure how Bill Gates does his 
times but for R see Rnews 4(1).

There are slight differences between POSIX and Date classes

NB: If you run EST in DST periods, Bill Gates still gives the file stamp as
DST in Vista and XP.
I would presume that it is the same for other zones where there is summer
time.
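
A minimal sketch of what is going on, assuming current R semantics: the
POSIXct method of as.Date() converts via tz = "UTC" by default, so a morning
time east of Greenwich can land on the previous UTC day. The zone below is
illustrative:

x <- as.POSIXct("2009-10-01 09:00:00", tz = "Australia/Sydney")
as.Date(x)                           # via UTC: "2009-09-30"
as.Date(x, tz = "Australia/Sydney")  # keeps the local date: "2009-10-01"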


Regards

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home: ma...@northnet.com.au




At 00:51 22/10/2009, you wrote:

Good day, I imported some data into R from Excel.  By using the edit()
function, this is what one of the dates looks like in R:

 x <- structure(1254351600, class = c("POSIXt", "POSIXct"), tzone = "")
[1] 2009-10-01 BST

However, when i do the following, the date changes:

 as.Date(x, format = "%Y-%m-%d")
[1] 2009-09-30

I don't understand why this is happening. I realise that I can get
around this by doing as.Date(as.character(x)), but it would be nice to
understand why it doesn't work directly.

C xx

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do I vectorize this loop....

2009-10-21 Thread chipmaney

Basically I need to use the following data to calculate a squared error for
each Sample based on the expected Survival for the zone.

Specifically, this code has Live/Dead for each sample, and I need to calculate
the square error based on the Expected Mean (ie, Survival).  The code looks
up the expectation for each zone and applies for each sample in the zone
using a loop:

Data1 <- data.frame(Sample = 1:6, Live = c(24,25,30,31,22,23), Baseline =
c(35,35,35,32,34,33), Zone = c(rep("Cottonwood",3), rep("OregonAsh",3)))

Data2 <- data.frame(Zone = c("Cottonwood","OregonAsh"), Survival =
c(0.83,0.76))

for (i in 1:nrow(Data1)) # (Yi - Ybar*Yo)^2
Data1$SquaredError[i] <- (Data1$Live[i] -
Data2$Survival[which(Data1$Zone[i]==Data2$Zone)]*Data1$Baseline[i])^2


My question is, can I vectorize this code to avoid the loop?  Obviously, I
could merge the 2 datasets first, but that would still require 2 steps and
Data1 would have a bunch of redundant data.  So, is there a better
alternative?  Is there some way I can improve indexing syntax efficiency by
using rownames instead of a column vector?



-- 
View this message in context: 
http://www.nabble.com/How-do-I-vectorize-this-loop-tp26000933p26000933.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] increase size of filled box in a legend

2009-10-21 Thread Janet Choate
Hello R user community,
can anyone tell me how to increase the size of a filled box in a legend?
thanx,
Janet
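
One possible route (a sketch, not the only way): legend()'s fill boxes scale
with the text size via cex, so an alternative is to draw filled square point
symbols, whose size pt.cex controls independently of the text:

plot(1:10)
legend("topleft", legend = c("a", "b"),
       pch = 22,                   # filled-square plotting symbol
       pt.bg = c("red", "blue"),   # fill colours of the squares
       pt.cex = 3)                 # enlarge the boxes only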

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] drawing a line indicating extent of each factored data series in multipanel lattice xyplot

2009-10-21 Thread Dan Kortschak
Hi,

Am am plotting aggregated frequency data (extracted from an RDBMS)
relating to DNA sequence features for each of the human chromosomes as
described in the table chromosomes below (the frequency data is in a
table 'hits' that has a value (or not) for each of a set of bins across
each chromosome).

I would like to mark the extent of the chromosome (according to the
length value in chromosome) with a line under the xyplot for each panel
(at the moment - not included in the xyplot statement below - I have
just added the length in bins to the panel strip, but this is not the
best solution for representing the data).

I have tried a couple of approaches to this (example below), but have
not got anything to work (the 10 value refers to bin size).

Can anyone suggest an approach to accomplish this?

An additional though less significant question is how I can get the
panels to be in the order of the chromosomes as listed in the
chromosomes table rather than alphabetical order of their names.

thanks
Dan

 chromosomes
   chromosome       refSeq                       name    length
1           0  NC_000001.9  Homo sapiens chromosome 1 247249719
2           1 NC_000002.10  Homo sapiens chromosome 2 242951149
3           2 NC_000003.10  Homo sapiens chromosome 3 199501827
4           3 NC_000004.10  Homo sapiens chromosome 4 191273063
5           4  NC_000005.8  Homo sapiens chromosome 5 180857866
6           5 NC_000006.10  Homo sapiens chromosome 6 170899992
7           6 NC_000007.12  Homo sapiens chromosome 7 158821424
8           7  NC_000008.9  Homo sapiens chromosome 8 146274826
9           8 NC_000009.10  Homo sapiens chromosome 9 140273252
10          9  NC_000010.9 Homo sapiens chromosome 10 135374737
11         10  NC_000011.8 Homo sapiens chromosome 11 134452384
12         11 NC_000012.10 Homo sapiens chromosome 12 132349534
13         12  NC_000013.9 Homo sapiens chromosome 13 114142980
14         13  NC_000014.7 Homo sapiens chromosome 14 106368585
15         14  NC_000015.8 Homo sapiens chromosome 15 100338915
16         15  NC_000016.8 Homo sapiens chromosome 16  88827254
17         16  NC_000017.9 Homo sapiens chromosome 17  78774742
18         17  NC_000018.8 Homo sapiens chromosome 18  76117153
19         18  NC_000019.8 Homo sapiens chromosome 19  63811651
20         19  NC_000020.9 Homo sapiens chromosome 20  62435964
21         20  NC_000021.7 Homo sapiens chromosome 21  46944323
22         21  NC_000022.9 Homo sapiens chromosome 22  49691432
23         22  NC_001807.4 Homo sapiens mitochondrion     16571
24         23  NC_000023.9  Homo sapiens chromosome X 154913754
25         24  NC_000024.8  Homo sapiens chromosome Y  57772954


xyplot(hits ~ bin | chromosomes$name[chromosome+1],
       data = hits,
       horizontal = FALSE,
       origin = 0,
       lab = c(3,10), pch = 20, cex = 0.01,
       layout = c(1,5,5),
       strip = strip.custom(style = 3,
                            bg = "grey90", par.strip.text = list(cex = 0.5)),
       as.table = TRUE,
       panel = function(x, y, ..., length = chromosomes$length[chromosome+1])
           { panel.xyplot(x, y, ...); panel.lines(0, -100, length/10, -100) }
)
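
A sketch of one approach, assuming the objects above (names follow the post;
binsize stands in for your bin width). Lattice orders panels by factor
levels, which answers the ordering question, and subscripts lets each panel
look up its own chromosome's length:

# condition on a factor whose levels follow the chromosomes table
hits$name <- factor(chromosomes$name[hits$chromosome + 1],
                    levels = chromosomes$name)
binsize <- 1   # replace with the real bin width
xyplot(hits ~ bin | name, data = hits,
       panel = function(x, y, subscripts, ...) {
           panel.xyplot(x, y, ...)
           chr <- hits$chromosome[subscripts][1]          # this panel's chromosome
           len <- chromosomes$length[chromosomes$chromosome == chr]
           panel.lines(c(0, len/binsize), c(-100, -100))  # extent bar
       })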

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sorting table output

2009-10-21 Thread Erin Hodgess
Dear R People:

Suppose I have the following output:

 table(xx)
xx
 A  C  G  T
13 12 10 15


I would like to have the output sorted in descending order by height
or frequency.

But when I do the following:

 rev(table(xx))
xx
 T  G  C  A
15 10 12 13


the output is sorted by the names rather than the frequency.

Any suggestions, please?

Thanks in advance,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sorting table output

2009-10-21 Thread Stefan Evert

I would like to have the output sorted in descending order by height
or frequency.

But when I do the following:


rev(table(xx))

xx
T  G  C  A
15 10 12 13





Err, I guess you meant to write

sort(table(xx))

here?

Cheers,
Stefan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sorting table output

2009-10-21 Thread Rolf Turner


On 22/10/2009, at 11:16 AM, Erin Hodgess wrote:


Dear R People:

Suppose I have the following output:


table(xx)

xx
 A  C  G  T
13 12 10 15


I would like to have the output sorted in descending order by height
or frequency.

But when I do the following:


rev(table(xx))

xx
 T  G  C  A
15 10 12 13




the output is sorted by the names rather than the frequency.

Any suggestions, please?


(a) Why on earth are you using ***rev()***???

(b) Does this do what you want?

 tx <- table(xx)
 tx[order(tx)]
xx
G  C  A  T
10 12 13 15
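
(For the descending order originally asked for, sort() takes a decreasing
argument:)

 sort(tx, decreasing = TRUE)
xx
 T  A  C  G
15 13 12 10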

cheers,

Rolf


##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I vectorize this loop....

2009-10-21 Thread William Dunlap


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of chipmaney
 Sent: Wednesday, October 21, 2009 2:58 PM
 To: r-help@r-project.org
 Subject: [R] How do I vectorize this loop
 
 
 Basically I need to use the following data to calculate a 
 squared error for
 each Sample based on the expected Survival for the zone.
 
 Basically, this code has Live/Dead for each sample, and I 
 need to calculate
 the square error based on the Expected Mean (ie, Survival).  
 The code looks
 up the expectation for each zone and applies for each sample 
 in the zone
 using a loop:
 
 Data1 <- data.frame(Sample = 1:6, Live = c(24,25,30,31,22,23), Baseline =
 c(35,35,35,32,34,33), Zone = c(rep("Cottonwood",3), rep("OregonAsh",3)))
 
 Data2 <- data.frame(Zone = c("Cottonwood","OregonAsh"), Survival =
 c(0.83,0.76))
 
 for (i in 1:nrow(Data1)) # (Yi - Ybar*Yo)^2
   Data1$SquaredError[i] <- (Data1$Live[i] -
 Data2$Survival[which(Data1$Zone[i]==Data2$Zone)]*Data1$Baseline[i])^2
 
 
 My question is, can I vectorize this code to avoid the loop?

Does the following do what you want?
   (Data1$Live -
Data2$Survival[match(Data1$Zone,Data2$Zone)]*Data1$Baseline)^2
  [1] 25.5025 16.4025  0.9025 44.6224 14.7456  4.3264
  
 Obviously, I
 could merge the 2 datasets first, but that would still 
 require 2 steps and
 Data1 would have a bunch of redundant data.

Why worry about a bunch of 'redundant' data?  Its space is
freed after it is no longer referenced, so if you do the
processing in a function and don't put all the temporarily
needed data in a permanent dataset it won't take up space
for long.  If you are running into the memory limits of your
hardware then you have to work harder to avoid using more
memory than you need.  (Note that vec[which(logical)] generally
gives the same result as vec[logical] but uses more space doing
so.)
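
A small illustration of that note (illustrative vectors):

vec  <- c(10, 20, 30)
keep <- c(TRUE, FALSE, TRUE)
identical(vec[keep], vec[which(keep)])  # TRUE: same result, but which()
                                        # allocates an extra index vector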

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
 
 So, is there a better
 alternative?  Is there some way I improve indexing syntax 
 efficiency by
 using rownames instead of a column vector?
 
 
 
 -- 
 View this message in context: 
 http://www.nabble.com/How-do-I-vectorize-this-loop-tp26000
933p26000933.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Use of the command 'replicate'

2009-10-21 Thread Manny Gomez

Dear R users,

I'd like to ask you whether you know how to sort out the following:

I'm trying to reproduce a dataset of clusters, and for that I need to build
up a cluster index (inside a function) using the command replicate as
follows:


dataset <- function(
    clusters = 100,
    cluster.size = 50,
    outcome.mean = list(mean = 3, sd = 1),
    outcome.var = list(mean = 0, sd = 1),
    treat = 1,
    unbal = FALSE){

if(clusters %% 2 != 0) stop("number of clusters ", clusters,
                            " must be an even number\n")

clust.size <- cluster.size
clusters <- clusters

clust.index <- rep(1:clusters, times = clust.size)   # cluster index
treat.ind <- c(
    # treatment index indicator
    rep(0, sum(clust.size[1:(clusters/2)])),
    rep(1, sum(clust.size[(clusters/2+1):clusters]))
)

(...)

}

Doing this gives me an error saying that the 'times' argument is invalid
and that the cluster and treatment indicator indices are both invalid, so
I'm probably defining clust.size wrongly?!
Could you enlighten me about this?
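
A likely diagnosis (a sketch, not a tested fix): cluster.size is a single
number, so clust.size[1:(clusters/2)] pads with NAs, sum(...) becomes NA,
and rep(0, NA) fails with "invalid 'times' argument". Expanding it to one
size per cluster resolves both indices:

clusters <- 100; cluster.size <- 50        # as in the function defaults
clust.size <- rep(cluster.size, clusters)  # one size per cluster
clust.index <- rep(1:clusters, times = clust.size)
treat.ind <- c(rep(0, sum(clust.size[1:(clusters/2)])),
               rep(1, sum(clust.size[(clusters/2 + 1):clusters])))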

Thanks a lot.

Manny Gomez
-- 
View this message in context: 
http://www.nabble.com/Use-of-the-command-%27replicate%27-tp26001367p26001367.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple imputation with mix package

2009-10-21 Thread Kurt Smith
I am running into a problem using 'mix' for multiple imputation (over
continuous and categorical variables).

For the way I will be using this I would like to create an imputation
model on some training data set and then use this model to impute
missing values for a different set of individuals (i.e. I need to have
a model in place before I receive their information).

I expected that all that model information would be stored in a single
object. In other words, when I run:
imp.mix(s, theta, x)
I expected that theta completely specifies the model and s (created by
prelim.mix) and x only contain info on the incomplete data set I want
to impute.
As best as I can tell this is not the case. For instance, I create a
'model object' (a general location parameter list)  from the data set
trainSet using em.mix as follows:

sTrain <- prelim.mix(trainSet, nCategorical)
thetaTrain <- em.mix(sTrain, maxits = 1000, showits = TRUE, eps = 0.0001)

I then attempt to use this model to impute a missing field (TC) in the
data set workSet as follows:

workSet$TC <- NA
sWork <- prelim.mix(workSet, nCategorical)
imputedWork <- imp.mix(sWork, thetaTrain, workSet)

This does not give realistic values for TC (they are around 0). My
guess is that part of the imputation model information is stored
in sWork.
If I do this it looks like it works better (values of TC in the correct range):
sWork$xbar = sTrain$xbar
sWork$sdv = sTrain$sdv
imputedWork <- imp.mix(sWork, thetaTrain, workSet)

Can someone say whether what I am doing is correct? Is changing xbar
and sdv by hand a proper solution? I'm not sure whether that could
mess other things up.

Thanks,
Kurt

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] melting columns

2009-10-21 Thread Thibault Grava
Hello,

I'm using R to run an acoustic analysis package called Seewave. I ask the code 
to extract a list of variables from my recording, and the program gives ONE 
table for each of these. The tables consist of a two-column data.frame with the 
time in column 1 and the frequency in column 2. However, for my purpose I need 
only one column, with the time first and the frequency second. I tried different 
things but my knowledge of R is definitely too limited to work that out. Thanks for 
your help!
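
A minimal sketch, assuming each Seewave table is a two-column data frame
(df below is an illustrative stand-in):

df <- data.frame(time = c(0.1, 0.2, 0.3), frequency = c(440, 442, 445))
one.col <- c(df$time, df$frequency)  # times first, then frequencies
# or keep a label recording which column each value came from:
stacked <- stack(df)                 # columns: values, ind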

Thibault Grava

PhD candidate
Natural Resources and Environmental Studies
University of Northern British Columbia
Prince George, BC   V2N 4Z9

Tel: (250) 960-6050
email:gr...@unbc.ca

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] loop vs. apply(): strange behavior with data frame?

2009-10-21 Thread Roberto Perdisci
Hi everybody,
  I noticed a strange behavior when using loops versus apply() on a data frame.
The example below explicitly computes a distance matrix given a
dataset. When the dataset is a matrix, everything works fine. But when
the dataset is a data.frame, the dist.for function written using
nested loops will take a lot longer than the dist.apply

### USING FOR ###

dist.for <- function(data) {

  d <- matrix(0, nrow = nrow(data), ncol = nrow(data))
  n <- ncol(data)
  r <- nrow(data)

  for(i in 1:r) {
     for(j in 1:r) {
        d[i,j] <- sum(abs(data[i,] - data[j,]))/n
     }
  }

  return(as.dist(d))
}

### USING APPLY ###

f <- function(data.row, data.rest) {

  r2 <- as.double(apply(data.rest, 1, g, data.row))

}

g <- function(row2, row1) {
  return(sum(abs(row1 - row2))/length(row1))
}

dist.apply <- function(data) {
  d <- apply(data, 1, f, data)

  return(as.dist(d))
}


### TESTING ###

library(mvtnorm)
data <- rmvnorm(100, mean = seq(1,10), sigma = diag(1, nrow = 10, ncol = 10))

tf <- system.time(df <- dist.for(data))
ta <- system.time(da <- dist.apply(data))

print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)

print('--')
print('Same experiment on data.frame...')
data2 <- as.data.frame(data)

tf <- system.time(df <- dist.for(data2))
ta <- system.time(da <- dist.apply(data2))

print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)



Here is the output I get on my system (R version 2.7.1 on a Debian lenny)

[1] diff =  0
[1] tf = 
   user  system elapsed
  0.088   0.000   0.087
[1] ta = 
   user  system elapsed
  0.128   0.000   0.128
[1] --
[1] Same experiment on data.frame...
[1] diff =  0
[1] tf = 
   user  system elapsed
 35.031   0.000  35.029
[1] ta = 
   user  system elapsed
  0.184   0.000   0.185

Could you explain why that happens?

thank you,
regards

Roberto

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating Random Effects Coefficients from lmer

2009-10-21 Thread Matthew Schneider
Hello all,

I am new to the list serve but hope to contribute what I can. For now,
however, I hope to tap into all your knowledge about mixed effects models.

Currently, I am running a mixed effects model on time series panel data.
For this model I want to find out the fixed and random effects for two
discrete variables.

My Model:

m1 <- lmer(mghegdp_who ~ govdisgdp_up_net + pridisgdp_up_net +
mgdppc_usd06_imf + drdisgdp + mggegdpwb + hiv_prevalence + (0 +
govdisgdp_up_net|country) + (0 + pridisgdp_up_net|country), data)

To find the overall effect *with confidence intervals* of govdisgdp_up_net and
pridisgdp_up_net I need to account for the fixed and random effects.
Has anyone calculated this? Or does anyone have suggestions?
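
A sketch of one way to combine the two pieces, assuming lme4's fixef() and
ranef() accessors (with two random-effects terms for the same grouping
factor, inspect str(ranef(m1)) first, since there is one element per term):

fe <- fixef(m1)   # population-level estimates
re <- ranef(m1)   # per-country deviations
# country-specific slope = fixed part + that country's random deviation
slopes <- fe["govdisgdp_up_net"] + re$country[["govdisgdp_up_net"]]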

Also, does anyone know of the ability to calculate the postVar for a model
with multiple random effects?

Thank you and look forward to hearing your insights.
Matt 'Lost in Seattle'

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error: NA/NaN/Inf in foreign function call (arg 1)

2009-10-21 Thread Rainier Sabangan
Hi,

Below is the code that is giving me this error message:

#===
# The code below results in the error: Error: NA/NaN/Inf in foreign
# function call (arg 1)
#===
A=rep(c(-1,1),8);B=rep(c(-1,-1,1,1),4);C=rep(c(-1,-1,-1,-1,1,1,1,1),2)
D=c(rep(-1,8),rep(1,8))
y=c(38.01697,42.04596,43.89677,47.78144,37.34869,41.90744,44.01468,47.96458,37.97263,41.83482,44.01936,48.12203,38.73267,42.92054,44.16163,48.88900)
bcols=c(1,2)
dcols=c(3,4)
locmodel=glm(y~A+B)
mlcoefs=locmodel$coef
res=locmodel$residuals
ressq=res^2
dispmodel=glm(ressq~C+D,family=Gamma(link = log))

#===
# The code below results in the error message:
# Warning message: In glm.fit(x = X, y = Y, weights = weights, start =
# start, etastart = etastart, ... : algorithm did not converge
#===
A=rep(c(-1,1),8)
B=rep(c(-1,-1,1,1),4)
C=rep(c(-1,-1,-1,-1,1,1,1,1),2)
D=c(rep(-1,8),rep(1,8))
y=c(37.93499,42.19333,44.02388,48.04979,37.52459,41.90613,43.44424,47.81314,
 38.04942,42.06124,44.00249,47.89033,39.00372,40.47911,43.78126,48.55228)
locmodel=glm(y~A+B)
mlcoefs=locmodel$coef
res=locmodel$residuals
ressq=res^2
dispmodel=glm(ressq~C+D,family=Gamma(link = log))
#===

I researched this error and found it can be caused by missing values, which is
not the case for this example.
I would really appreciate any help that can be extended.

Rainier Sabangan
BGSU MSAS Student

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] simulating AR() using a

2009-10-21 Thread CJ Rubio

good day everyone!

I have a time series (andong.ts) and fitted an AR() model using the
following code

andong.ts <- ts(read.table("D:/.../andong.csv", header = TRUE), start =
c(1966,1), frequency = 1)

ar(andong.ts)

Call:
ar(x = andong)

Coefficients:
 1   2   3  
0.3117  0.0607  0.0999  

Order selected 3  sigma^2 estimated as  0.8443 


I am aware that my model is now 

 x(t) = 0.3117 x(t-1) + 0.0607 x(t-2) + 0.0999 x(t-3) + e(t)        eqn (1)



but I am not sure how to perform the simulation for the new series. I
tried to look at arima.sim,

arima.sim(ar = c(0.3117,0.0607, 0.0999), n = 36)



but it doesn't seem to use the old series that should drive the simulation (or
maybe it does and I am not aware of it).

Could anyone please show me how to perform the simulation properly, so that it
follows eqn (1) with e(t) being the residual series?
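
A sketch of one way, assuming the fitted object from ar(): arima.sim()
accepts the fitted coefficients, and its innov argument lets you supply your
own innovations (for example, resampled residuals) rather than fresh
Gaussian noise. Note that ar() works with the mean-removed series, hence the
x.mean adjustment:

fit <- ar(andong.ts)
e   <- na.omit(fit$resid)                  # estimated innovations e(t)
sim <- arima.sim(model = list(ar = fit$ar), n = 36,
                 innov = sample(e, 36, replace = TRUE)) + fit$x.mean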

thanks in advance!!!

-- 
View this message in context: 
http://www.nabble.com/simulating-AR%28%29-using-a-tp26002850p26002850.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question on mixed effect models with LME

2009-10-21 Thread Kingsford Jones
On Wed, Oct 21, 2009 at 11:06 AM, Peter Flom
peterflomconsult...@mindspring.com wrote:
...
 I have a longitudinal data set, with data on schools and their test scores 
 over a four year period.  I have centered year, and run the following


 m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
               data = long,
               random = ~I(year-2007.5)|schoolnum,
               na.action = na.omit)

 where math_1 is a percentage of students in a given school that are at the 
 lowest math achievement level, year is year, TFC_ is a categorical variable 
 for a treatment I wish to evaluate, and schoolnum is an identifier.

 When I run summary on this model, I get a strong negative correlation (-0.91) 
 between the intercept and I(year-2007.5), despite the fact that the mean of 
 year is 2007.5.


Hi Peter,

For the "what's going on here?" questions it's very helpful to have a
reproducible example.  I tried to create data fitting your
description, but the correlation disappeared as expected:

set.seed(777)
library(nlme)
school <- factor(rep(1:20, each=4))
year <- rep(2006:2009, 20)
year.c <- year - mean(year)
tmt <- sample(0:1, 20, replace = TRUE)[school]
math <- rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school]
tmt <- factor(tmt)
dfr <- data.frame(math, school, tmt, year, year.c)
rm(math, school, year, tmt)
f1 <- lme(math ~ year*tmt, data = dfr, random=~1|school)
f2 <- update(f1, . ~ year.c*tmt)
summary(f1)$corFixed['year', '(Intercept)']
# [1] -0.997
summary(f2)$corFixed['year.c', '(Intercept)']
# [1] 0


A possibility is that the data are not of the expected classes.  What
does str(long) report?

hth,
Kingsford Jones








 I am puzzled, as I thought centering the time variable should eliminate, or 
 at least strongly reduce, this correlation.

 Any insights appreciated

 thanks

 Peter


 Peter L. Flom, PhD
 Statistical Consultant
 Website: www DOT peterflomconsulting DOT com
 Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
 Twitter:   @peterflom

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plot log scale

2009-10-21 Thread R_help Help
Hi - a simple question. I know that in plot() one can set log="xy" to
put both axes on a log scale. I use this to plot the
autocorrelation of a slowly decaying autocorrelated process. It has some
points that are zero or slightly negative that cause this plot to
return an error. I'm wondering if there is any way to force omitting these
points? Thank you.
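
A minimal sketch: keep only the positive values before plotting (the series
below is illustrative):

lags <- 1:50
acf.vals <- 0.9^lags + rnorm(50, 0, 0.01)  # may dip to zero or below
keep <- acf.vals > 0                       # log axes need positive values
plot(lags[keep], acf.vals[keep], log = "xy", type = "b")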

rc

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing data and LME models and diagnostic plots

2009-10-21 Thread Kingsford Jones
Mixed models based on likelihood methods can often handle missing
observations within subjects, but they not do well with missing
individual elements in the design matrices (think unit nonresponse vs
item nonresponse in the survey world).  Continuing with the example I
recently sent to you

set.seed(0112358)
library(nlme)
school <- factor(rep(1:20, each=4))
year <- rep(2006:2009, 20)
year.c <- year - mean(year)
tmt <- sample(0:1, 20, replace = TRUE)[school]
math <- rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school]
tmt <- factor(tmt)
dfr <- data.frame(math, school, tmt, year, year.c)
rm(math, school, year, year.c, tmt)
# Now create missing observations,
# -- note just a single obs for the first school:
dfr2 <- dfr[-c(2:6, sample(75, 5)), ]
table(dfr2$school)
#
# school:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
# numobs:  1  2  4  3  4  4  4  4  4  3  3  4  4  4  3  4  4  4  4  4
f3 <- lme(math ~ year.c*tmt, data = dfr2, random=~1|school)
intervals(f3)
#Approximate 95% confidence intervals
#
# Fixed effects:
# lower   est. upper
#(Intercept)  3.0705975 3.81227729 4.5539571
#year.c  -0.3913985 0.09107008 0.5735386
#tmt1 0.3146311 1.34895504 2.3832790
#year.c:tmt1 -0.4385680 0.20748194 0.8535319
#attr(,label)
#[1] Fixed effects:
#
# Random Effects:
#  Level: school
#  lower  est.upper
#sd((Intercept)) 0.35068 0.7334782 1.534134
#
# Within-group standard error:
#   lower est.upper
#1.226240 1.492044 1.815464


Note that even with the missing obs, in this instance we got pretty
good estimates of the variance components (parameters: within-group sd
= 1.5; between-school sd = 1).  However, I had to change the seed from
the one I used in the last email because, as is not uncommon with
unbalanced data and relatively small sample sizes, there were
convergence problems.  If you were using classical MoM estimators,
fitting a mixed model with unbalanced data would lead to very
precarious inferences (not that likelihood inference is without its
share of pitfalls, but that's another email...)

hth,

Kingsford





On Wed, Oct 21, 2009 at 11:13 AM, Peter Flom
peterflomconsult...@mindspring.com wrote:
 Hello

 Running R2.9.2 on Windows XP

 I am puzzled by the performance of LME in situations where there are missing 
 data.  As I understand it, one of the strengths of this sort of model is how 
 well it deals with missing data, yet lme requires nonmissing data.

 Thus,

 m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
                data = long,
                random = ~I(year-2007.5)|schoolnum)

 causes an error in na.fail.default, but adding na.action = na.omit makes a 
 model with no errors.  However, if I create that model, i.e.

 m1.mod1 <- lme(fixed = math_1 ~ I(year-2007.5)*TFC_,
               data = long,
               random = ~I(year-2007.5)|schoolnum,
               na.action = na.omit)

 then the diagnostic plots suggested in Pinheiro  Bates produce errors; e.g.

 plot(m1.mod1, schoolnum~resid(.), abline = 0)

 gives an error "could not find function NaAct".


 Searching the archives showed a similar question from 2007, but did not show 
 any responses.

 Thanks for any help

 Peter

 )

 Peter L. Flom, PhD
 Statistical Consultant
 Website: www DOT peterflomconsulting DOT com
 Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
 Twitter:   @peterflom

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop vs. apply(): strange behavior with data frame?

2009-10-21 Thread Jim Holtman
Try running Rprof on the two examples to see what the difference is.  
What you will probably see is that a lot of the time on the data frame  
is spent in accessing it like a matrix ('['). Rprof is very helpful for  
seeing where time is spent in your scripts.
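
A sketch of the usual remedy once profiling confirms this: data[i, ] on a
data frame dispatches through [.data.frame and builds a new data frame on
every iteration, so coerce once up front (assuming all columns are numeric):

dist.for2 <- function(data) {
  m <- as.matrix(data)  # one-time coercion to a numeric matrix
  dist.for(m)           # reuse the matrix version of the loop
}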


Sent from my iPhone

On Oct 21, 2009, at 17:17, Roberto Perdisci  
roberto.perdi...@gmail.com wrote:



Hi everybody,
 I noticed a strange behavior when using loops versus apply() on a  
data frame.

The example below explicitly computes a distance matrix given a
dataset. When the dataset is a matrix, everything works fine. But when
the dataset is a data.frame, the dist.for function written using
nested loops will take a lot longer than the dist.apply

### USING FOR ###

dist.for <- function(data) {

  d <- matrix(0, nrow = nrow(data), ncol = nrow(data))
  n <- ncol(data)
  r <- nrow(data)

  for(i in 1:r) {
     for(j in 1:r) {
        d[i,j] <- sum(abs(data[i,] - data[j,]))/n
     }
  }

  return(as.dist(d))
}

### USING APPLY ###

f <- function(data.row, data.rest) {

  r2 <- as.double(apply(data.rest, 1, g, data.row))

}

g <- function(row2, row1) {
  return(sum(abs(row1 - row2))/length(row1))
}

dist.apply <- function(data) {
  d <- apply(data, 1, f, data)

  return(as.dist(d))
}


### TESTING ###

library(mvtnorm)
data <- rmvnorm(100, mean = seq(1,10), sigma = diag(1, nrow = 10, ncol = 10))

tf <- system.time(df <- dist.for(data))
ta <- system.time(da <- dist.apply(data))

print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)

print('--')
print('Same experiment on data.frame...')
data2 <- as.data.frame(data)

tf <- system.time(df <- dist.for(data2))
ta <- system.time(da <- dist.apply(data2))

print(paste('diff = ', sum(as.matrix(df) - as.matrix(da))))
print("tf = ")
print(tf)
print("ta = ")
print(ta)



Here is the output I get on my system (R version 2.7.1 on a Debian  
lenny)


[1] diff =  0
[1] tf = 
  user  system elapsed
 0.088   0.000   0.087
[1] ta = 
  user  system elapsed
 0.128   0.000   0.128
[1] --
[1] Same experiment on data.frame...
[1] diff =  0
[1] tf = 
  user  system elapsed
35.031   0.000  35.029
[1] ta = 
  user  system elapsed
 0.184   0.000   0.185

Could you explain why that happens?

thank you,
regards

Roberto

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Missing data and LME models and diagnostic plots

2009-10-21 Thread Mark Difford

Hi Peter,

 See e.g. Hedeker and Gibbons, Longitudinal Data Analysis, which
 repeatedly stresses that 
 mixed models provide good estimates if the data are missing at random.

This may be true. However, one of the real strengths of LME is that it
handles unbalanced designs, which is a different thing. The fact that it
also gets good estimates when data are missing is a bonus.

Regards, Mark.



Peter Flom-2 wrote:
 
 I wrote
 
 I am puzzled by the performance of LME in situations where there are
 missing data.  As I 
 understand it, one of the strengths of this sort of model is how well
 it
 deals with missing 
 data, yet lme requires nonmissing data. 

 
 Mark Difford replied
 
You are confusing missing data with an unbalanced design. A strength of
LME
is that it copes with the latter, which aov() does not.

 
 
 Thanks for your reply, but I don't believe I am confused with respect to
 missing data in mixed models.  See e.g. Hedeker and Gibbons, Longitudinal
 Data Analysis, which repeatedly stresses that mixed models provide good
 estimates if the data are missing at random.
 
 Regards
 
 Peter
 
 
 
 Peter L. Flom, PhD
 Statistical Consultant
 Website: www DOT peterflomconsulting DOT com
 Writing; http://www.associatedcontent.com/user/582880/peter_flom.html
 Twitter:   @peterflom
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 

-- 
View this message in context: 
http://www.nabble.com/Missing-data-and-LME-models-and-diagnostic-plots-tp25996584p26004465.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.