date:20101011

Re: [R] MATLAB vrs. R

2010-10-11 Thread Peter Dalgaard

On 10/11/2010 07:46 AM, Craig O'Connell wrote:
 
 I need to find the area under a trapezoid for a research-related project.  I 
 was able to find the area under the trapezoid in MATLAB using the code:
 function [int] = myquadrature(f,a,b) 
 % user-defined quadrature function
 % integrate data f from x=a to x=b assuming f is equally spaced over the 
 interval 
 % use type 
 % determine number of data points
 npts = prod(size(f));
 nint = npts -1; %number of intervals
 if(npts <=1)
   error('need at least two points to integrate')
 end;
 % set the grid spacing
 if(b <=a)
   error('something wrong with the interval, b should be greater than a')
 else
   dx = b/real(nint);
 end;
 npts = prod(size(f));
 
 % trapezoidal rule
 % can code in line, hint:  sum of f is sum(f) 
 % last value of f is f(end), first value is f(1)
 % code below
 int=0;
 for i=1:(nint)
 %F(i)=dx*((f(i)+f(i+1))/2);
 int=int+dx*((f(i)+f(i+1))/2);
 end
 %int=sum(F);
 
 Then to call myquadrature I did:
 % example function call test the user-defined myquadrature function
 % setup some data
 
 % velocity profile across a channel
 % remember to use ? for help, e.g. ?seq 
 x = 0:10:2000; 
 % you can access one element of a list of values using brackets
 % x(1) is the first x value, x(2), the 2nd, etc.
 % if you want the last value, a trick is x(end)  
 % the function cos is cosin and mean gives the mean value
 % pi is 3.1415, or pi
 % another hint, if you want to multiple two series of numbers together
 % for example c = a*b where c(1) = a(1)*b(1), c(2) = a(2)*b(2), etc.
 % you must tell Matlab you want element by element multiplication
 % e.g.:c = a.*b
 % note the .
 %
 h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1); %bathymetry
 u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1);  %vertically-averaged 
 cross-transect velocity
 plot(x,-h)
 % set begin and end points for the integration 
 a = x(1);
 b = x(end);
 % call your quadrature function.  Hint, the answer should be 3.
 f=u.*h;
 val =  myquadrature(f,a,b);
 fprintf('the solution is %f\n',val);
 
 This is great, I got the expected answer of 3.
  
 NOW THE ISSUE IS, I HAVE NO IDEA HOW THIS CODE TRANSLATES TO R.  Here is what 
 I attempted to do, and with error messages, I can tell i'm doing something 
 wrong:
   myquadrature<-function(f,a,b){
 npts=length(f)
 nint=npts-1
 if(npts<=1)
 error('need at least two points to integrate')
 end;
 if(b<=a)
 error('something wrong with the interval, b should be greater than a')
 else
 dx=b/real(nint)
 end;
 npts=length(f)
 _(below this line, I cannot code)
 int=0
 for(i in 1:(npts-1))
 sum(f)=((b-a)/(2*length(f)))*(0.5*f[i]+f[i+1]+f[length(f)])}
 %F(i)=dx*((f(i)+f(i+1))/2);
 int=int+dx*((f(i)+f(i+1))/2);
 end
 %int=sum(F);
  

For a literal translation, just pay a little more attention to detail:

for(i in 1:(npts-1))
int <- int+dx*(f[i]+f[i+1])/2

However, a more R-ish way is to drop the loop and vectorize:

int <- sum(f[-npts]+f[-1])/2*dx

(or int <- (sum(f) - (f[1]+f[npts])/2)*dx, by a well-known rewrite of the
trapezoidal rule).
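
To see that the loop, the vectorized form and the rewrite agree, here is a minimal sketch. The test data (x^2 on [0, 1]) and the variable names int_loop, int_vec and int_alt are my own choices for illustration, not part of the original code.

```r
# Three equivalent ways to apply the trapezoidal rule to equally spaced values.
# The test data (f(x) = x^2 on [0, 1]) are an assumption for illustration.
x    <- seq(0, 1, length.out = 101)
f    <- x^2
npts <- length(f)
dx   <- (x[npts] - x[1]) / (npts - 1)   # grid spacing

# 1. Literal translation of the MATLAB loop
int_loop <- 0
for (i in 1:(npts - 1)) {
  int_loop <- int_loop + dx * (f[i] + f[i + 1]) / 2
}

# 2. Vectorized: pair each point with its right-hand neighbour
int_vec <- sum(f[-npts] + f[-1]) / 2 * dx

# 3. Rewrite: full sum minus half of the two end points
int_alt <- (sum(f) - (f[1] + f[npts]) / 2) * dx

print(c(loop = int_loop, vectorized = int_vec, rewrite = int_alt))
```

All three should print the same value, close to the exact integral 1/3.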


 Thank you and any potential suggestions would be greatly appreciated.
  
 Dr. Argese. 



-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] MATLAB vrs. R

2010-10-11 Thread Craig O'Connell


Thank you Peter.  That is very helpful.  If you don't mind, I continued 
running the code to attempt to get my answer and I continue to get inf inf 
inf... (printed around 100 times).
 
Any assistance with this issue would be appreciated.  Here is my code (including your corrections):
 

myquadrature<-function(f,a,b){
npts=length(f)
nint=npts-1
if(npts<=1)
error('need at least two points to integrate')
end;
if(b<=a)
error('something wrong with the interval, b should be greater than a')
else
dx=b/real(nint)
end;
npts=length(f)
int=0
int <- sum(f[-npts]+f[-1])/2*dx
}
 
#Call my quadrature
x=seq(0,2000,10)
h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
a = x[1]
b = x[length(x)] 
plot(x,-h)
a = x[1];
b = x[length(x)];
#call your quadrature function.  Hint, the answer should be 3.
f=u*h;
val =  myquadrature(f,a,b); ?  ___This is where issue arises.  
result=myquadrature(val,0,2000)  ?
print(result)   ?  
 
 
Thanks again,
 
Phil
 
 
  

Re: [R] MATLAB vrs. R

2010-10-11 Thread Alain Guillet


 Hi,

The first argument of myquadrature in result should be f rather than val, I 
guess. At least that works for me:


 result=myquadrature(f,0,2000)
 print(result)
[1] 3
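
For reference, a hedged sketch of the whole function in more idiomatic R. It is not the posters' code: stop() stands in for MATLAB's error(), braces replace end, and the spacing is taken as (b - a)/nint, all of which are my assumptions. As far as I recall, real(nint) in the R of that era created a vector of nint zeros rather than converting to floating point, which would turn dx into a vector of Inf; treat that reading as an assumption too.

```r
# A sketch of the quadrature function in idiomatic R (assumptions noted above).
myquadrature <- function(f, a, b) {
  npts <- length(f)
  nint <- npts - 1                      # number of intervals
  if (npts <= 1) {
    stop("need at least two points to integrate")
  }
  if (b <= a) {
    stop("something wrong with the interval, b should be greater than a")
  }
  dx <- (b - a) / nint                  # grid spacing
  sum(f[-npts] + f[-1]) / 2 * dx        # vectorized trapezoidal rule
}

# Check on a case with a known answer: the integral of sin(x) on [0, pi] is 2
x   <- seq(0, pi, length.out = 201)
val <- myquadrature(sin(x), x[1], x[length(x)])
print(val)
```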

Regards,
Alain





--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium

tel: +32 10 47 30 50


[R] mvtnorm and noncentrality parameters

2010-10-11 Thread Paul Sweeting


 Hi

I'm trying to calculate densities for the multivariate noncentral t 
distribution.  For the avoidance of doubt, and because there do seem to 
be at least two definitions of a noncentral t distribution: this is a 
noncentral t distribution of the sort described in McNeil, Frey and 
Embrechts (2005) and elsewhere, which can be constructed as a normal 
mean-variance mixture. It is not simply a shifted t distribution. The 
univariate noncentral t density can be calculated using the function dt 
(documented under ?TDist in the stats package). The univariate version 
seems to do what I expect, i.e. give a distribution with a different shape.
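
The difference between a noncentral and a merely shifted t density can be checked in the univariate case with a short sketch. The values of df and ncp are arbitrary, and whether dt's noncentrality matches the mixture construction in the book is not something this sketch settles.

```r
# Compare the noncentral t density from dt() with a merely shifted central t.
# df and ncp are arbitrary illustration values.
df  <- 5
ncp <- 2
x   <- seq(-4, 10, by = 0.05)

d_noncentral <- dt(x, df = df, ncp = ncp)   # noncentral t density
d_shifted    <- dt(x - ncp, df = df)        # central t moved to the right by ncp

# Largest pointwise gap between the two curves; a pure shift would give 0
print(max(abs(d_noncentral - d_shifted)))

plot(x, d_noncentral, type = "l", ylab = "density")
lines(x, d_shifted, lty = 2)
legend("topright", c("dt(x, df, ncp)", "dt(x - ncp, df)"), lty = 1:2)
```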


The obvious approach for calculating a multivariate result seems to be to 
use dmvt in the mvtnorm package.  However, the noncentrality parameters 
here appear actually to be means.  In other words, they seem simply to 
shift the distribution rather than giving a noncentral multivariate 
distribution as described by McNeil et al.  So, a few questions:


1. Is my understanding of what dmvt is doing correct?
2. Is this what it is supposed to be doing?
3. Is there any way of deriving the densities that I want in R (short of 
writing my own function...)?


Thanks

Paul


Re: [R] Number of occurences of a character in a string

2010-10-11 Thread Ted Harding

How about:

  sum(unlist(strsplit(b,NULL))==";")
  # [1] 5

(More transparent, at least to me ... ). See '?strsplit',
and note what is said under Value.
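
A small self-contained comparison of a few base R ways to count the character; the names n_split, n_match and n_diff are made up for the sketch. One thing worth knowing: gregexpr() returns -1 for a string with no match, so taking length() of its result reports 1 rather than 0 in that case, whereas the regmatches() variant below gives 0.

```r
b <- "jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k"

# Split into single characters and count the matches
n_split <- sum(unlist(strsplit(b, NULL)) == ";")

# Extract all matches; lengths() returns 0 when a string has none
n_match <- lengths(regmatches(b, gregexpr(";", b, fixed = TRUE)))

# Compare string length before and after deleting the character
n_diff <- nchar(b) - nchar(gsub(";", "", b, fixed = TRUE))

print(c(n_split, n_match, n_diff))
```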

Ted.

On 11-Oct-10 04:35:43, Michael Sumner wrote:
 Literally:
 
 length( gregexpr(";", b)[[1]])
 
 But more generally, in case b has more than one element:
 
 sapply(gregexpr(";", b), length)
 
 ?gregexpr
 
 
 
 On Mon, Oct 11, 2010 at 3:18 PM, Santosh Srinivas 
 santosh.srini...@gmail.com wrote:
 
 New to R ... which is a function to most effectively search the number
 of
 occurrences of a character in a string?

 b <-
 c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k"
 )

 I want the number of semi-colons ; in b?
 Thanks.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 11-Oct-10   Time: 09:20:36
-- XFMail --


[R] Dataset Transformation

2010-10-11 Thread Santosh Srinivas

I need to transpose the following input dataset  into an output dataset like
below

 

Input

Date        TICKER  Price
11/10/2010  A       0.991642
11/10/2010  B       0.475023
11/10/2010  C       0.218642
11/10/2010  D       0.365135
12/10/2010  A       0.687873
12/10/2010  B       0.47006
12/10/2010  C       0.533542
12/10/2010  D       0.812439
13/10/2010  A       0.210848
13/10/2010  B       0.699799
13/10/2010  C       0.546003
13/10/2010  D       0.152316

Output needed

Date        A         B         C         D
11/10/2010  0.991642  0.475023  0.218642  0.365135
12/10/2010  0.687873  0.47006   0.533542  0.812439
13/10/2010  0.210848  0.699799  0.546003  0.152316

 

I tried using the aggregate function but could not quite work out the method.

 

 

 



[R] Split rows depending on time frame

2010-10-11 Thread Bert Jacobs

Hi,

 

I have the following data frame, where COL2 is a start date and COL3 an
end date:

COL1  COL2   COL3
A     40462  40482
B     40462  40478

I would like to split the above timeframe of 3 weeks into weeks, like this:

COL1  COL2   COL3   COL4
A     40462  40468  1
A     40469  40475  1
A     40476  40482  1
B     40462  40468  1
B     40469  40475  1
B     40476  40478  0.428

COL4 indicates whether the timeframe between COL2 and COL3 is exactly 7 days
(value 1) or shorter (the fraction of a week covered).

In the example above, the last split for B contains only 3 days, so the value
in COL4 is 3/7.

I can't figure out how to do the above. Is there someone who can help me out?

 

Thx in advance,

Bert



Re: [R] Split rows depending on time frame

2010-10-11 Thread ONKELINX, Thierry

Dear Bert,

Use the plyr package to do the magic

library(plyr)
dataset <- data.frame(COL1 = c("A", "B"), COL2 = 40462, COL3 = c(40482,
40478))

tmp <- ddply(dataset, "COL1", function(x){
    delta <- with(x, 1 + COL3 - COL2)
    rows <- rep(1, delta %/% 7)
    if(delta %% 7 > 0){
        rows <- c(rows, (delta %% 7) / 7)
    }
    data.frame(COL4 = rows)
})
merge(dataset, tmp)

HTH,

Thierry


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  


Re: [R] Split rows depending on time frame

2010-10-11 Thread Gabor Grothendieck

On Mon, Oct 11, 2010 at 5:25 AM, Bert Jacobs
bert.jac...@figurestofacts.be wrote:
 Hi,



 I have the following data frame, where col2 is a startdate and col3 an
 enddate



 COL1      COL2      COL3

 A             40462    40482

 B             40462    40478



 The above timeframe of 3 weeks I would like to splits it in weeks like this

 COL1      COL2      COL3      COL4

 A             40462    40468    1

 A             40469    40475    1

 A             40476    40482    1

 B             40462    40468    1

 B             40469    40475    1

 B             40476    40478    0.428



 Where COL4 is an identifier if the timeframe between COL2 and COL3 is
 exactly 7 days or shorter.

 In the example above for B the last split contains only 3 days so the value
 in COL 4 is 3/7

Try this:

DF <- data.frame(COL1 = c("A", "B"), COL2 = 40462, COL3 = c(40482, 40478))

do.call(rbind, by(DF, DF$COL1, function(x) with(x, {
    COL2 <- seq(COL2, COL3, 7)
    COL3 <- pmin(COL2 + 6, COL3)
    COL4 <- (COL3 - COL2 + 1) / 7
    data.frame(COL1, COL2, COL3, COL4)
})))
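
The same idea can be sketched row by row with an explicit check of the result. split_weeks is a hypothetical helper name, and the sketch assumes one row per COL1 value with COL2 <= COL3.

```r
# Hypothetical helper: split each (start, end) row into blocks of at most 7 days.
split_weeks <- function(df) {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    start <- seq(df$COL2[i], df$COL3[i], by = 7)   # first day of each block
    end   <- pmin(start + 6, df$COL3[i])           # last day, capped at the end date
    data.frame(COL1 = df$COL1[i], COL2 = start, COL3 = end,
               COL4 = (end - start + 1) / 7)       # fraction of a full week
  })
  do.call(rbind, pieces)
}

DF <- data.frame(COL1 = c("A", "B"), COL2 = 40462, COL3 = c(40482, 40478),
                 stringsAsFactors = FALSE)
out <- split_weeks(DF)
print(out)
```

The last row for B should cover 40476 to 40478 with COL4 equal to 3/7.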

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] plotting Zipf and Zipf-Mandelbrot curves in R

2010-10-11 Thread Joseph Sorell

Using R, I plotted a log-log plot of the frequencies in the Brown Corpus
using
plot(sort(file.tfl$f, decreasing=TRUE), xlab="rank", ylab="frequency",
log="xy")
However, I would also like to add lines showing the curves for a Zipfian
distribution and for Zipf-Mandelbrot.
I have seen these in many articles that used R in creating graphs.
Thank you!
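
One possible way to overlay such curves in base graphics is sketched below. The data are synthetic stand-ins for the sorted corpus frequencies, the function names zipf and zipf_mandelbrot are made up, and the Zipf-Mandelbrot parameters are picked by hand rather than estimated.

```r
# Synthetic rank-frequency data standing in for sort(file.tfl$f, decreasing = TRUE)
rank <- 1:200
freq <- round(10000 / (rank + 2.7)^1.1)

# Zipf: f(r) = C / r^a; Zipf-Mandelbrot: f(r) = C / (r + b)^a
zipf            <- function(r, C, a)    C / r^a
zipf_mandelbrot <- function(r, C, a, b) C / (r + b)^a

# Rough Zipf fit: straight line on the log-log scale
fit <- lm(log(freq) ~ log(rank))
C_z <- exp(coef(fit)[[1]])
a_z <- -coef(fit)[[2]]

plot(rank, freq, log = "xy", xlab = "rank", ylab = "frequency")
lines(rank, zipf(rank, C_z, a_z), col = "red")
# Zipf-Mandelbrot with hand-picked parameters (assumed, not estimated)
lines(rank, zipf_mandelbrot(rank, C = 10000, a = 1.1, b = 2.7),
      col = "blue", lty = 2)
legend("topright", c("Zipf", "Zipf-Mandelbrot"),
       col = c("red", "blue"), lty = 1:2)
```

With real data the Zipf-Mandelbrot parameters would need to be fitted, for example by nonlinear least squares; that step is left out here.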


[R] clustering with cosine correlation

2010-10-11 Thread l.mohammadikhankahdani


Dear All

 Do you know how to make a heatmap that uses cosine correlation for 
clustering? This is what my colleague can do in gene-math and I want to 
do it in R but I don't know how.
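
A minimal base R sketch of one way to do this, assuming "cosine correlation" means the cosine similarity between rows, turned into a distance as 1 minus the similarity. The helper names cosine_sim and cosine_dist and the random matrix are made up for illustration.

```r
# Cosine similarity between the rows of a numeric matrix
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / outer(norms, norms)
}

# Convert similarity to a dissimilarity that hclust() can use
cosine_dist <- function(m) as.dist(1 - cosine_sim(m))

set.seed(1)
m <- matrix(rnorm(60), nrow = 12,
            dimnames = list(paste0("gene", 1:12), paste0("s", 1:5)))

# heatmap() applies distfun to rows and to the transposed matrix for columns
heatmap(m, distfun = cosine_dist,
        hclustfun = function(d) hclust(d, method = "average"))
```

The average linkage here is just one choice; the method argument of hclust() can be changed to match whatever the other software uses.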

Thanks a lot
Leila


Re: [R] Mapping the coordinates!

2010-10-11 Thread Paul Hiemstra


Hi Mehdi,

Take a look at the spatial task view [1] and the r-sig-geo mailing [2] list.

cheers,
Paul

[1] http://cran.r-project.org/web/views/Spatial.html
[2] https://stat.ethz.ch/mailman/listinfo/r-sig-geo
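
As a starting point before moving to the spatial packages, here is a rough base-graphics sketch for the three questions quoted below. The coordinates and codes are invented, and the convex hull is only one simple way to get a closed outline.

```r
# Made-up coordinates and codes; in practice these come from the data
lon  <- c(10.1, 10.1, 11.4, 12.0, 12.0, 12.0, 13.2)
lat  <- c(45.0, 45.0, 46.2, 44.8, 44.8, 44.8, 45.9)
code <- c(1, 2, 3, 4, 5, 6, 7)

# Collapse codes that share a coordinate into one label such as "4,5,6"
key    <- paste(lon, lat)
labels <- tapply(code, key, paste, collapse = ",")
first  <- !duplicated(key)
ux   <- lon[first]
uy   <- lat[first]
ulab <- labels[key[first]]

plot(ux, uy, xlab = "longitude", ylab = "latitude", pch = 16)
text(ux, uy, labels = ulab, pos = 3)      # pos = 3 puts the label above the point

# Closed outline around a group of points: convex hull drawn with polygon()
hull <- chull(ux, uy)
polygon(ux[hull], uy[hull], border = "darkgreen", lty = 2)
```

On an existing map the plot() call would be replaced by points(), with longitude as x and latitude as y.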

On 10/10/2010 06:11 AM, Mehdi Zarrei wrote:










Hello,

I have a series of coordinates (latitudes and longitudes), each one (or
several) associated with a code (from 1 to 28). I used the function
points (latitude, longitudes) to transfer them to a pre-prepared map.

1- I wonder how I might automatically add the codes (1-28) to the map too?

2- Moreover, there are often a few codes at identical coordinates. What is
the function to avoid overlapping of codes on the map?

3- I want to draw a closed line around some geographical areas to define
the habitats.

Your help in any way (introducing manuals, codes, etc.) is appreciated.

All the best,

Mehdi
   



--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 253 5773
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


[R] Boundary correction for kernel density estimation

2010-10-11 Thread Katja Hebestreit

Dear R-users,

I have the following problem:

I would like to estimate the density curve for univariate data between 0
and 1. Unfortunately, the density function in the stats package can only
cut the curve at a left-most and a right-most point. This truncation
would lead to an underestimation. Letting the estimate spill over the
bounded support is inappropriate as well.

Does anyone know of a boundary correction method implemented in R? I did
a lot of searching, but the correction methods I found are for survival or
spatial data.
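
One simple correction that needs nothing beyond the stats package is reflection: mirror the data at both boundaries, estimate on the augmented sample, and rescale. The sketch below is only an illustration of that idea; reflect_density is a hypothetical helper name and the beta-distributed data are simulated.

```r
# Reflection-corrected kernel density estimate on [lower, upper].
# reflect_density is a hypothetical helper name.
reflect_density <- function(x, lower = 0, upper = 1, n = 512) {
  bw  <- bw.nrd0(x)                            # bandwidth from the original data
  aug <- c(x, 2 * lower - x, 2 * upper - x)    # mirror the data at both boundaries
  d   <- density(aug, bw = bw, from = lower, to = upper, n = n)
  d$y <- 3 * d$y                               # augmented sample is 3x as large
  d
}

set.seed(42)
x <- rbeta(500, 2, 5)            # simulated data on (0, 1)
d <- reflect_density(x)

# Numerical check: the corrected estimate should integrate to about 1 on [0, 1]
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
plot(d, main = "Reflection-corrected density")
```

Reflection forces a zero slope at the boundaries, which may or may not suit the data; other corrections (boundary kernels, transformations) exist but are not shown here.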

Thanks a lot for any hint!
Cheers,
Katja


[R] (senza oggetto)

2010-10-11 Thread Barbara . Rogo


I have a graph whose y-axis runs from 0.00 to 1.50. plot() gives me tick 
marks at 0.0, 0.5, 1.0, 1.5, but I want 0.00, 0.15, 0.30 and so on, with 2 
decimals. How can I do this? Thanks

Re: [R] (senza oggetto)

2010-10-11 Thread Ivan Calandra


 Hi,

Try this:
plot(seq(0,1.5,0.1), yaxt="n")
axis(2, at=seq(0.00,1.50, 0.15))

To understand read ?par (especially the yaxt argument in that case, but 
I guess you need to know more about that) and ?axis


HTH,
Ivan




--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php


[R] conditioning variables in dbRDA

2010-10-11 Thread Nevil Amos

 I am using capscale() in vegan. Is it permissible to have more than 
one conditioning variable, thus:
capscale(DIST~variable1+variable2+Condition(variable3+variable4), 
data=mydata)


many thanks

Nevil Amos


Re: [R] (senza oggetto)

2010-10-11 Thread Ted Harding

On 11-Oct-10 12:07:43, barbara.r...@uniroma1.it wrote:
 
 I have the y-axe in a grafich that has as extreme limit 0.00 and 1.50.
 plot gives me the interval 0.0, 0.5,1.0,1.5 but I want:
 0.00,0.15,0.30 and so on with 2 decimals. How can I do? Thanks

The key to this is the plot() parameter yaxp [see below].

So, for instance,

x <- (0:5)
y <- 1.5*(x^2)/(5^2) ## y-values range from 0 to 1.5
plot(x, y, ylim=c(0,1.5), yaxp=c(0,1.5,10))

## And compare with:
plot(x, y, ylim=c(0,1.5))


The explanation of 'yaxp' (and its x-friend xaxp) can be
found in '?par'. The full details are under 'xaxp':

'xaxp' A vector of the form 'c(x1, x2, n)' giving the
   coordinates of the extreme tick marks and the
   number of intervals between tick-marks when
   'par(xlog)' is false. [...]

As explained under 'yaxp', this is constructed in the
same way as 'xaxp'.

So you have y ranging from 0 to 1.5 by steps of 0.15,
hence a total of 10 intervals, therefore

  yaxp = c(0,1.5,10)
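
If the labels must show exactly two decimals regardless of how R chooses to format them, one option is to draw the axis by hand with explicit labels. A small sketch using the same x and y as above (the names ticks and labels are mine):

```r
x <- 0:5
y <- 1.5 * x^2 / 5^2                      # y ranges from 0 to 1.5

ticks  <- seq(0, 1.5, by = 0.15)          # 10 intervals, 11 tick positions
labels <- sprintf("%.2f", ticks)          # force exactly two decimals

plot(x, y, ylim = c(0, 1.5), yaxt = "n")  # suppress the default y axis
axis(2, at = ticks, labels = labels, las = 1)
```

Here las = 1 only turns the labels horizontal so that they are easier to read.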

It is unfortunate that the documentation for the large
number of parameters and features for graphics, even
for the basic plot() function (which will be every
beginner's starting point) is fragmented over many
different documentation entries.

If you start with '?plot', not yet knowing where you
should be looking, it could take you several tries in
different places before you find what you want!

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 11-Oct-10   Time: 13:51:24
-- XFMail --


Re: [R] conditioning variables in dbRDA

2010-10-11 Thread Gavin Simpson

On Mon, 2010-10-11 at 23:42 +1100, Nevil Amos wrote:
 I am using capscale() in vegan, is it permissible to have more than 
 one conditioning variable thus
 capscale(DIST~variable1+variable2+Condition(variable3+variable4), 
 data=mydata)

Yes. Have you tried it and had problems doing it? Or is this just a
request for clarification?

It does work:
> data(varespec)
> capscale(varespec ~ Ca + Condition(pH + P), varechem)
Call: capscale(formula = varespec ~ Ca + Condition(pH + P), data =
varechem)

              Inertia Rank
Total            1826
Conditional       428    2
Constrained       130    1
Unconstrained    1268   20
Inertia is mean squared Euclidean distance 

Eigenvalues for constrained axes:
 CAP1 
130.0 

Eigenvalues for unconstrained axes:
  MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8 
739.77 226.54  89.04  70.61  36.72  30.47  24.77  18.64 
(Showed only 8 of all 20 unconstrained eigenvalues)

HTH

G

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] question related to multiple regression

2010-10-11 Thread Ben Bolker

SNN s.nancy1 at yahoo.com writes:

 I am conducting an association analysis of genotype and a phenotype such as
 cholesterol level as an outcome and the genotype as a regressor using
 multiple linear regression. There are 3 possibilities for the genotype AA,
 AG, GG. There are 5 people with the AA genotype, 100 with the AG genotype
 and 900 with the GG genotype. I coded GG genotype as 1, AG as 2 and AA as 3
 and the p-value for the genotype is significant. 
 Should I believe this p-value or not? My concern is that there are not many
 samples with the AA genotype; could that have affected the significance
 of the genotype in the model?

  Make sure that R is treating genotype as a factor, not a continuous
covariate -- for that reason it's better *not* to recode genotypes
as integer codes, which increases the chance of this type of confusion.
Unless you really have reason to believe that the difference in
expected cholesterol level is linearly related to the number of
A alleles -- i.e. 

b_0 for GG
b_0+d for AG
b_0+2*d for AA

this seems like a fairly strong assumption to make ...
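
To make the distinction concrete, here is a sketch on simulated data. The group sizes follow the question, while the effect sizes, the noise level and all variable names are invented.

```r
set.seed(123)
# Simulated data with the group sizes from the question; effect sizes are made up
genotype <- rep(c("GG", "AG", "AA"), times = c(900, 100, 5))
chol     <- 200 + c(GG = 0, AG = 5, AA = 25)[genotype] +
            rnorm(length(genotype), sd = 20)

g_factor <- factor(genotype, levels = c("GG", "AG", "AA"))
g_count  <- as.numeric(g_factor) - 1      # number of A alleles: 0, 1, 2

fit_factor <- lm(chol ~ g_factor)         # separate mean for each genotype
fit_linear <- lm(chol ~ g_count)          # assumes b_0, b_0 + d, b_0 + 2*d

print(coef(fit_factor))
print(coef(fit_linear))
# F test of whether the additive (linear) constraint is adequate
print(anova(fit_linear, fit_factor))
```

The factor model estimates one coefficient per non-reference genotype, so the standard error for AA reflects that only 5 people carry it.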


Re: [R] conditioning variables in dbRDA

2010-10-11 Thread Jari Oksanen

On 11/10/10 15:42 PM, Nevil Amos nevil.a...@gmail.com wrote:

   I am using capscale() in vegan, is it permissible to have more than
 one conditioning variable thus
 capscale(DIST~variable1+variable2+Condition(variable3+variable4),
 data=mydata)
 
Nevil,

Yes, it is permissible.

Cheers, Jari Oksanen


Re: [R] Dataset Transformation

2010-10-11 Thread Santosh Srinivas

Repost .. since the previous msg had problems

I need to transpose the following input dataset  into an output dataset like
below

Input
Date        TICKER  Price
11/10/2010  A       0.991642
11/10/2010  B       0.475023
11/10/2010  C       0.218642
11/10/2010  D       0.365135
12/10/2010  A       0.687873
12/10/2010  B       0.47006
12/10/2010  C       0.533542
12/10/2010  D       0.812439
13/10/2010  A       0.210848
13/10/2010  B       0.699799
13/10/2010  C       0.546003
13/10/2010  D       0.152316

Output needed 

Date        A         B         C         D
11/10/2010  0.991642  0.475023  0.218642  0.365135
12/10/2010  0.687873  0.47006   0.533542  0.812439
13/10/2010  0.210848  0.699799  0.546003  0.152316

I tried using the aggregate function but could not quite work out the method.


Re: [R] venneuler (java?) color palette 0 - 1

2010-10-11 Thread Karl Brand


Hi Paul,

That's pretty much awesome. Thank you very much.

And combined with the colorspace package functions rainbow_hcl() and 
sequential_hcl(), it makes color selection easy. One thing I was digging for 
was a function that yields a color palette *and* the hcl() call needed 
to produce it. This would help me better understand the hcl format. So 
where I can get the RGB codes like this:


> rainbow_hcl(4)
[1] "#E495A5" "#ABB065" "#39BEB1" "#ACA4E2"


- which is fine for color specification, is there a palette function 
that might help obtain the hcl() call needed to produce a given palette? 
ie., the 'h', 'c' and 'l' (and 'alpha' if appropriate) values for a 
given color/shade??


Thanks again and in advance for any further pointers,

Karl

On 10/10/2010 10:41 PM, Paul Murrell wrote:

Hi

On 11/10/2010 9:01 a.m., Karl Brand wrote:

Dear UseRs and DevelopeRs

It would be helpful to see the color palette available in the
venneuler() function.

The relevant part of ?venneuler states:

colors: colors of the circles as values between 0 and 1

-which explains color specification, but from what palette? Short of
trial and error, i'd really appreciate if some one could help me locate
a 0 - 1 palette for this function to aid with color selection.


The color spec stored in the VennDiagram object is multiplied by 360 to
give the hue component of an hcl() colour specification. For example,
0.5 would mean the colour hcl(0.5*360, 130, 60)
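
A quick way to preview that mapping with base R only; value_to_hcl is a hypothetical helper name, and the chroma and luminance are simply the 130 and 60 quoted above.

```r
# Map venneuler-style colour values in [0, 1] to hcl() colours,
# using the chroma (130) and luminance (60) quoted above.
value_to_hcl <- function(v) hcl(h = v * 360, c = 130, l = 60)

v    <- seq(0, 1, by = 0.1)
cols <- value_to_hcl(v)
print(data.frame(value = v, colour = cols))

# Simple strip showing which colour each value produces
barplot(rep(1, length(v)), col = cols, names.arg = v, yaxt = "n", border = NA)
```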

Alternatively, you can control the colours when you call plot, for
example, ...

plot(ve, col=c("red", "green", "blue"))

... should work.

Paul


FWIW, i tried the below code and received the displayed error. I failed
to turn up any solutions to this error...

Any suggestions appreciated,

Karl


> library(venneuler)

> ve <- venneuler(c(A=1, B=2, C=3, "A&C"=0.5, "A&B&C"=0.1))

> class(ve)
[1] "VennDiagram"

> ve$colors <- c("red", "green", "blue")

> plot(ve)

Error in col * 360 : non-numeric argument to binary operator





--
Karl Brand k.br...@erasmusmc.nl
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
P +31 (0)10 704 3409 | F +31 (0)10 704 4743 | M +31 (0)642 777 268


Re: [R] Dataset Transformation

2010-10-11 Thread jim holtman

try this:

> library(reshape)  # provides melt() and cast()
> x <- read.table(textConnection("Date TICKER Price
+ 11/10/2010  A   0.991642
+ 11/10/2010  B   0.475023
+ 11/10/2010  C   0.218642
+ 11/10/2010  D   0.365135
+ 12/10/2010  A   0.687873
+ 12/10/2010  B   0.47006
+ 12/10/2010  C   0.533542
+ 12/10/2010  D   0.812439
+ 13/10/2010  A   0.210848
+ 13/10/2010  B   0.699799
+ 13/10/2010  C   0.546003
+ 13/10/2010  D   0.152316"), header = TRUE,
+ as.is = TRUE)
> closeAllConnections()
> x.m <- melt(x)
Using Date, TICKER as id variables
> cast(x.m, Date ~ TICKER)
        Date        A        B        C        D
1 11/10/2010 0.991642 0.475023 0.218642 0.365135
2 12/10/2010 0.687873 0.470060 0.533542 0.812439
3 13/10/2010 0.210848 0.699799 0.546003 0.152316
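
For anyone who prefers to stay in base R, the same long-to-wide step can be sketched with reshape() from the stats package. The data frame is typed in by hand from the question, and the renaming of the Price.A style columns is my own addition.

```r
x <- data.frame(
  Date   = rep(c("11/10/2010", "12/10/2010", "13/10/2010"), each = 4),
  TICKER = rep(c("A", "B", "C", "D"), times = 3),
  Price  = c(0.991642, 0.475023, 0.218642, 0.365135,
             0.687873, 0.47006,  0.533542, 0.812439,
             0.210848, 0.699799, 0.546003, 0.152316),
  stringsAsFactors = FALSE
)

# One row per Date, one Price column per TICKER
wide <- reshape(x, idvar = "Date", timevar = "TICKER", direction = "wide")

# reshape() names the new columns Price.A, Price.B, ...; strip the prefix
names(wide) <- sub("^Price\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```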



On Mon, Oct 11, 2010 at 9:35 AM, Santosh Srinivas
santosh.srini...@gmail.com wrote:
 Repost .. since the previous msg had problems

 I need to transpose the following input dataset  into an output dataset like
 below

 Input
 Date            TICKER          Price
 11/10/2010              A               0.991642
 11/10/2010              B               0.475023
 11/10/2010              C               0.218642
 11/10/2010              D               0.365135
 12/10/2010              A               0.687873
 12/10/2010              B               0.47006
 12/10/2010              C               0.533542
 12/10/2010              D               0.812439
 13/10/2010              A               0.210848
 13/10/2010              B               0.699799
 13/10/2010              C               0.546003
 13/10/2010              D               0.152316

 Output needed

 Date    A       B       C       D
 11/10/2010      0.991642        0.475023        0.218642        0.365135
 12/10/2010      0.687873        0.47006 0.533542        0.812439
 13/10/2010      0.210848        0.699799        0.546003        0.152316

 I tried using the aggregate function but not quite getting the method.





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Efficiency Question - Nested lapply or nested for loop

2010-10-11 Thread epowell


Thank you both for your advice.  I ended up implementing both solutions and
testing them on a real dataset of 10,000 rows and 50 inds.  The results are
very, very interesting.

For some context, the original two approaches, nested lapply and nested for
loops, performed at 1.501529 
and 1.458963 mins, respectively.  So the for loops were indeed a bit faster.  

Next, I tried the index solution to avoid doing the paste command each
iteration.  Strangely, this increased the time to 2.83 minutes.  Here's how
I implemented it:

# create array of column idx
v = vector(mode="character", length=nind*4)
for (i in (0:(nind-1))) {
  v[(i*4+1):(i*4+4)] = c(paste("G_hat_0_", i, sep=""),
                         paste("G_hat_1_", i, sep=""),
                         paste("G_hat_2_", i, sep=""),
                         paste("G_", i, sep=""))
}
v = match(v, names(data))

for (row in (1:nrow(data))) {
  for (i in (0:(nind-1))) {

    Gmax = which.max(c( data[row,v[i*4+1]],
                        data[row,v[i*4+2]],
                        data[row,v[i*4+3]] ))

    Gtru = data[row,v[i*4+4]] + 1   # add 1 to match Gmax range

    cmat[Gmax,Gtru] = cmat[Gmax,Gtru] + 1
  }
}

DAVID: Was this what you had in mind?  I had trouble implementing the vector
of indices as you had done.  It generated a bunch of warnings.

By far the best solution was that offered by Gabor.  His technique finished
the job in a whopping 9.8 SECONDS.  It took me about 15 minutes to
understand what it was doing, but the lesson is one I will never forget.  I
must admit, it was a wickedly clever solution.

I implemented it virtually identically to Gabor's example.  The only
difference is that I used the 'v' vector to subset the data frame because in
reality the data has many other unrelated columns.

mat <- matrix(t(data[v]), 4)
table(Gmax = apply(mat[-4,], 2, which.max), Gtru = mat[4,] + 1)
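
For anyone following along, here is a small sketch of why the two approaches agree. The toy data (3 individuals, 5 rows, random values) and the helper names are made up for illustration; only the column naming scheme (G_hat_0_i, G_hat_1_i, G_hat_2_i, G_i) is taken from the post above.

```r
set.seed(1)
nind  <- 3   # hypothetical number of individuals
nrows <- 5   # hypothetical number of rows

# Build toy data with the column layout described above, plus one unrelated column.
data <- data.frame(id = seq_len(nrows))
for (i in 0:(nind - 1)) {
  for (g in 0:2) data[[paste0("G_hat_", g, "_", i)]] <- runif(nrows)
  data[[paste0("G_", i)]] <- sample(0:2, nrows, replace = TRUE)
}

# Column indices, four per individual: three G_hat columns, then the true G.
v <- match(unlist(lapply(0:(nind - 1), function(i)
  c(paste0("G_hat_", 0:2, "_", i), paste0("G_", i)))), names(data))

# Nested-loop version: one confusion-matrix update per (row, individual).
cmat <- matrix(0, 3, 3)
for (row in seq_len(nrows)) {
  for (i in 0:(nind - 1)) {
    Gmax <- which.max(unlist(data[row, v[i * 4 + 1:3]]))
    Gtru <- data[row, v[i * 4 + 4]] + 1
    cmat[Gmax, Gtru] <- cmat[Gmax, Gtru] + 1
  }
}

# Vectorised version: each column of 'mat' holds the four values
# of one (row, individual) pair, so no explicit loop is needed.
mat <- matrix(t(data[v]), 4)
tab <- table(Gmax = factor(apply(mat[-4, ], 2, which.max), levels = 1:3),
             Gtru = factor(mat[4, ] + 1, levels = 1:3))
```

The factor() calls are only there so that the table is always 3 x 3 even if some class never occurs in the toy data; on real data they are probably unnecessary.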

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Efficiency-Question-Nested-lapply-or-nested-for-loop-tp2968553p2989822.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] venneuler (java?) color palette 0 - 1

2010-10-11 Thread Achim Zeileis


On Mon, 11 Oct 2010, Karl Brand wrote:


Hi Paul,

That's pretty much awesome. Thank you very much.

And combined with the colorspace package functions- rainbow_hcl() and 
sequential_hcl() -make color selection easy. One thing i was digging for was 
a function that yields a color palette *and* the hcl() call needed to produce 
it. This would help me better understand the hcl format. So where i can get 
the RGB codes like this-



rainbow_hcl(4)

[1] "#E495A5" "#ABB065" "#39BEB1" "#ACA4E2"




- which is fine for color specification, is there a palette function that 
might help obtain the hcl() call needed to produce a given palette? ie., the 
'h', 'c' and 'l' (and 'alpha' if appropriate) values for a given 
color/shade??


The ideas underlying rainbow_hcl(), sequential_hcl(), and diverge_hcl() 
are described in the following paper


  Achim Zeileis, Kurt Hornik, Paul Murrell (2009).
  Escaping RGBland: Selecting Colors for Statistical Graphics.
  Computational Statistics & Data Analysis, 53(9), 3259-3270.
  doi:10.1016/j.csda.2008.11.033

A preprint PDF version of it is also available for download on my webpage.

In the paper you see how the HCL coordinates for the different palettes 
are constructed. The functions rainbow_hcl(), sequential_hcl(), and 
diverge_hcl() are all direct translations of this, consisting just of a 
few lines of code.


What may be somewhat confusing is that the functions call
  hex(polarLUV(L, C, H, ...))
instead of
  hcl(H, C, L, ...)
which may yield slightly different results. The reason for this is that 
the polarLUV() implementation in colorspace predates the base R 
implementation in hcl().


hth,
Z


Thanks again and in advance for any further pointers,

Karl

On 10/10/2010 10:41 PM, Paul Murrell wrote:

Hi

On 11/10/2010 9:01 a.m., Karl Brand wrote:

Dear UseRs and DevelopeRs

It would be helpful to see the color palette available in the
venneuler() function.

The relevant part of ?venneuler states:

colors: colors of the circles as values between 0 and 1

-which explains color specification, but from what palette? Short of
trial and error, I'd really appreciate it if someone could help me locate
a 0 - 1 palette for this function to aid with color selection.


The color spec stored in the VennDiagram object is multiplied by 360 to
give the hue component of an hcl() colour specification. For example,
0.5 would mean the colour hcl(0.5*360, 130, 60)
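
As a rough illustration of that mapping, the sketch below converts values between 0 and 1 to colours with base R's hcl(), using the chroma (130) and luminance (60) quoted above. The helper name venn_hue_to_hex is made up for this example and is not part of venneuler.

```r
# Hypothetical helper: map venneuler-style values in [0, 1] to hex colours
# by treating them as a fraction of the 360-degree hue circle.
venn_hue_to_hex <- function(v, chroma = 130, luminance = 60) {
  stopifnot(is.numeric(v), all(v >= 0 & v <= 1))
  hcl(h = v * 360, c = chroma, l = luminance)
}

palette01 <- seq(0, 1, by = 0.1)
hex <- venn_hue_to_hex(palette01)

# Quick look at the whole 0 - 1 "palette" (only drawn in an interactive session).
if (interactive()) {
  barplot(rep(1, length(hex)), col = hex, names.arg = palette01, yaxt = "n")
}
```

This also suggests why assigning colour names to ve$colors fails: the stored values are multiplied by 360, so they have to be numeric.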

Alternatively, you can control the colours when you call plot, for
example, ...

plot(ve, col=c("red", "green", "blue"))

... should work.

Paul


FWIW, i tried the below code and received the displayed error. I failed
to turn up any solutions to this error...

Any suggestions appreciated,

Karl


library(venneuler)

ve <- venneuler(c(A=1, B=2, C=3, "A&C"=0.5, "A&B&C"=0.1))

class(ve)
[1] "VennDiagram"

ve$colors <- c("red", "green", "blue")

plot(ve)

Error in col * 360 : non-numeric argument to binary operator





--
Karl Brand k.br...@erasmusmc.nl
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
P +31 (0)10 704 3409 | F +31 (0)10 704 4743 | M +31 (0)642 777 268





[R] filled.contour: colour key decoupled from main plot?

2010-10-11 Thread Panos Hadjinicolaou

Dear R colleagues,

I am trying to plot some geophysical data as a filled contour on a continent 
map and so far the guidance from the R-help archives has been invaluable. The 
only bit that still eludes me is the colour key (legend) coming with 
filled.contour:

I prefer to generate my own colour palette, mainly based on the quantiles of 
tenths of the data in order to capture the whole range (of rainfall for 
example), including the more extreme values both sides. In the colour key this 
results in uneven distribution of the colour bars (and I understood why). Here 
is the code with simplistic data:

# map('worldHires', ...) needs library(maps) and library(mapdata)
xlon <- seq(10, 60, len=10)
ylat <- seq(20, 50, len=10)
prcp <- abs(rnorm(length(xlon)*length(ylat)))*1000
zprcp <- array(prcp, c(length(xlon), length(ylat)))

zprcp.colour <- c("#EDFFD2","#00FFD2","#00F0FF","#00B4FF","#0078FF","#003CFF",
                  "#0000FF","#3C00FF","#7800FF","#B400FF","#FF0096")
zprcp.quants <- rev(quantile(zprcp, na.rm=T,
                    probs=c(1,0.98,0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1)))
zprcp.breaks <- c(0, 10*ceiling(zprcp.quants/10))

filled.contour(xlon, ylat, zprcp, ylim=c(20,50), xlim=c(10,60), asp=1.0,
  plot.axes=map('worldHires', xlim=c(10,60), ylim=c(20,50), border=0.9, add=TRUE),
  levels=zprcp.breaks, col=zprcp.colour, key.axes = axis(4, zprcp.breaks))

I would like the colour bars to be even (and the labels to represent the actual 
quantile values). 

I tried to modify the key.axes=axis(..) to force an evenly spaced colour key 
(and keeping the same colours) but it seems that this ultimately obeys the 
'levels' and 'col' parameters already defined, which are also used for the main 
image. I have also tried to decouple the 'levels' and 'col' settings between 
the main plot and the legend by fiddling with the filled.contour function but 
without success yet.
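
One possible way around this, sketched here with base graphics only, is to plot the class index of each cell rather than the raw value: the indices are evenly spaced by construction, and the key can then be labelled with the quantile-based breaks. This is only a sketch under assumptions (simulated data, a colorRampPalette() ramp standing in for the hand-picked colours, no map overlay), and because the contours are interpolated from class indices, the class boundaries in the main plot are approximate.

```r
set.seed(1)
xlon  <- seq(10, 60, length.out = 10)
ylat  <- seq(20, 50, length.out = 10)
zprcp <- matrix(abs(rnorm(100)) * 1000, nrow = 10)

# Quantile-based breaks, rounded up to the next multiple of 10.
probs  <- c(seq(0.1, 0.9, by = 0.1), 0.98, 1)
breaks <- c(0, 10 * ceiling(quantile(zprcp, probs = probs, names = FALSE) / 10))
nclass <- length(breaks) - 1

# Class index (1..nclass) of every grid cell.
zclass <- matrix(findInterval(zprcp, breaks, rightmost.closed = TRUE),
                 nrow = nrow(zprcp))

# Stand-in colour ramp with one colour per class.
cols <- colorRampPalette(c("#EDFFD2", "#0000FF", "#FF0096"))(nclass)

if (interactive()) {
  # Evenly spaced levels 0..nclass; the key is labelled with the real break values.
  filled.contour(xlon, ylat, zclass - 0.5, levels = 0:nclass, col = cols,
                 key.axes = axis(4, at = 0:nclass, labels = breaks))
}
```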

I would be grateful for any ideas, ideally based on the basic graphics package.

Thanks,

Panos



---
Dr Panos Hadjinicolaou

Energy Environment Water Research Center (EEWRC)
The Cyprus Institute


Re: [R] Dataset Transformation

2010-10-11 Thread Gabor Grothendieck

On Mon, Oct 11, 2010 at 9:35 AM, Santosh Srinivas
santosh.srini...@gmail.com wrote:
 Repost .. since the previous msg had problems

 I need to transpose the following input dataset  into an output dataset like
 below

 Input
 Date            TICKER          Price
 11/10/2010              A               0.991642
 11/10/2010              B               0.475023
 11/10/2010              C               0.218642
 11/10/2010              D               0.365135
 12/10/2010              A               0.687873
 12/10/2010              B               0.47006
 12/10/2010              C               0.533542
 12/10/2010              D               0.812439
 13/10/2010              A               0.210848
 13/10/2010              B               0.699799
 13/10/2010              C               0.546003
 13/10/2010              D               0.152316

 Output needed

 Date    A       B       C       D
 11/10/2010      0.991642        0.475023        0.218642        0.365135
 12/10/2010      0.687873        0.47006 0.533542        0.812439
 13/10/2010      0.210848        0.699799        0.546003        0.152316

 I tried using the aggregate function but not quite getting the method.



1. Try this:

Lines <- "Date TICKER Price
11/10/2010  A   0.991642
11/10/2010  B   0.475023
11/10/2010  C   0.218642
11/10/2010  D   0.365135
12/10/2010  A   0.687873
12/10/2010  B   0.47006
12/10/2010  C   0.533542
12/10/2010  D   0.812439
13/10/2010  A   0.210848
13/10/2010  B   0.699799
13/10/2010  C   0.546003
13/10/2010  D   0.152316"

DF <- read.table(textConnection(Lines), header = TRUE)
DF$Date <- as.Date(DF$Date, "%d/%m/%Y")
DFout <- reshape(DF, dir = "wide", timevar = "TICKER", idvar = "Date")
names(DFout) <- sub("Price.", "", names(DFout))


2. or using read.zoo in the zoo package we can read it in and reshape
it all at once:

library(zoo)
z <- read.zoo(textConnection(Lines), header = TRUE,
split = 2, format = "%d/%m/%Y")

At this point z is a zoo object in wide format:

> z
                  A        B        C        D
2010-10-11 0.991642 0.475023 0.218642 0.365135
2010-10-12 0.687873 0.470060 0.533542 0.812439
2010-10-13 0.210848 0.699799 0.546003 0.152316

Since this is a multivariate time series you might want to just leave
it as a zoo object since you then get all of the facilities of zoo,
e.g.

plot(z) # multi-panel
plot(z, screen = 1) # all in one panel

but if you want it as a data frame then convert it like this:

> data.frame(Index = index(z), coredata(z))
       Index        A        B        C        D
1 2010-10-11 0.991642 0.475023 0.218642 0.365135
2 2010-10-12 0.687873 0.470060 0.533542 0.812439
3 2010-10-13 0.210848 0.699799 0.546003 0.152316
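
If no extra package is wanted, a base-R sketch with xtabs() gives the same wide layout. It assumes one Price per Date/TICKER pair: xtabs() sums duplicates and fills missing combinations with 0, so it is not a general replacement for the approaches above. The object names wide and wide_df are just for this example.

```r
Lines <- "Date TICKER Price
11/10/2010 A 0.991642
11/10/2010 B 0.475023
11/10/2010 C 0.218642
11/10/2010 D 0.365135
12/10/2010 A 0.687873
12/10/2010 B 0.47006
12/10/2010 C 0.533542
12/10/2010 D 0.812439
13/10/2010 A 0.210848
13/10/2010 B 0.699799
13/10/2010 C 0.546003
13/10/2010 D 0.152316"

DF <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)

# Cross-tabulate Price by Date (rows) and TICKER (columns).
wide <- xtabs(Price ~ Date + TICKER, data = DF)

# Convert the table to an ordinary data frame; dates end up as row names.
wide_df <- as.data.frame.matrix(wide)
```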

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] Trouble accessing cov function from stats library

2010-10-11 Thread Barth B. Riley

Dear all

I am trying to use the cov function in the stats library. I have no problem 
using this function from the console. However, in my R script I received a 
function not found message. Then I called stats::cov(...) and received an 
error message that the function was not exported. Then I tried stats:::cov 
(three colons) and received the error

Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'Cov' not found
I am also importing the ltm library, though I'm not aware of a cov function in 
ltm that could be causing a conflict. Any suggestions?

Thanks

Barth




[R] help with simple but massive data transformation

2010-10-11 Thread clee


I have data that looks like this:

start end value
1     4   2
5     8   1
9     10  0


I want to transform the data so that it becomes:

startend value
1        2
2        2
3        2
4        2
5        1
6        1
7        1
8        1
9        0
10       0


I've written a for loop that can do the transformation BUT I need to do this
on very large datasets (millions of rows).  Does anyone know of an R package
that has a function that can do this transformation?

Any help is much appreciated!

Thanks!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/help-with-simple-but-massive-data-transformation-tp2989850p2989850.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Trouble accessing cov function from stats library

2010-10-11 Thread David Winsemius



On Oct 11, 2010, at 10:27 AM, Barth B. Riley wrote:


Dear all

I am trying to use the cov function in the stats library. I have no  
problem using this function from the console. However, in my R  
script I received a function not found message. Then I called  
stats::cov(...) and received an error message that the function was  
not exported. Then I tried stats:::cov (three colons) and received  
the error


Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
 object 'Cov' not found


You are misspelling it. R is case-sensitive.
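
A quick way to check this from a script, as a small sketch with made-up numbers: cov is exported from stats, while no object called Cov exists in that namespace.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 9)

# The lower-case name is the exported function.
cov_xy <- stats::cov(x, y)

# Look both spellings up directly in the stats namespace.
has_cov <- exists("cov", envir = asNamespace("stats"), inherits = FALSE)
has_Cov <- exists("Cov", envir = asNamespace("stats"), inherits = FALSE)
```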

I am also importing the ltm library, though I'm not aware of a cov  
function in ltm that could be causing a conflict. Any suggestions?


Thanks

Barth



David Winsemius, MD
West Hartford, CT


[R] How to get Mean rank for Kruskal-Wallis Test

2010-10-11 Thread Lawrence

Hello All,

I would like to produce a Ranks table in R like the one in the SPSS output at
the link below.

http://www.statisticssolutions.com/methods-chapter/statistical-tests/kruskal-wallis-test/

Is code for this already available? Please let me know.

Thanks,
Lawrence


[R] OT: snow socket clusters with VirtualBox and VMware player (Linux host, Win guest)

2010-10-11 Thread Ramon Diaz-Uriarte

Dear All,


I am trying to create socket clusters (using snow and snowfall) with a
Windows OS. I am running Windows inside VirtualBox and VMware player
(i.e., Windows is guest) from a Debian Linux host system (I've tried
in two different Linux systems, an AMD x86-64 workstation and an Intel
i686 laptop).  However, setting up the cluster almost always fails:
either R will hang forever or I will get messages such as

in socketConnect [...] port 10187 cannot be opened

Error in sfInit [...]
Starting of snow cluster failed! Error in sockectConnection 


In fact, a command such as

socketConnection(port = 10187, server = TRUE)

hangs forever.

This happens with R-2.11.1 and the current R-devel. For both
VirtualBox and VMware Player I am running the latest available
versions.

As far as I can tell, there is no firewall in the Windows machines,
and the firewall in the Linux machines is definitely down now. From
time to time, the cluster gets created, but I have no idea why it
succeeds (as far as I can tell, there is nothing different).

I guess this is some sort of strange interaction between Windows and
the virtualization, and R probably has little to do in this. However,
has anybody been able to run snow and/or snowfall with socket clusters
in a similar setup?

Best,

R.

-- 
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019
Fax: +-34-91-224-6972


Re: [R] help with simple but massive data transformation

2010-10-11 Thread ONKELINX, Thierry

This should be easy with apply()

do.call(rbind, apply(dataset, 1, function(x){
data.frame(startend = x[1]:x[2], value = x[3])
}))

Untested!
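
For millions of rows, building one small data frame per row may be slow. A fully vectorised sketch using rep() and sequence() is shown below; it assumes the columns are called start, end and value, and that end >= start in every row.

```r
dataset <- data.frame(start = c(1, 5, 9),
                      end   = c(4, 8, 10),
                      value = c(2, 1, 0))

# Number of expanded rows produced by each input row.
len <- dataset$end - dataset$start + 1

expanded <- data.frame(
  # sequence(len) restarts at 1 for each input row; shift it by the row's start.
  startend = sequence(len) + rep(dataset$start, len) - 1,
  # Repeat each value once per expanded row.
  value    = rep(dataset$value, len)
)
```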



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

 -----Original Message-----
 From: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] On Behalf Of clee
 Sent: Monday, 11 October 2010 16:17
 To: r-help@r-project.org
 Subject: [R] help with simple but massive data transformation
 
 
 I have data that looks like this:
 
 start end value
 1     4   2
 5     8   1
 9     10  0
 
 
 I want to transform the data so that it becomes:
 
 startend value
 1        2
 2        2
 3        2
 4        2
 5        1
 6        1
 7        1
 8        1
 9        0
 10       0
 
 
 I've written a for loop that can do the transformation BUT I 
 need to do this on very large datasets (millions of rows).  
 Does anyone know of an R package that has a function that can 
 do this transformation?
 
 Any help is much appreciated!
 
 Thanks!
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/help-with-simple-but-massive-data-transformation-tp2989850p2989850.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] help with simple but massive data transformation

2010-10-11 Thread Gabor Grothendieck

On Mon, Oct 11, 2010 at 10:16 AM, clee cheel...@gmail.com wrote:

 I have data that looks like this:

 start     end     value
 1          4         2
 5          8         1
 9         10        0


 I want to transform the data so that it becomes:

 startend     value
 1               2
 2               2
 3               2
 4               2
 5               1
 6               1
 7               1
 8               1
 9               0
 10             0

 I've written a for loop that can do the transformation BUT I need to do this
 on very large datasets (millions of rows).  Does anyone know of an R package
 that has a function that can do this transformation?

A very similar question was just asked recently. See this:

https://stat.ethz.ch/pipermail/r-help/2010-October/255791.html


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] help with simple but massive data transformation

2010-10-11 Thread David Winsemius



On Oct 11, 2010, at 10:16 AM, clee wrote:



I have data that looks like this:

start end value
1     4   2
5     8   1
9     10  0


I want to transform the data so that it becomes:

startend value
1        2
2        2
3        2
4        2
5        1
6        1
7        1
8        1
9        0
10       0


> do.call(rbind,
    apply(dta, 1,
      function(.r) matrix(c(
        seq(.r[1], .r[2]),
        vals=rep(.r[3], .r[2]-.r[1]+1) ),
        ncol=2) ))
      [,1] [,2]
 [1,]    1    2
 [2,]    2    2
 [3,]    3    2
 [4,]    4    2
 [5,]    5    1
 [6,]    6    1
 [7,]    7    1
 [8,]    8    1
 [9,]    9    0
[10,]   10    0




I've written a for loop that can do the transformation BUT I need to  
do this
on very large datasets (millions of rows).  Does anyone know of an R  
package

that has a function that can do this transformation?

Any help is much appreciated!

Thanks!
--
View this message in context: 
http://r.789695.n4.nabble.com/help-with-simple-but-massive-data-transformation-tp2989850p2989850.html
Sent from the R help mailing list archive at Nabble.com.



David Winsemius, MD
West Hartford, CT


Re: [R] Heatmap/Color Selection(Key)

2010-10-11 Thread Eik Vettorazzi

Hi Rashid,
you may have a look at the colorRampPalette function, along with the
at argument of heatmap.2

x<-matrix(runif(100,-6,6),nrow=10)
heatmap.2(x,col=colorRampPalette(c("blue","lightblue","darkgray","darkgray","yellow","red"),space="Lab"),at=c(-6.01,6.01,51))
# or just using the colors you posted
heatmap.2(x,col=c("blue","lightblue","darkgray","black","darkgray","yellow","red"),at=-3:3*2)

hth.

Am 08.10.2010 17:03, schrieb rashid kazmi:
 Hi
 I made a heatmap of QTLs based on LOD scores, with traits in columns and
 marker data in rows. I cannot cluster both columns and rows, because I need
 to keep the right order for the marker data.
 
 Can someone suggest a better way of generating heatmaps, especially the
 colour key? I want to choose colours that make the interesting results
 easier to see.
 
 
 
 library(gplots)
 sample=read.csv(file.choose())
 sample.names<-sample[,1]
 sample.set<-sample[,-1]
 sample.map <- as.matrix(sample.set)
 
 ### have to order as I have markers on rows, so I just want a dendrogram on
 ### traits (columns)
 ord <- order(rowSums(abs(sample.map)),decreasing=T)
 
 heatmap.2(sample.map[ord,],Rowv=F,dendrogram="column",trace="none",col=greenred(10))
 
 
 
 
 
 
 But I want to assign colours more specifically, as I want to show the QTL
 hotspots as follows:
 
 1)  -6 to -4 (blue)
 2)  -4 to -2 (light blue)
 3)  -2 to  0 (dark grey or black)
 4)   0 to  2 (dark grey or black)
 5)   2 to  4 (yellow)
 6)   4 to  6 (red)
 
 
 
 Any help or some addition to the above mentioned R code would be
 appreciated.
 
 
 
 R Kazmi
 
 PhD
 
 The Netherlands
 

-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790


Re: [R] Heatmap/Color Selection(Key)

2010-10-11 Thread Eik Vettorazzi

sorry, typo:

 heatmap.2(x,col=colorRampPalette(c("blue","lightblue","darkgray","darkgray","yellow","red"),space="Lab"),at=c(-6.01,6.01,51))
 heatmap.2(x,col=c("blue","lightblue","darkgray","black","darkgray","yellow","red"),at=-3:3*2)


should be read as

heatmap.2(x,col=colorRampPalette(c("blue","lightblue","darkgray","darkgray","yellow","red")),breaks=seq(-6.01,6.01,length.out=51))
heatmap.2(x,col=c("blue","lightblue","darkgray","darkgray","yellow","red"),breaks=-3:3*2)

at from stats::heatmap became breaks in gplots::heatmap.2
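
To see how breaks and colours pair up (n colours need n + 1 breaks), here is a small base-R sketch that bins a few made-up LOD scores with cut(). It only illustrates the binning and does not call heatmap.2 itself.

```r
breaks <- -3:3 * 2   # -6, -4, -2, 0, 2, 4, 6
cols   <- c("blue", "lightblue", "darkgray", "darkgray", "yellow", "red")

# Hypothetical LOD scores, one per requested interval.
lod <- c(-5.2, -3, -0.5, 1.9, 3.3, 5.9)

# Interval number (1..6) of each score, then the matching colour.
bin      <- cut(lod, breaks = breaks, labels = FALSE, include.lowest = TRUE)
lod_cols <- cols[bin]
```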


Re: [R] How to get Mean rank for Kruskal-Wallis Test

2010-10-11 Thread David Winsemius



On Oct 11, 2010, at 9:43 AM, Lawrence wrote:


Hello All,

I would like to produce a Ranks table in R like the one in the SPSS output at
the link below.

http://www.statisticssolutions.com/methods-chapter/statistical-tests/kruskal-wallis-test/

Is code for this already available? Please let me know.


Yes. All code is available:

??"Kruskal Wallis"
methods(kruskal.test)
getAnywhere(kruskal.test.default)

If you want to extract the table, then looking at the bottom of that  
function, you see that the variables r, g,and x have been created and   
you would need to modify that code and substitute the returned table  
you desire. The table might be a bit more complicated than that  
simplistic offering since ties need to be properly accounted for if  
you intend to replicate the results by hand.
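
A minimal sketch of such a table without modifying kruskal.test: rank all observations jointly (rank() averages ties by default) and then summarise the ranks by group. The data and the column names N and Mean.Rank are invented here to resemble the SPSS layout.

```r
# Made-up measurements for three groups of sizes 5, 4 and 5.
x <- c(2.9, 3.0, 2.5, 2.6, 3.2, 3.8, 2.7, 4.0, 2.4, 2.8, 3.4, 3.7, 2.2, 2.0)
g <- factor(rep(c("A", "B", "C"), times = c(5, 4, 5)))

# Joint ranks over all observations; ties get the average rank.
r <- rank(x)

ranks_table <- data.frame(
  Group     = levels(g),
  N         = as.vector(table(g)),
  Mean.Rank = as.vector(tapply(r, g, mean))
)

# The test itself, for comparison with the table.
kw <- kruskal.test(x, g)
```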




Thanks,
Lawrence


--

David Winsemius, MD
West Hartford, CT


Re: [R] MATLAB vrs. R

2010-10-11 Thread Craig O'Connell


alain,
 
Perhaps i'm still entering the code wrong.  I tried using your  
result=myquadrature(f,0,2000)
print(result)
 
Instead of my:
val = myquadrature(f,a,b)
result=myquadrature(val,0,2000) 
print(result) 
 
...and I am still getting an inf inf inf inf inf...
 
Did you change any of the previous syntax in addition to changing the result 
statement?
 
Thank you so much and I think my brain is fried!  Happy Holiday.
 
Craig
 
 Date: Mon, 11 Oct 2010 09:59:17 +0200
 From: alain.guil...@uclouvain.be
 To: craigpoconn...@hotmail.com
 CC: pda...@gmail.com; r-help@r-project.org
 Subject: Re: [R] MATLAB vrs. R
 
 Hi,
 
 The first argument of myquadrature in result shouldn't be val but f I 
 guess. At least it works for me
 
  result=myquadrature(f,0,2000)
  print(result)
 [1] 3
 
 Regards,
 Alain
 
 
 On 11-Oct-10 09:37, Craig O'Connell wrote:
  Thank you Peter. That is very much helpful. If you don't mind, I continued 
  running the code to attempt to get my answer and I continue to get inf inf 
  inf... (printed around 100 times).
 
  Any assistance with this issue. Here is my code (including your 
  corrections):
 
 
  myquadrature<-function(f,a,b){
  npts=length(f)
  nint=npts-1
  if(npts<=1)
  error('need at least two points to integrate')
  end;
  if(b<=a)
  error('something wrong with the interval, b should be greater than a')
  else
  dx=b/real(nint)
  end;
  npts=length(f)
  int=0
  int<- sum(f[-npts]+f[-1])/2*dx
  }
 
  #Call my quadrature
  x=seq(0,2000,10)
  h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
  u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
  a = x[1]
  b = x[length(x)]
  plot(x,-h)
  a = x[1];
  b = x[length(x)];
  #call your quadrature function. Hint, the answer should be 3.
  f=u*h;
  val = myquadrature(f,a,b); ? ___This is where issue arises.
  result=myquadrature(val,0,2000) ?
  print(result) ?
 
 
  Thanks again,
 
  Phil
 
 
  
 
 
 -- 
 Alain Guillet
 Statistician and Computer Scientist
 
 SMCS - IMMAQ - Université catholique de Louvain
 Bureau c.316
 Voie du Roman Pays, 20
 B-1348 Louvain-la-Neuve
 Belgium
 
 tel: +32 10 47 30 50
 
  

[R] Nonlinear Regression Parameter Shared Across Multiple Data Sets

2010-10-11 Thread Jared Blashka

I'm working with 3 different data sets and applying this non-linear
regression formula to each of them.

nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers,
start=list(upper=max(std_no_outliers$Y),LOGEC50=-8.5))

Previously, all of the regressions were calculated in Prism, but I'd like to
be able to automate the calculation process in a script, which is why I'm
trying to move to R. The issue I'm running into is that previously, in
Prism, I was able to calculate a shared value for a constraint so that all
three data sets shared the same value, but have other constraints calculated
separately. So Prism would figure out what single value for the constraint
in question would work best across all three data sets. For my formula, each
data set needs its own LOGEC50 value, but the upper value should be the
same across the 3 sets. Is there a way to do this within R, or with a
package I'm not aware of, or will I need to write my own nls function to
work with multiple data sets, because I've got no idea where to start with
that.
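
One approach that may work without writing a new fitting function is to stack the three data sets into one data frame and give nls() an indexed parameter, so that LOGEC50 gets one value per set while upper is shared. The sketch below uses simulated data; the column name set (1, 2 or 3), the true parameter values and the noise level are all assumptions made for illustration.

```r
set.seed(42)

# Simulated stand-in for the three stacked data sets.
true_upper   <- 100
true_logec50 <- c(-9, -8.5, -8)
X   <- rep(seq(-11, -6, by = 0.5), times = 3)
set <- rep(1:3, each = 11)          # which data set each row belongs to
Y   <- true_upper / (1 + 10^(X - true_logec50[set])) +
       rnorm(length(X), sd = 2)
dat <- data.frame(X = X, Y = Y, set = set)

# One shared 'upper', one LOGEC50 per data set (indexed by 'set').
fit <- nls(Y ~ upper / (1 + 10^(X - LOGEC50[set])),
           data  = dat,
           start = list(upper = max(dat$Y), LOGEC50 = rep(-8.5, 3)))

est <- coef(fit)   # upper, LOGEC501, LOGEC502, LOGEC503
```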

Thanks,
Jared


[R] how can i do anova

2010-10-11 Thread Mauluda Akhtar

 Hi,

I have a table like the following and want to do an ANOVA. Could you please
tell me how I can do it?
I want to show whether the elements (3 for each column) of a column are
significantly different or not.
Just to inform you, I'm a new user of R.


     bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
[1,]          69          46          43          54          54          41
[2,]          68          22          39          31          31          22
[3,]          91          54          57          63          63          50


Thank you.

Moon

[[alternative HTML version deleted]]


Re: [R] Hausman test for endogeneity

2010-10-11 Thread Bert Gunter

... and, in fact, simply googling "R Package Hausman" finds two
Hausman test functions in 2 different packages within the first half
dozen hits.

-- Bert

On Sat, Oct 9, 2010 at 11:06 AM, Liviu Andronic landronim...@gmail.com wrote:
 Hello

 On Sat, Oct 9, 2010 at 2:37 PM, Holger Steinmetz
 holger.steinm...@web.de wrote:
 can anybody point me in the right direction on how to conduct a hausman test
 for endogeneity in simultaneous equation models?

 Try
 install.packages('sos')
 require(sos)
 findFn('hausman')

 Here I get these results:
 findFn('hausman')
 found 22 matches;  retrieving 2 pages
 2

 Liviu





-- 
Bert Gunter
Genentech Nonclinical Biostatistics


Re: [R] how can i do anova

2010-10-11 Thread Andrew Miles

Type ?anova on your R command line for the basic function, and links  
to related functions.


Also, try a Google search for something like "doing anova in R" and you
should find multiple tutorials or examples.


Andrew Miles


On Oct 11, 2010, at 11:33 AM, Mauluda Akhtar wrote:


Hi,

I've a table like the following. I want to do ANOVA. Could you please tell
me how can i do it.
I want to show whether the elements (3 for each column) of a column are
significantly different or not.
Just to inform you that i'm a new user of R


     bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
[1,]          69          46          43          54          54          41
[2,]          68          22          39          31          31          22
[3,]          91          54          57          63          63          50



Thank you.

Moon

[[alternative HTML version deleted]]


[R] help with Cairo

2010-10-11 Thread Ivan Calandra


 Dear users,

As an alternative to RSvgDevice::devSVG, I have tried using Cairo and 
cairoDevice.


When opening the svg file from Cairo::CairoSVG() as well as from 
cairoDevice::Cairo_svg() in Illustrator, I got a warning message (which 
is damn hard to translate since I don't understand it), something like: 
clipping (?) will be lost at reexportation to format 'Tiny'.
I then have a huge black square and some huge black numbers that I can 
remove. But if I do so, the axes labels are gone (I guess these huge 
numbers are the labels...).


After copying all the necessary DLLs (I hope) into the cairoDevice\libs
folder to make it load, I don't know what else to do.


Any ideas of what the problem could be?

Thanks in advance!
Ivan

 sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32
locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
other attached packages:
[1] cairoDevice_2.14

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php


Re: [R] how can i do anova

2010-10-11 Thread Joshua Wiley

Hi Moon,

Here is something to get you started.

# Read Data into R
dat <- read.table(textConnection("
     bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
[1,]          69          46          43          54          54          41
[2,]          68          22          39          31          31          22
[3,]          91          54          57          63          63          50"),
header = TRUE)
closeAllConnections()

# Load the reshape package
library(reshape)

# Generally it is easier to use data in 'long' format,
# that is, one column with values and another indicating what those are
dat.long <- melt(dat)

# Look at the data
dat.long
# Another useful function that shows the structure of an object
# You should see that 'variable' is a factor with six levels (one for
# each column) and value is integer class, this is good
str(dat.long)

# Now to fit your model, we can use the aov() function
# The formula specifies the DV on the left and the IVs on the right
# or the outcome and predictors if you think of it that way
# the data = argument tells aov() where to find the data
# in this case in the dat.long variable
model.aov <- aov(value ~ variable, data = dat.long)

# Now you can just print it as is
model.aov
# But you may also like the results of summary()
summary(model.aov)

# If you're thinking about this from a regression point of view
# Fit linear regression model (much like aov())
model.lm <- lm(value ~ variable, data = dat.long)

# Look at your model
model.lm
summary(model.lm)

# It may seem very different at first, but if you now use anova()
# (mind that it is a slightly different function than aov()),
# you can get the ANOVA source table from the regression model
anova(model.lm)

Hope that helps,

Josh

On Mon, Oct 11, 2010 at 8:33 AM, Mauluda Akhtar maulud...@gmail.com wrote:
  Hi,

 I've a table like the following. I want to do ANOVA. Could you please tell
 me how can i do it.
 I want to show whether the elements (3 for each column) of a column are
 significantly different or not.
 Just to inform you that i'm a new user of R


  bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
 [1,]          69          46          43          54          54          41
 [2,]          68          22          39          31          31          22
 [3,]          91          54          57          63          63          50


 Thank you.

 Moon

        [[alternative HTML version deleted]]





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: [R] Boundary correction for kernel density estimation

2010-10-11 Thread Greg Snow

Look at the logspline package.  It uses a different method from what density 
does, but it can take boundaries into account.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
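
For readers who want to stay in base R, a different technique from the
logspline approach suggested above is boundary reflection: mirror the sample
at both boundaries, run density() on the enlarged sample, and keep only the
part on [0, 1]. This is a minimal sketch under assumptions: the sample x is
simulated, and the bandwidth is taken from the original sample with
bw.nrd0().

```r
set.seed(1)
x <- rbeta(500, 0.8, 2)          # made-up sample on [0, 1], piled up near 0

bw <- bw.nrd0(x)                 # bandwidth chosen from the original sample
x_reflected <- c(-x, x, 2 - x)   # mirror the data at 0 and at 1
d <- density(x_reflected, bw = bw, from = 0, to = 1, n = 512)
d$y <- 3 * d$y                   # the three copies share the mass, so rescale

# Trapezoidal check that the corrected estimate integrates to about 1 on [0, 1]
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```

Reflection forces the estimated density to be flat at the boundaries, which
may or may not suit the data at hand.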


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Katja Hebestreit
 Sent: Monday, October 11, 2010 3:04 AM
 To: r-help@r-project.org
 Subject: [R] Boundary correction for kernel density estimation
 
 Dear R-users,
 
 I have the following problem:
 
 I would like to estimate the density curve for univariate data between 0
 and 1. Unfortunately, the density function in the stats package can only
 cut the curve at a left-most and a right-most point. This truncation would
 lead to an underestimation. Letting the estimate spill over the bounded
 support is inappropriate as well.
 
 Does anyone know of a boundary correction method implemented in R? I did
 much research, but the correction methods I found concern survival or
 spatial data.
 
 Thanks a lot for any hint!
 Cheers,
 Katja
 

Re: [R] how can i do anova

2010-10-11 Thread Liviu Andronic

Hello

On Mon, Oct 11, 2010 at 5:33 PM, Mauluda Akhtar maulud...@gmail.com wrote:
  Hi,

 I've a table like the following. I want to do ANOVA. Could you please tell
 me how can i do it.
 I want to show whether the elements (3 for each column) of a column are
 significantly different or not.
 Just to inform you that i'm a new user of R

Try to do this with either Rcmdr or Deducer, both GUIs to R. Regards
Liviu



  bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
 [1,]          69          46          43          54          54          41
 [2,]          68          22          39          31          31          22
 [3,]          91          54          57          63          63          50


 Thank you.

 Moon

        [[alternative HTML version deleted]]





-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail


[R] multiple comparison correction

2010-10-11 Thread Jake Kami

dear list,

i just found this post in the archive:



On 23-Apr-05 Bill.Venables at csiro.au wrote:
>: -----Original Message-----
>: From: r-help-bounces at stat.math.ethz.ch
>: [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
>: michael watson (IAH-C)
>: [...]
>: I have a highly significant interaction term. In the context
>: of the experiment, this makes sense. I can visualise the data
>: graphically, and sure enough I can see that both factors have
>: different effects on the data DEPENDING on what the value of
>: the other factor is.
>:
>: I explain this all to my colleague - and she asks "but which
>: ones are different?" This is best illustrated with an example.
>: We have either infected | uninfected, and vaccinated | unvaccinated
>: (the two factors).
>: We're measuring expression of a gene.  Graphically, in the
>: infected group, vaccination makes expression go up. In the
>: uninfected group, vaccination makes expression go down. In
>: both the vaccinated and unvaccinated groups, infection makes
>: expression go down, but it goes down further in unvaccinated
>: than it does in vaccinated.
>:
>: So from a statistical point of view, I can see exactly why
>: the interaction term is significant, but what my colleague
>: wants to know is that WITHIN the vaccinated group, does
>: infection decrease expression significantly? And within
>: the unvaccinated group, does infection decrease expression
>: significantly?  Etc etc etc  Can I get this information from
>: the output of the ANOVA, or do I carry out a separate
>: test on e.g. just the vaccinated group? (seems a cop out to me)
>
> No, you can't get this kind of specific information out of the anova
> table and yes, anova tables *are* a bit of a cop out.  (I sometimes
> think they should only be allowed between consenting adults in
> private.)

I think the cop out Michael Watson was referring to means
going back to basics and doing a separate analysis on each group
(though no doubt using the Res SS from the AoV table).

Not that I disagree with your comment: I sometimes think that
anova tables are often passed round between adults in order to
induce consent which might otherwise have been withheld.

> What you are asking for is a non-standard, but perfectly
> reasonable partition of the degrees of freedom between the
> classes of a single factor with four levels got by pairing
> up the levels of vaccination and innoculation. Of course you
> can get this information, but you have to do a bit of work
> for it.

It seems to me that this is a wrapper for separate analysis
on each group!

> Before I give the example which I don't expect too many people
> to read entirely, let me issue a little challenge, namely to
> write tools to automate a generalized version of the procedure
> below.

[technical setup snipped]

> contrasts(dat$vac_inf) <- ginv(m)
> gm <- aov(y ~ vac_inf, dat)
> summary(gm)
>             Df  Sum Sq Mean Sq F value  Pr(>F)
> vac_inf      3 12.1294  4.0431   7.348 0.04190
> Residuals    4  2.2009  0.5502
>
> This doesn't tell us too much other than there are differences,
> probably.  Now to specify the partition:
>
> summary(gm,
>         split = list(vac_inf = list("- vs +|N" = 1,
>                                     "- vs +|Y" = 2)))
>                      Df  Sum Sq Mean Sq F value  Pr(>F)
> vac_inf               3 12.1294  4.0431  7.3480 0.04190
>   vac_inf: - vs +|N   1  7.9928  7.9928 14.5262 0.01892
>   vac_inf: - vs +|Y   1  3.7863  3.7863  6.8813 0.05860
> Residuals             4  2.2009  0.5502

Wow, Bill! Dazzling. This is like watching a rabbit hop
into a hat, and fly out as a dove. I must study this syntax.
But where can I find out about the split argument to summary?
I've found the *function* split, whose effect is similar, but
I've wandered around the summary, summary.lm etc. forest
for a while without locating the *argument*.
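
For reference, split is an argument of summary.aov() and is described in
?summary.aov. The following is a minimal sketch with made-up data (not the
data from this thread) of how a contrast matrix and split can work together.
The names dat, m and gm mirror the quoted code; the right inverse computed
with solve() is used here as a stand-in for MASS::ginv(m), which is an
assumption that holds because m has full row rank.

```r
# Made-up data: 4 groups (vaccination x infection), 2 observations each
dat <- data.frame(
  vac_inf = factor(rep(c("N-", "N+", "Y-", "Y+"), each = 2),
                   levels = c("N-", "N+", "Y-", "Y+")),
  y = c(-0.3, -0.2, 2.4, 2.6, 0.5, 0.6, 2.4, 2.6)
)

# Rows are the comparisons of interest among the four group means
m <- rbind("- vs +|N" = c(-1,  1,  0, 0),
           "- vs +|Y" = c( 0,  0, -1, 1),
           "N vs Y"   = c(-1, -1,  1, 1) / 2)

# Right inverse of m, used here in place of MASS::ginv(m)
contrasts(dat$vac_inf) <- t(m) %*% solve(m %*% t(m))

gm <- aov(y ~ vac_inf, data = dat)
# Each entry of the inner list picks contrast columns by position
tab <- summary(gm, split = list(vac_inf = list("- vs +|N" = 1,
                                               "- vs +|Y" = 2)))
tab
```

Each split line carries one degree of freedom and tests one row of m against
the pooled residual mean square.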

My naive (cop-out) approach would have been on the lines
of (without setting up the contrast matrix):

  summary(aov(y~vac*inf,data=dat))
              Df  Sum Sq Mean Sq F value  Pr(>F)
  vac          1  0.3502  0.3502  0.6364 0.46968
  inf          1 11.3908 11.3908 20.7017 0.01042 *
  vac:inf      1  0.3884  0.3884  0.7058 0.44812
  Residuals    4  2.2009  0.5502

so we get the 2.2009 on 4 df SS for residuals with mean SS 0.5502.

Then I would do:

  mNp<-mean(y[(vac=="N")&(inf=="+")])
  mNm<-mean(y[(vac=="N")&(inf=="-")])
  mYp<-mean(y[(vac=="Y")&(inf=="+")])
  mYm<-mean(y[(vac=="Y")&(inf=="-")])

  c( mYp,   mYm,   mNp,  mNm   )
  ##[1]  2.4990492  0.5532018  2.5212655 -0.3058972

  c(mYp-mYm,   mNp-mNm )
  ##[1] 1.945847   2.827163


after which:

  1-pt(((mYp-mYm)/sqrt(0.5502)),4)
  ##[1] 0.02929801

  1-pt(((mNp-mNm)/sqrt(0.5502)),4)
  ##[1] 0.009458266

give you 1-sided t-tests, and

[R] Is there a regression surface demo?

2010-10-11 Thread Joshua Wiley

Hi All,

Does anyone know of a function to plot a regression surface for two
predictors?  RSiteSearch()s and findFn()s have not turned up what I
was looking for.  I was thinking something along the lines of:
http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif

I like the rgl package because showing it from different angles is
nice for demonstrations.  I started to write my own, but it has some
issues (non-functioning code starts below), and I figured before I
tried to work out the kinks, I would ask for the list's feedback.

Any comments or suggestions (about functions or preferred idioms for
what I tried below, or...) are greatly appreciated.

Josh


RegSurfaceDemo <- function(formula, data, xlim, ylim, zlim,
                           resolution = 100) {
  require(rgl)
  ## This cannot be the proper way to extract variable names from formula
  vars <- rownames(attr(terms(formula), "factors"))

  ## if no limits set, make them nearest integer to
  ## .75 the lowest value and 1.25 the highest
  ranger <- function(x) {
    as.integer(range(x) * c(.75, 1.25))
  }
  if(is.null(xlim)) {xlim <- ranger(data[, vars[2]])}
  if(is.null(ylim)) {ylim <- ranger(data[, vars[3]])}
  if(is.null(zlim)) {zlim <- ranger(data[, vars[1]])}

  ## This does not actually work because the data frame
  ## does not get named properly (actually it throws an error)
  ##f <- function (x, y) {
  ##  predict(my.model, newdata = data.frame(vars[2] = x, vars[3] = y))
  ##}

  ## Fit model
  my.model <- lm(formula = formula, data = data)

  ## Create X, Y, and Z grids
  X <- seq(from = xlim[1], to = xlim[2], length.out = resolution)
  Y <- seq(from = ylim[1], to = ylim[2], length.out = resolution)
  Z <- outer(X, Y, f)

  ## Create 3d scatter plot and add the regression surface
  open3d()
  with(data = data,
       plot3d(x = vars[2], y = vars[3], z = vars[1],
              xlim = xlim, ylim = ylim, zlim = zlim))
  par3d(ignoreExtent = TRUE)
  surface3d(X, Y, Z, col = "blue", alpha = .6)
  par3d(ignoreExtent = FALSE)
  return(summary(my.model))
}
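
One possible way around the two problems flagged in the comments of the code
above, offered as a sketch rather than a finished function: all.vars() returns
the variable names of a formula, and setNames() lets the prediction data frame
be named from those strings. The function name reg_surface and the use of
mtcars are assumptions for illustration, and the rgl plotting step is left
out so that the example runs without a graphics device.

```r
# Build the fitted regression surface for a model with two predictors
reg_surface <- function(formula, data, resolution = 25) {
  vars <- all.vars(formula)            # response first, then the predictors
  stopifnot(length(vars) == 3)
  fit <- lm(formula, data = data)

  x <- seq(min(data[[vars[2]]]), max(data[[vars[2]]]), length.out = resolution)
  y <- seq(min(data[[vars[3]]]), max(data[[vars[3]]]), length.out = resolution)

  # outer() passes two equally long vectors; name them after the predictors
  f <- function(a, b) {
    newdata <- setNames(data.frame(a, b), vars[2:3])
    unname(predict(fit, newdata = newdata))
  }
  list(x = x, y = y, z = outer(x, y, f), model = fit)
}

s <- reg_surface(mpg ~ wt + hp, data = mtcars)
dim(s$z)
```

The components x, y and z have the shape that surface3d() in rgl or persp()
in base graphics expect for a surface.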



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


[R] Question

2010-10-11 Thread Margaretta 2014


Hello.
I would be very grateful if you could help me in using R.
I need the R commands for generating pseudo-random values and quasi-random
values. I found the commands qnorm and pnorm, but I am not sure that they
are what I am looking for.
Looking forward to hearing from you. Thank you
   Margaret
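
For orientation, a minimal sketch in base R: runif() and rnorm() draw
pseudo-random values, whereas qnorm() and pnorm() are the quantile and
distribution functions of the normal distribution and do not generate random
numbers. Base R has no quasi-random generator as far as I know, so the
function van_der_corput() below is a hand-written illustration (an
assumption, not an existing R function) of one classic low-discrepancy
sequence.

```r
set.seed(123)        # makes the pseudo-random draws reproducible
u <- runif(5)        # pseudo-random uniform values on (0, 1)
z <- rnorm(5)        # pseudo-random standard normal values

# Quasi-random (low-discrepancy) van der Corput sequence in a given base
van_der_corput <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    x <- 0
    f <- 1 / base
    while (i > 0) {
      x <- x + f * (i %% base)   # next digit of i, mirrored behind the point
      i <- i %/% base
      f <- f / base
    }
    x
  })
}

van_der_corput(4)    # 0.5 0.25 0.75 0.125
```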


[R] running own function in Java?

2010-10-11 Thread lord12


How do I run my own unique function in eclipse?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/running-own-function-in-Java-tp2990420p2990420.html
Sent from the R help mailing list archive at Nabble.com.


[R] LDA fuction

2010-10-11 Thread vascomc


Hello,
I wonder which analysis I should use to evaluate which environmental
variables are most closely related to the grouping that I have.

I have 38 streams that are grouped based on eight environmental variables,
but I wonder how these variables relate to the groups.

Example: pH, dissolved oxygen and altitude. Which of these variables
relate to group one, which to group two, and so on?

Understood?

my email is vasc...@gamil.com

Thank you all.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/LDA-fuction-tp2990198p2990198.html
Sent from the R help mailing list archive at Nabble.com.


[R] grep triggering error on unicode character

2010-10-11 Thread Dennis Fisher

Colleagues,

[R 2.11; OS X]

I am processing a file on the fly that contains the following text:
XXXáá 
[email clients may display this differently -- the string is three X's followed 
by two instances of the letter a with an acute accent]
I read the file with:
X <- readLines(FILENAME)
In this instance, the text of interest is on line 213.  When I examine line 
213, it reads:
XXX\xe1\xe1
This makes sense because the unicode mapping for á [a-acute] is U+00E1.

The problem arises when I attempt to manipulate the text in the file.  For 
example:
 grep("XXX", X[213])
integer(0)
Warning message:
In grep("XXX", X[213]) : input string 1 is invalid in this locale
Worse, yet:
 tolower(X[213]) 
Error in tolower(X[213]) : invalid multibyte string 1 

I am focussing on resolving the first problem, i.e., identifying a line 
containing XXX.  If I can do so, I can remove the offending lines before I 
execute the tolower command.
However, I am stumped as to how to resolve either problem.

Any help would be appreciated.

Thanks.

Dennis

Dennis Fisher MD
P  (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com


[R] dot plot by group

2010-10-11 Thread casperyc


Hi all,

I have the following data table

%%
Type    BATCH   RESPONSE
SHORT   A   22
SHORT   A   3
SHORT   A   16
SHORT   A   14
SHORT   A   8
SHORT   A   27
SHORT   A   11
SHORT   A   17
SHORT   B   12
SHORT   B   17
SHORT   B   11
SHORT   B   10
SHORT   B   16
SHORT   B   18
SHORT   B   15
SHORT   B   13
SHORT   B   9
SHORT   B   20
SHORT   C   4
SHORT   C   16
SHORT   C   32
SHORT   C   11
SHORT   C   9
SHORT   C   25
SHORT   C   27
SHORT   C   12
SHORT   C   26
SHORT   C   7
SHORT   C   14
LONG    A   12
LONG    A   7
LONG    A   19
LONG    A   19
LONG    A   11
LONG    A   33
LONG    A   20
LONG    A   25
LONG    B   24
LONG    B   6
LONG    B   39
LONG    B   14
LONG    B   17
LONG    B   10
LONG    B   22
LONG    B   35
LONG    B   33
LONG    B   21
LONG    C   15
LONG    C   11
LONG    C   17
LONG    C   8
LONG    C   2
LONG    C   10
LONG    C   16
LONG    C   21
LONG    C   9
LONG    C   19
LONG    C   23

%%

This is read into object 'd'.

I produce the dot plot by,

library(lattice)
dotplot(BATCH~RESPONSE,data=d,groups=Type)

How do I separately plot them by 'Type'?

I have tried using
dotplot(BATCH~RESPONSE,data=d,groups=Type=="SHORT")
dotplot(BATCH~RESPONSE,data=d$Type=='SHORT')

etc.

Thanks.

Casper
-- 
View this message in context: 
http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990469.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] LDA fuction

2010-10-11 Thread Gavin Simpson

On Mon, 2010-10-11 at 10:18 -0700, vascomc wrote:
 Hello,
 I wonder what analysis i have to use to evaluate which environmental
 variables most closely related to the grouping that I have.
 
 I has 38 streams are grouped based on eight environmental variables, but I
 wonder how these variables relate to these groups.
 
 Example.: PH, dissolved oxygen and altitude over which these variables
 relate to a group one, and two . . . 
 
 Understood .
 
 my email is vasc...@gamil.com
 
 Thank you all.

If you clustered the 38 streams on the basis of these eight env
variables it seems a bit perverse to then ask how well these variables
then separate the groups.

For such a small sample, you might be best off computing summary
statistics for the 8 variables conditioned on the groups (mean of Var1
in groups A,B,C,... etc), accompanied by some graphical plotting of the
Variables (e.g. box plots of VarX conditioned on group). Such an approach
would presume you aren't trying to predict group membership from the 8
variables.
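
The summaries described above could be sketched as follows. The stream data
are not shown in the thread, so the built-in iris data serve as a stand-in
(four variables, three groups); the object name streams and the column name
group are assumptions.

```r
streams <- iris                       # stand-in data: 4 variables, 3 groups
names(streams)[5] <- "group"

# Mean of every variable within each group
group_means <- aggregate(. ~ group, data = streams, FUN = mean)
group_means

# Box plot of one variable conditioned on group
boxplot(Sepal.Length ~ group, data = streams)
```

Repeating the boxplot() call for each variable gives a quick picture of which
variables differ most between the groups.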

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


Re: [R] dot plot by group

2010-10-11 Thread Phil Spector


Casper -
   I think you want

dotplot(BATCH~RESPONSE,data=d,subset=Type=='SHORT')
or
dotplot(BATCH~RESPONSE,data=subset(d,Type=='SHORT'))
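
If both types should appear side by side instead of one at a time, lattice
can also condition on Type in the formula. A minimal sketch, using a small
made-up data frame with the same column names as 'd' (the values are an
assumption, not the full data from the post):

```r
library(lattice)

# Small made-up data frame with the same column names as 'd'
d <- data.frame(
  Type     = factor(rep(c("SHORT", "LONG"), each = 6)),
  BATCH    = factor(rep(c("A", "B", "C"), times = 4)),
  RESPONSE = c(22, 12, 4, 3, 17, 16, 12, 24, 15, 7, 6, 11)
)

# One panel per level of Type
p <- dotplot(BATCH ~ RESPONSE | Type, data = d)
print(p)
```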



- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu

On Mon, 11 Oct 2010, casperyc wrote:



Hi all,

I have the folloing data table

[data table snipped]

This is read into object 'd'.

I produce the dot plot by,

library(lattice)
dotplot(BATCH~RESPONSE,data=d,groups=Type)

How do I separately plot them by 'Type'?

I have tried using
dotplot(BATCH~RESPONSE,data=d,groups=Type=="SHORT")
dotplot(BATCH~RESPONSE,data=d$Type=='SHORT')

etc.

Thanks.

Casper
--
View this message in context: 
http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990469.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Is there a regression surface demo?

2010-10-11 Thread G. Jay Kerns

Dear Josh,

On Mon, Oct 11, 2010 at 3:15 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi All,

 Does anyone know of a function to plot a regression surface for two
 predictors?  RSiteSearch()s and findFn()s have not turned up what I
 was looking for.  I was thinking something along the lines of:
 http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif

 I like the rgl package because showing it from different angles is
 nice for demonstrations.  I started to write my own, but it has some
 issues (non functioning code start below), and I figured before I
 tried to work out the kinks, I would ask for the list's feedback.

 Any comments or suggestions (about functions or preferred idioms for
 what I tried below, or...) are greatly appreciated.

 Josh


[snip]

I haven't tried to debug your code, but wanted to mention that the
Rcmdr:::scatter3d function does 3-d scatterplots (with the rgl
package) and adds a regression surface, one of 4 or 5 different types.
 If nothing else, it might be a good place to start for making your
own.

A person can play around with the different types in the Rcmdr under
the Graphs menu.  Or, from the command line:

library(Rcmdr)
with(rock, scatter3d(area, peri, shape))

I hope that this helps,
Jay


Re: [R] dot plot by group

2010-10-11 Thread casperyc


Hi Spector,

Yes, that is exactly what I was aiming for.

Thanks.

Casper
-- 
View this message in context: 
http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990495.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] dot plot by group

2010-10-11 Thread casperyc


And now I just wonder why bty='n' won't work.

I did 

dotplot(BATCH~RESPONSE,data=d,subset=Type=='SHORT',bty='n')

and tried other bty values, but none of them works.

Casper 
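
A likely explanation is that bty is a base-graphics parameter, while lattice
draws with grid and takes its settings from par.settings. One commonly used
workaround, as far as I know, is to make the axis.line colour transparent,
which hides the panel border (and, as a side effect, the tick marks). The
data frame below is made up for illustration.

```r
library(lattice)

# Small made-up data frame with the same column names as 'd'
d <- data.frame(
  Type     = factor(rep(c("SHORT", "LONG"), each = 6)),
  BATCH    = factor(rep(c("A", "B", "C"), times = 4)),
  RESPONSE = c(22, 12, 4, 3, 17, 16, 12, 24, 15, 7, 6, 11)
)

# Hide the box around the panel via the lattice settings
p <- dotplot(BATCH ~ RESPONSE, data = d, subset = Type == "SHORT",
             par.settings = list(axis.line = list(col = "transparent")))
print(p)
```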
-- 
View this message in context: 
http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990500.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] grep triggering error on unicode character

2010-10-11 Thread Duncan Murdoch


On 11/10/2010 3:36 PM, Dennis Fisher wrote:

Colleagues,

[R 2.11; OS X]

I am processing a file on the fly that contains the following text:
XXXáá
[email clients may display this differently -- the string is three X's followed 
by two instances of the letter a with an acute accent]
I read the file with:
X <- readLines(FILENAME)
In this instance, the text of interest is on line 213.  When I examine line 
213, it reads:
XXX\xe1\xe1
This makes sense because the unicode mapping for á [a-acute] is U+00E1.


That's not what it's saying:  it's saying you have three X's followed by 
two unrecognized characters with hex codes E1.  I imagine the original 
file is encoded using Latin1, because that's how á is encoded there.


The problem arises when I attempt to manipulate the text in the file.  For 
example:
  grep("XXX", X[213])
integer(0)
Warning message:
In grep("XXX", X[213]) : input string 1 is invalid in this locale
Worse, yet:
  tolower(X[213])
Error in tolower(X[213]) : invalid multibyte string 1

I am focussing on resolving the first problem, i.e., identifying a line 
containing XXX.  If I can do so, I can remove the offending lines before I 
execute the tolower command.
However, I am stumped as to how to resolve either problem.

Any help would be appreciated.


You need to declare the encoding of the file when you read it if it's 
not in the default encoding for your locale, or re-encode it.  See 
?readLines.
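
The options mentioned above might look like this in practice. This is a
sketch under the assumption that the file really is Latin-1; the temporary
file is created only so that the example is self-contained.

```r
# Write a small Latin-1 encoded file: "XXX" followed by two 0xE1 bytes (a-acute)
tmp <- tempfile()
writeBin(as.raw(c(0x58, 0x58, 0x58, 0xe1, 0xe1, 0x0a)), tmp)

# Option 1: read as-is, then re-encode from Latin-1 to UTF-8
raw_lines <- readLines(tmp)
fixed <- iconv(raw_lines, from = "latin1", to = "UTF-8")
grepl("XXX", fixed)
tolower(fixed)

# Option 2: declare the encoding while reading
declared <- readLines(tmp, encoding = "latin1")
grepl("XXX", declared)

# Option 3: match byte-wise, ignoring the encoding altogether
grepl("XXX", raw_lines, useBytes = TRUE)
```

Option 3 is enough for locating lines that contain plain ASCII text such as
XXX; functions like tolower() still need one of the first two options.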


Duncan Murdoch



Thanks.

Dennis

Dennis Fisher MD
P  (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com


Re: [R] how can i do anova

2010-10-11 Thread Mauluda Akhtar

Dear Andrew Miles,
Thanks a lot.
Moon

On Mon, Oct 11, 2010 at 9:54 PM, Andrew Miles rstuff.mi...@gmail.com wrote:

 Type ?anova on your R command line for the basic function, and links to
 related functions.

 Also, try a Google search for something like "doing anova in R" and you
 should find multiple tutorials or examples.

 Andrew Miles



 On Oct 11, 2010, at 11:33 AM, Mauluda Akhtar wrote:

  Hi,

 I've a table like the following. I want to do ANOVA. Could you please tell
 me how can i do it.
 I want to show whether the elements (3 for each column) of a column are
 significantly different or not.
 Just to inform you that i'm a new user of R


       bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
  [1,]          69          46          43          54          54          41
  [2,]          68          22          39          31          31          22
  [3,]          91          54          57          63          63          50


 Thank you.

 Moon






Re: [R] how can i do anova

2010-10-11 Thread Mauluda Akhtar

Dear Liviu,
Thanks a lot.
moon
On Tue, Oct 12, 2010 at 1:02 AM, Liviu Andronic landronim...@gmail.comwrote:

 Hello

 On Mon, Oct 11, 2010 at 5:33 PM, Mauluda Akhtar maulud...@gmail.com
 wrote:
   Hi,
 
  I've a table like the following. I want to do ANOVA. Could you please
 tell
  me how can i do it.
  I want to show whether the elements (3 for each column) of a column are
  significantly different or not.
  Just to inform you that i'm a new user of R
 
 Try to do this with either Rcmdr or Deducer, both GUIs to R. Regards
 Liviu


 
 



 --
 Do you know how to read?
 http://www.alienetworks.com/srtest.cfm
 http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
 Do you know how to write?
 http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mailhttp://garbl.home.comcast.net/%7Egarbl/stylemanual/e.htm#e-mail



[R] importing numeric types via sqlQuery

2010-10-11 Thread E C


Hi everyone,
I am using the sqlQuery function (in RODBC library) to import data from a 
database into R. My table (called temp) in the database looks like this:

category  num
abc       54469517.307692307692
def       36428860.230769230769

I used the following R code to pull data into R:

data <- sqlQuery(channel, "select category, num from temp;")

However, the result is that num gets all its decimal places chopped off, so 
data looks like this instead in R:

category  num
abc       54469517
def       36428860

I've tried various alternative approaches, but none have fixed the problem. 
When I cast the variable to a numeric type like this
(data <- sqlQuery(channel, "select category, num::numeric from temp;")),
it still gave me the same result. Casting to a real type like this
(data <- sqlQuery(channel, "select category, num::real from temp;"))
resulted in scientific notation that also rounded the numbers.
Any suggestions? Much appreciated!

[R] Spencer 15-point weighted moving average

2010-10-11 Thread Sam Thomas

I am trying to apply Spencer's 15-point weighted moving average filter
to the time series shampoo, using the filter command, but I am not
sure if I am using the filter correctly:

 

library(fma)

sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231,
           .209, .144, .066, .009, -.016, -.019, -.009)

(s1 <- filter(shampoo, sma15))

 

This result does not match the spence.15 command from package locfit

 

library(locfit)

spence.15(shampoo)

 

Any help understanding why these are different (or what I am doing wrong
with filter) would be appreciated.
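
One thing that can be checked without either package: the three-decimal weights above sum to 0.999, while the exact Spencer weights are integers over 320 and sum to 1. The sketch below applies the exact weights with stats::filter() to a synthetic cubic series (a stand-in for shampoo, so that the example is self-contained). It does not settle how spence.15 treats the end points, which is not checked here.

```r
# Exact Spencer 15-point weights; they are symmetric and sum to 1.
spencer15 <- c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3) / 320

# Synthetic cubic series standing in for shampoo (an assumption for illustration).
tt <- 1:36
y  <- ts(2 + 0.5 * tt - 0.01 * tt^2 + 0.001 * tt^3)

# stats::filter() with the default sides = 2 centres the weights; the first and
# last 7 values are NA because the window runs off the ends of the series.
s <- stats::filter(y, spencer15)
```

Calling stats::filter() explicitly also rules out the possibility that another attached package has masked filter().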

 

Thanks, 

 

Sam Thomas 



Re: [R] importing numeric types via sqlQuery

2010-10-11 Thread jim holtman

I would assume that the digits are not being chopped off.  It is just
that R will typically print data to 7 significant digits:

> x <- 54469517.307692307692
> x
[1] 54469517
> options(digits=20)
> x
[1] 54469517.3076923


Your data is there and you can set 'options' to show it if you want
to.  Also with floating point, you will only get about 15 digits of
accuracy (see FAQ 7.31).
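
A small sketch of ways to see the stored digits without changing the global option (the variable names formatted and has_fraction are just for illustration):

```r
x <- 54469517.307692307692

print(x)                          # default: 7 significant digits
print(x, digits = 15)             # more digits for this one call only
formatted <- sprintf("%.4f", x)   # fixed number of decimals, as a string
has_fraction <- x != trunc(x)     # TRUE when a fractional part is stored
```

If has_fraction comes back FALSE for values pulled from the database, the decimals were lost before printing and the display setting is not the cause.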






-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Is there a regression surface demo?

2010-10-11 Thread Ista Zahn

There is also wireframe() in lattice and bplot in rms.

-Ista

On Mon, Oct 11, 2010 at 3:49 PM, G. Jay Kerns gke...@ysu.edu wrote:
 Dear Josh,

 On Mon, Oct 11, 2010 at 3:15 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi All,

 Does anyone know of a function to plot a regression surface for two
 predictors?  RSiteSearch()s and findFn()s have not turned up what I
 was looking for.  I was thinking something along the lines of:
 http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif

 I like the rgl package because showing it from different angles is
 nice for demonstrations.  I started to write my own, but it has some
 issues (non functioning code start below), and I figured before I
 tried to work out the kinks, I would ask for the list's feedback.

 Any comments or suggestions (about functions or preferred idioms for
 what I tried below, or...) are greatly appreciated.

 Josh


 [snip]

 I haven't tried to debug your code, but wanted to mention that the
 Rcmdr:::scatter3d function does 3-d scatterplots (with the rgl
 package) and adds a regression surface, one of 4 or 5 different types.
  If nothing else, it might be a good place to start for making your
 own.

 A person can play around with the different types in the Rcmdr under
 the Graphs menu.  Or, from the command line:

 library(Rcmdr)
 with(rock, scatter3d(area, peri, shape))

 I hope that this helps,
 Jay





-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: [R] dot plot by group

2010-10-11 Thread David Winsemius



On Oct 11, 2010, at 3:55 PM, casperyc wrote:



And now I just wonder why the ' bty='n' ' won't work?


Left open is the answer to the question ... work  ...  how?

Because dotplot is a lattice function?
   ... and bty is a base graphic parameter?

You could try to give par.settings a list that consisted of bty='n'.  
(Failed,  and since it fails you should look at the axis parameters of  
the lattice.options section in the help page.)


You probably need to print out the lattice settings and then work  
toward reconfiguring them for your plot with trellis.par.set:


print(trellis.par.get())

This does work (and suggests I misread the intent of Sarkar in his  
help page), but it remains unclear how you _wanted_ it to work.


dotplot(variety ~ yield | year * site, data=barley,  
trellis.par.set(bty='n' ) )
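
If the aim of bty='n' was to drop the box around each panel (an assumption, since the intent was not stated), a commonly used lattice idiom is to make the axis.line setting transparent through par.settings. A minimal sketch with the barley data:

```r
library(lattice)

# Assumption: "removing the box" means hiding the panel border. In lattice the
# border is drawn with the axis.line settings, so make that line transparent.
p <- dotplot(variety ~ yield | year * site, data = barley,
             par.settings = list(axis.line = list(col = "transparent")))
print(p)
```

This may also hide the axis tick marks, since they appear to share the same setting; the tick labels stay.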




I did

dotplot(BATCH~RESPONSE,data=d,subset=Type=='SHORT',bty='n')

and tried other bty parameters, none is working

Casper
--
View this message in context: 
http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990500.html
Sent from the R help mailing list archive at Nabble.com.


David Winsemius, MD
West Hartford, CT


[R] topicmodels error

2010-10-11 Thread Dario Solari

I am trying to fit an LDA model to a TermDocumentMatrix with the topicmodels
package, but R says:

 Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED,  :
 x is of class “TermDocumentMatrix”“simple_triplet_matrix”

 class(TDM)
 [1] "TermDocumentMatrix"    "simple_triplet_matrix"

I tried to use a matrix, but that does not work either:

 MAT <- as.matrix(TDM)
 Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED,  :
 x is of class “matrix”

The help says it is correct to use a DocumentTermMatrix:
 Arguments
 x  Object of class "DocumentTermMatrix"

Can anyone help me?
Thanks


Re: [R] topicmodels error

2010-10-11 Thread David Winsemius


I don't know the answer, but let me point out that:

"DocumentTermMatrix" %in% class(TDM) should return FALSE since:

"DocumentTermMatrix" != "TermDocumentMatrix"
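
A small sketch of that check, using a stand-in object that only carries the same class vector (it is not a real tm object, so neither tm nor topicmodels is needed to run it):

```r
# Stand-in object with the same class vector as TDM (not a real tm object).
TDM <- structure(list(), class = c("TermDocumentMatrix", "simple_triplet_matrix"))

is_dtm <- inherits(TDM, "DocumentTermMatrix")   # FALSE: wrong orientation
is_tdm <- inherits(TDM, "TermDocumentMatrix")   # TRUE

# With the real tm package loaded, t(TDM) is expected to give the
# documents-by-terms orientation that LDA() asks for (assumption; not run here).
```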

--  
David.




David Winsemius, MD
West Hartford, CT


Re: [R] importing numeric types via sqlQuery

2010-10-11 Thread E C


Thanks for the quick reply! Hmm, I did not know about the options default. 
However, after I set options, it seems like it's still not displaying 
correctly. I've tried an even simpler example table with only 6 digits (much 
fewer than 20):

category  num
abc       123.456
def       456.789

Then in R:

options(digits = 20)
data <- sqlQuery(channel, "select category, num from temp;")

But data looks like this:

category  num
abc       123
def       456

I suspect it's something with sqlQuery that chops off the digits, and I am
wondering if there's a way of turning it off. Thanks!



Re: [R] topicmodels error

2010-10-11 Thread Dario Solari

Excuse me...
when i re-read my e-mail i saw my mistake!
I use a TermDocumentMatrix instead of a DocumentTermMatrix...


Re: [R] Is there a regression surface demo?

2010-10-11 Thread Joshua Wiley

Thanks for everyone's responses. Just to follow up, here is a working
version of my original.  The code is not pretty, but it functions.
Assuming you have the 'rgl' package installed and have sourced this
function, here are some examples:

RegSurfaceDemo(mpg ~ vs + wt, data = mtcars)
RegSurfaceDemo(mpg ~ vs * wt, data = mtcars)
RegSurfaceDemo(qsec ~ disp * hp, data = mtcars)

It cannot handle factors, and the axes labels are hideous...I'll get
to that eventually.

Thanks for all your help and suggestions.

Josh

RegSurfaceDemo <- function(formula, data, xlim = NULL, ylim = NULL,
   zlim = NULL, resolution = 10) {
 require(rgl)
 ## This cannot be the proper way to extract variable names from formula
 vars <- rownames(attr(terms(formula), "factors"))

 ## if no limits set, make them nearest integer to
 ## .75 the lowest value and 1.25 the highest
 ranger <- function(x) {
   as.integer(range(x) * c(.75, 1.25))
 }
 if(is.null(xlim)) {xlim <- ranger(data[, vars[2]])}
 if(is.null(ylim)) {ylim <- ranger(data[, vars[3]])}
 if(is.null(zlim)) {zlim <- ranger(data[, vars[1]])}

 ## This does not actually work because the data frame
 ## does not get named properly (actually it throws an error)
 f <- function (x, y) {
   newdat <- data.frame(x, y)
   colnames(newdat) <- c(vars[2], vars[3])
   predict(my.model, newdata = newdat)
 }

 ## Fit model
 my.model <- lm(formula = formula, data = data)

 ## Create X, Y, and Z grids
 X <- seq(from = xlim[1], to = xlim[2], length.out = resolution)
 Y <- seq(from = ylim[1], to = ylim[2], length.out = resolution)
 Z <- outer(X, Y, f)

 ## Create 3d scatter plot and add the regression surface
 open3d()
 with(data = data,
  plot3d(x = get(vars[2]), y = get(vars[3]), z = get(vars[1]),
 xlim = xlim, ylim = ylim, zlim = zlim))
 par3d(ignoreExtent = TRUE)
 surface3d(X, Y, Z, col = "blue", alpha = .6)
 par3d(ignoreExtent = FALSE)
 return(summary(my.model))
}
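
On the comment about extracting variable names: all.vars() is one standard way to pull them from a formula. The sketch below uses it and builds the prediction grid with expand.grid() instead of outer(). It leaves out the rgl drawing so it runs without that package; the names fml and grid are just for illustration.

```r
fml  <- mpg ~ vs * wt
vars <- all.vars(fml)          # response first, then the predictors

fit <- lm(fml, data = mtcars)

# Evaluate the fitted surface on a regular grid of the two predictors.
resolution <- 10
X <- seq(min(mtcars[[vars[2]]]), max(mtcars[[vars[2]]]), length.out = resolution)
Y <- seq(min(mtcars[[vars[3]]]), max(mtcars[[vars[3]]]), length.out = resolution)
grid <- expand.grid(X, Y)      # first column varies fastest
names(grid) <- vars[2:3]
Z <- matrix(predict(fit, newdata = grid), nrow = resolution)   # Z[i, j] is at X[i], Y[j]
```

The resulting X, Y and Z have the same layout as the ones passed to surface3d() in the function above.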







-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


[R] support vector machine for right censored data

2010-10-11 Thread SUBIRANA CACHINERO, ISAAC

Hi,

Does anybody know how to fit a support vector machine regression with right 
censored time-to-event response to select the best subset among several 
predictor variables?

Thanks in advance.


[R] Slow reading multiple tick data files into list of dataframes

2010-10-11 Thread rivercode


Hi,

I am trying to find the best way to read 85 tick data files of format:

> head(nbbo)
1 bid  CON  09:30:00.722  09:30:00.722  32.71   98
2 ask  CON  09:30:00.782  09:30:00.810  33.14  300
3 ask  CON  09:30:00.809  09:30:00.810  33.14  414
4 bid  CON  09:30:00.783  09:30:00.810  33.06  200

Each file has between 100,000 and 300,300 rows.

Currently doing   nbbo.list <- lapply(filePath, read.csv)   to create a list
with 85 data.frame objects, but it is taking minutes to read in the data,
and afterwards I get the following message on the console when taking
further actions (though it does then stop):

The R Engine is busy. Please wait, and try your command again later.

filePath in the above example is a vector of filenames:
> head(filePath)
[1] "C:/work/A/A_2010-10-07_nbbo.csv"
[2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
[3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
[4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"

Is there a better/quicker or more R way of doing this ?

Thanks,
Chris

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Slow-reading-multiple-tick-data-files-into-list-of-dataframes-tp2990723p2990723.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Slow reading multiple tick data files into list of dataframes

2010-10-11 Thread Gabor Grothendieck



You could try (possibly with suitable additional arguments):

library(sqldf)
lapply(filePath, read.csv.sql)
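
A base-R alternative that may also be worth timing is to give read.csv the column types up front. The sketch below writes two tiny stand-in files; the column names and types are guesses based on the head(nbbo) output, not the real file layout.

```r
# Two small stand-in files; the column names are invented for illustration.
paths <- replicate(2, tempfile(fileext = ".csv"))
for (p in paths) {
  writeLines(c("side,sym,t1,t2,price,size",
               "bid,CON,09:30:00.722,09:30:00.722,32.71,98",
               "ask,CON,09:30:00.782,09:30:00.810,33.14,300"), p)
}

# Declaring column types up front saves read.csv from guessing them, and
# turning off comment scanning removes a little more per-line work.
cls <- c("character", "character", "character", "character", "numeric", "integer")
nbbo.list <- lapply(paths, read.csv, colClasses = cls, comment.char = "")
names(nbbo.list) <- basename(paths)
```

How much this helps depends on where the time actually goes, so it is worth measuring with system.time() on one real file first.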

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Slow reading multiple tick data files into list of dataframes

2010-10-11 Thread Mike Marchywka

 Date: Mon, 11 Oct 2010 14:39:54 -0700
 From: aqua...@gmail.com
 To: r-help@r-project.org
 Subject: [R] Slow reading multiple tick data files into list of dataframes
[...]
 Is there a better/quicker or more R way of doing this ?

While there may be an obvious R-related answer, usually it helps if you 
can determine where the bottleneck is in terms of 
resources on your platform- often on older machines you
run out of real memory and then all the time is spent reading
the file onto VM back on disk. Can you tell if you are CPU or
memory limited by using task manager? 

It could in fact be that the best solution involves not trying
to hold your entire data set in memory at once, hard to know without
knowing your platform etc. In the
past, I've found that actually sorting data, a slow process
itself, can speed things up a lot due to less thrashing
of memory hierarchy during the later analysis. I doubt 
if that helps your immediate problem but it does point
to one possible non-obvious optimization depending
on what is slowing you down.


[R] compare histograms

2010-10-11 Thread solafah bh

Hello 
How to compare  two statistical histograms? How i can know if these histograms 
are equivalent or not??
 
Regards


  

Re: [R] Trouble accessing cov function from stats library

2010-10-11 Thread Steve Taylor

Note that R is case sensitive, so cov and Cov are different.
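
A quick way to confirm this from a script (a sketch; the variable names are arbitrary):

```r
ns <- asNamespace("stats")
has_lower <- exists("cov", envir = ns, inherits = FALSE)   # the real function
has_upper <- exists("Cov", envir = ns, inherits = FALSE)   # no object by this name in stats

# Calling through the namespace makes the intended function explicit.
v <- stats::cov(1:10, (1:10)^2)
```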

From: Barth B. Riley bbri...@chestnut.org
To:r-help@r-project.org r-help@r-project.org
Date: 12/Oct/2010 3:31a
Subject: [R] Trouble accessing cov function from stats library
Dear all

I am trying to use the cov function in the stats library. I have no problem 
using this function from the console. However, in my R script I received a 
function not found message. Then I called stats::cov(...) and received an 
error message that the function was not exported. Then I tried stats:::cov 
(three colons) and received the error

Error in get(name, envir = asNamespace(pkg), inherits = FALSE) :
  object 'Cov' not found
I am also importing the ltm library, though I'm not aware of a cov function in 
ltm that could be causing a conflict. Any suggestions?

Thanks

Barth


Re: [R] MATLAB vrs. R

2010-10-11 Thread Daniel Nordlund

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Craig O'Connell
 Sent: Monday, October 11, 2010 8:10 AM
 To: alain.guil...@uclouvain.be
 Cc: r-help@r-project.org; pda...@gmail.com
 Subject: Re: [R] MATLAB vrs. R

 alain,

 Perhaps i'm still entering the code wrong.  I tried using your
 result=myquadrature(f,0,2000)
 print(result)

 Instead of my:
 val = myquadrature(f,a,b)
 result=myquadrature(val,0,2000)
 print(result)

 ...and I am still getting an inf inf inf inf inf...

 Did you change any of the previous syntax in addition to changing the
 result statement?

 Thank you so much and I think my brain is fried!  Happy Holiday.

 Craig

Craig,

I haven't seen an answer to this yet, so let me jump in.  You seem to have some 
stuff still left over from MATLAB.  Here is some cleaned-up code that produces 
the result you expect.  I don't think the value of dx was being correctly 
computed in your code.  I did not change the assignment operator you used (=), 
but in R the preferred operator is "<-" (without the quotes). 

myquadrature <- function(f,a,b){
  npts = length(f)
  nint = npts-1
  # stop() is R's counterpart of MATLAB's error()
  if(npts <= 1) stop('need at least two points to integrate')
  if(b <= a) stop('something wrong with the interval, b should be greater than a')
  dx = (b-a)/nint   # grid spacing; equals b/nint here because a is 0
  sum(f[-npts]+f[-1])/2*dx
  }

#Call my quadrature
x = seq(0,2000,10)
h = 10*(cos(((2*pi)/2000)*(x-mean(x)))+1)
u = (cos(((2*pi)/2000)*(x-mean(x)))+1)
a = x[1]
b = x[length(x)]
plot(x,-h)

#call your quadrature function. Hint, the answer should be 3.
f = u*h
result = myquadrature(f,a,b) 
result
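
As a sanity check on the trapezoidal rule itself, here is a small sketch that works directly on (x, y) pairs and compares the result with a case where the exact area is known. The helper name trapz is made up for illustration; it is not a base R function.

```r
# Trapezoidal rule on (x, y) pairs; works for equal or unequal spacing.
trapz <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

x <- seq(0, 2000, by = 10)
u <- cos((2 * pi / 2000) * (x - mean(x))) + 1

approx_area <- trapz(x, u)
exact_area  <- 2000   # the cosine integrates to zero over one full period
```

Passing x as well as the function values avoids having to work out dx from a and b separately.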

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA


[R] Revolutions Blog: September Roundup

2010-10-11 Thread David Smith

I write about R every weekday at the Revolutions blog:
 http://blog.revolutionanalytics.com
and every month I post a summary of articles from the previous month
of particular interest to readers of r-help.

In case you missed them, here are some articles related to R from the
month of September:

http://bit.ly/cuFNat presented a profile of Hadley Wickham, author of
many popular R packages including ggplot2 and reshape.

http://bit.ly/bS71Ld riffed the design of the new Twitter website into
a discussion on calculating the Golden Mean with R. Several readers
contributed 1-liners based on the Fibonacci sequence:
http://bit.ly/dvpemK .

http://bit.ly/bunYJE linked to some elegant code for calculating the
Mandelbrot set in R, and a beautiful animation of the results.

http://bit.ly/ahIZzo linked to a blog post by JD Long on simulating
multivariate random variables using copulas.

http://bit.ly/a8mjZm announced a ggplot2 data visualization competition.

http://bit.ly/cRKEZs linked to a discussion about the merits of dot
charts versus bar charts.

http://bit.ly/cavSLB announced the availability of Revolution R
Enterprise 4.0, available free to academics.

http://bit.ly/aBuFEt posted updated statistics on the growth in R
packages, and asked what other languages can learn from R's package
system.

http://bit.ly/cYujCF noted updates to the plyr and reshape packages,
featuring improved performance and parallel processing.

http://bit.ly/afhkSt noted that R 2.12 is scheduled for release on October 15.

http://bit.ly/bw6ylo announced RevoDeployR, Web Services integration
for R included in Revolution R Enterprise. You can download slides and
a replay of the webinar introducing RevoDeployR here:
http://bit.ly/aRUrPh .

http://bit.ly/afwmwf linked to a feature article about R in Tech
Target: R's time is now.

http://bit.ly/bB2MVC reviewed the state of running R on the iPhone and iPad.

http://bit.ly/bBRHB2 noted that RHIPE creator Saptarshi Guha is
presenting at the Hadoop World conference, and linked to an interview
with him. (There's also a new profile of Saptarshi at:
http://bit.ly/9k7ABg .)

http://bit.ly/aPcxBP linked to a collection of guidelines for
efficient R programming by Martin Morgan.

http://bit.ly/aez046 relayed the Call for Papers for the R/Finance
2011 conference in Chicago.

http://bit.ly/bz8eX8 had guest blogger Joseph Rickert's thoughts on
the relationship between Map-Reduce/Hadoop and R.

http://bit.ly/bYQrt4 linked to some hints for the R beginner by Patrick Burns.

http://bit.ly/cU9BzF linked to Dirk Eddelbuettel's review of the
contributions to R resulting from this year's Google Summer of Code.

There are new R user groups in New Jersey (http://bit.ly/9JnRcg),
Brisbane, QLD (http://bit.ly/cVXHdp) and Toronto
(http://bit.ly/bWhJyw).

Other non-R-related stories in the past month included one about
mono-monostatic bodies (http://bit.ly/cr79bo), and (on a lighter
note), how statisticians and scientists (fail to) communicate
(http://bit.ly/aYgEEa), and funny airline safety videos
(http://bit.ly/bol1ZO).

The R Community Calendar has also been updated at:
http://blog.revolutionanalytics.com/calendar.html

If you're looking for more articles about R, you can find summaries
from previous months at http://blog.revolutionanalytics.com/roundups/.
Join the Revolution mailing list at
http://revolutionanalytics.com/newsletter to be alerted to new
articles on a monthly basis.

As always, thanks for the comments and please keep sending suggestions
to me at da...@revolutionanalytics.com . Don't forget you can also
follow the blog using an RSS reader like Google Reader, or by
following me on Twitter (I'm @revodavid).

Cheers,
# David


--
David M Smith da...@revolutionanalytics.com
VP of Marketing, Revolution Analytics  http://blog.revolutionanalytics.com
Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)


Re: [R] MATLAB vrs. R

2010-10-11 Thread Daniel Nordlund

I apologize for the noise. I didn't clean up the code enough.  See below.

snip
 
 Craig,
 
 I haven't seen an answer to this yet, so let me jump in.  You seem to have
 some stuff still leftover from MATLAB.  Here is some cleaned up code that
 produces the result you expect.  I don't think the value of dx was being
 correctly computed in your code.  I did not change the assignment operator
 you used (=), but in R the preferred operator is "<-" (without the
 quotes).
 
 myquadrature <- function(f,a,b){
   npts = length(f)
   nint = npts-1
   if(npts <= 1) error('need at least two points to integrate')
   if(b <= a) error('something wrong with the interval, b should be greater than a') else dx=b/nint

The 2 'if' statements above should have been

   if(npts <= 1) stop('need at least two points to integrate')
   if(b <= a) stop('something wrong with the interval, b should be greater than a') else dx=b/nint

   sum(f[-npts]+f[-1])/2*dx
   }
 
 #Call my quadrature
 x = seq(0,2000,10)
 h = 10*(cos(((2*pi)/2000)*(x-mean(x)))+1)
 u = (cos(((2*pi)/2000)*(x-mean(x)))+1)
 a = x[1]
 b = x[length(x)]
 plot(x,-h)
 a = x[1];
 b = x[length(x)]
 
 #call your quadrature function. Hint, the answer should be 3.
 f = u*h
 result = myquadrature(f,a,b)
 result
 
 Hope this is helpful,
 
 Dan
 
Daniel Nordlund
Bothell, WA USA
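
For reference, a general trapezoidal rule on equally spaced data uses a spacing of (b - a)/nint; the posted b/nint gives the same value only because a = 0 in this example. A minimal sketch under that assumption (the name trapz_equal is made up for illustration and is not part of the thread):

```r
# Trapezoidal rule for equally spaced samples f on the interval [a, b]
trapz_equal <- function(f, a, b) {
  npts <- length(f)
  if (npts <= 1) stop("need at least two points to integrate")
  if (b <= a) stop("b should be greater than a")
  dx <- (b - a) / (npts - 1)          # grid spacing; also valid when a != 0
  sum(f[-npts] + f[-1]) / 2 * dx      # sum of the trapezoid areas
}

# Same data as in the thread
x <- seq(0, 2000, 10)
h <- 10 * (cos(((2 * pi) / 2000) * (x - mean(x))) + 1)
u <- cos(((2 * pi) / 2000) * (x - mean(x))) + 1
trapz_equal(u * h, x[1], x[length(x)])
```

Because the spacing is computed from both end points, the same function can be reused on intervals that do not start at zero.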
 


[R] running R script on linux server

2010-10-11 Thread Lorenzo Cattarino

Hi R-users,

 

I have a problem running my R code on a Linux cluster. What I did was
write a .pbs file to instruct the cluster on what to do and how:

#!/bin/sh

#PBS -m ae
#PBS -M uqlca...@uq.edu.au
#PBS -A uq-CSER
#PBS -N job1_lollo
#PBS -l select=1:ncpus=1:NodeType=fast:mem=8GB
#PBS -l walltime=999:00:00

cd $PBS_O_WORKDIR

source /usr/share/modules/init/bash

module load R/2.11.1

/home/uqlcatta/script/diag.sh

The .pbs file calls a .sh file, which is located in my home directory on
the cluster, and which contains the R script (enclosed in " ") to run:

#!/bin/bash

echo "mat <- matrix(1:12,nrow=3,ncol=4)

diagonal <- diag(mat)

write.csv(diagonal, file = "diagonal.csv")

" > R_tmp

echo 'source("R_tmp")' | R --vanilla --slave

rm R_tmp

However, the cluster sends back an error message saying:

Error in write.table(diagonal, file = diagonal.csv, col.names = NA, sep = ",",  :
  object 'diagonal.csv' not found
Calls: source ... write.csv -> eval.parent -> eval -> eval -> write.table
Execution halted

The write.csv command worked in the R console on my computer, so I don't
know what the problem is here.

Thanks in advance for your help

Lorenzo
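
A likely cause, judging from the error message: the R code sits inside a double-quoted shell string, so the inner double quotes around diagonal.csv end that string and never reach R, which then looks for an object named diagonal.csv. One way around it (a sketch, not the only fix) is to keep the R code in its own file, here given the hypothetical name diag.R, and to use single-quoted strings inside it:

```r
# diag.R (hypothetical file name): the same computation as in the shell script
mat <- matrix(1:12, nrow = 3, ncol = 4)

# diag() returns the main diagonal of the 3 x 4 matrix
diagonal <- diag(mat)

# Single quotes survive even if this text is later wrapped in a
# double-quoted shell string
write.csv(diagonal, file = 'diagonal.csv')
```

From the .pbs job, such a file could then be run with "Rscript diag.R" or "R --vanilla --slave < diag.R" in place of the echo/source step.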


[[alternative HTML version deleted]]


[R] (no subject)

2010-10-11 Thread Tim Clark

Dear List,

I am trying to plot date vs. time, but am having problems getting my y-axis
labels how I want them.  When left on its own, R plots time at 6 hour intervals
from 03:00 to 23:00.  I want 6 hour intervals from 2:00 to 22:00.  I
realize yaxp doesn't work in plot(), so I am trying to get it to work in
par().
However, now I get the ticks where I want them but the time is output as a very
big number (serial time?).  I have also tried axis() using at= and also get
serial time numbers.  Any suggestions on how to format time on an axis?

  mydat$Date<-as.POSIXct(as.character(mydat$Date), format=c("%m/%d/%Y"))
  mydat$Time<-as.POSIXct(as.character(mydat$Time), format=c("%H:%M:%S"))

  plot(mydat$DateTime,mydat$Time, xlab=c("Date"), ylab=c("Time"),
  xlim=c(min(mydat$DateTime),max(mydat$DateTime)),
  ylim=c(min(mydat$Time),max(mydat$Time)),
  yaxt="n",
  yaxs="i",
  pch=19, cex=.4,
  type="n")

  par(yaxp=c(as.POSIXct(as.character("02:00"), format=c("%H:%M")),
             as.POSIXct(as.character("22:00"), format=c("%H:%M")),5))
  axis(2)


Thanks,

Tim
 Tim Clark

Marine Ecologist
National Park of American Samoa





[R] expression() problem !

2010-10-11 Thread michel.mas


Hello everyone ... I have a problem when I try to mix expressions using the
function expression () with variables coming from my code. Has anyone faced
such a problem?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/expression-problem-tp2990891p2990891.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] expression() problem !

2010-10-11 Thread David Winsemius



On Oct 11, 2010, at 7:27 PM, michel.mas wrote:



Hello everyone ... I have a problem when I try to mix expressions using the
function expression () with variables coming from my code. Has anyone faced
such a problem?


Many times:

?bquote   # instead of expression
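
A minimal sketch of what that looks like in practice, assuming the goal is a plot label that mixes plotmath notation with values computed in code (the variables n and xbar are made up for illustration):

```r
n    <- 25        # hypothetical sample size coming from "your code"
xbar <- 3.14      # hypothetical computed mean

# Inside bquote(), .(name) is replaced by the current value of name,
# while everything else is kept as an unevaluated plotmath expression
lab <- bquote(bar(x) == .(xbar) ~ "with" ~ n == .(n))

plot(1:10, main = lab)
```

With expression() the names xbar and n would be drawn literally; bquote() substitutes their values before the label is drawn.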



--
View this message in context: 
http://r.789695.n4.nabble.com/expression-problem-tp2990891p2990891.html
Sent from the R help mailing list archive at Nabble.com.

--

David Winsemius, MD
West Hartford, CT


Re: [R] (no subject)

2010-10-11 Thread David Winsemius

Two  things I see. First is that par needs to be called _before_ the  
plot (although its effects will persist if you need to keep hacking  
away)  and the second is that yaxp is expecting numeric arguments  
(which you are offering)  but in your case these will need to be the  
numeric values in a Date-aligned version of those DateTimes used in  
your mydat arguments. POSIXct is the number of seconds since the origin.
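
To see why the axis shows large numbers: a POSIXct value is stored as seconds since 1970-01-01 00:00 UTC, so numeric tick positions only look like times once they are formatted. A small illustration (the example time and the UTC time zone are assumptions chosen for reproducibility):

```r
t0 <- as.POSIXct("2010-10-11 02:00", tz = "UTC")

secs <- as.numeric(t0)        # the "very big number": seconds since the origin
lab  <- format(t0, "%H:%M")   # the same instant shown as a clock time

# One day after the origin is exactly 24 * 3600 seconds
one_day <- as.numeric(as.POSIXct("1970-01-02 00:00", tz = "UTC"))
```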


--
David.

On Oct 11, 2010, at 6:56 PM, Tim Clark wrote:


Dear List,

I am trying to plot date vs. time, but am having problems getting my y-axis
labels how I want them.  When left on its own, R plots time at 6 hour intervals
from 03:00 to 23:00.  I want 6 hour intervals from 2:00 to 22:00.  I
realize yaxp doesn't work in plot(), so I am trying to get it to work in
par().
However, now I get the ticks where I want them but the time is output as a very
big number (serial time?).  I have also tried axis() using at= and also get
serial time numbers.  Any suggestions on how to format time on an axis?

  mydat$Date<-as.POSIXct(as.character(mydat$Date), format=c("%m/%d/%Y"))
  mydat$Time<-as.POSIXct(as.character(mydat$Time), format=c("%H:%M:%S"))

  plot(mydat$DateTime,mydat$Time, xlab=c("Date"), ylab=c("Time"),
  xlim=c(min(mydat$DateTime),max(mydat$DateTime)),
  ylim=c(min(mydat$Time),max(mydat$Time)),
  yaxt="n",
  yaxs="i",
  pch=19, cex=.4,
  type="n")

  par(yaxp=c(as.POSIXct(as.character("02:00"), format=c("%H:%M")),
             as.POSIXct(as.character("22:00"), format=c("%H:%M")),5))
  axis(2)


Thanks,

Tim
 Tim Clark

Marine Ecologist
National Park of American Samoa






David Winsemius, MD
West Hartford, CT


Re: [R] (no subject)

2010-10-11 Thread jim holtman

use 'axis' with 'at=' and 'labels=' to put your own labels on the axis.

Have to guess at your data since you did not provide a reproducible example:

x <- seq(as.POSIXct('2010-10-11 00:00'), as.POSIXct('2010-10-12 00:00'),
    length = 20)
plot(x, x, type = 'o', yaxt = 'n')
axis.POSIXct(2, at = seq(as.POSIXct('2010-10-11 02:00'),
    as.POSIXct('2010-10-12 00:00'),
    by = '6 hours'))
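
The 'at=' plus 'labels=' variant can be sketched as follows, assuming time of day is kept as seconds since midnight; the data are simulated, and the 4-hour tick spacing is only one guess at what "2:00 to 22:00" should look like:

```r
set.seed(1)
day  <- as.Date("2010-10-01") + 0:9            # hypothetical observation dates
secs <- runif(10, 2 * 3600, 22 * 3600)         # time of day, seconds since midnight

# Suppress the default y axis, then draw one with explicit positions and labels
plot(day, secs, yaxt = "n", yaxs = "i", ylim = c(2, 22) * 3600,
     xlab = "Date", ylab = "Time", pch = 19)

ticks <- seq(2, 22, by = 4) * 3600             # 02:00, 06:00, ..., 22:00
labs  <- sprintf("%02d:00", ticks %/% 3600)    # clock-style labels for the ticks
axis(2, at = ticks, labels = labs, las = 1)
```

Because the labels are plain character strings, any spacing or format can be used without relying on par(yaxp=).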


On Mon, Oct 11, 2010 at 6:56 PM, Tim Clark mudiver1...@yahoo.com wrote:
 Dear List,

 I am trying to plot date vs. time, but am having problems getting my y-axis
 labels how I want them.  When left on its own, R plots time at 6 hour intervals
 from 03:00 to 23:00.  I want 6 hour intervals from 2:00 to 22:00.  I
 realize yaxp doesn't work in plot(), so I am trying to get it to work in
 par().
 However, now I get the ticks where I want them but the time is output as a
 very big number (serial time?).  I have also tried axis() using at= and also get
 serial time numbers.  Any suggestions on how to format time on an axis?

   mydat$Date<-as.POSIXct(as.character(mydat$Date), format=c("%m/%d/%Y"))
   mydat$Time<-as.POSIXct(as.character(mydat$Time), format=c("%H:%M:%S"))

   plot(mydat$DateTime,mydat$Time, xlab=c("Date"), ylab=c("Time"),
   xlim=c(min(mydat$DateTime),max(mydat$DateTime)),
   ylim=c(min(mydat$Time),max(mydat$Time)),
   yaxt="n",
   yaxs="i",
   pch=19, cex=.4,
   type="n")

   par(yaxp=c(as.POSIXct(as.character("02:00"), format=c("%H:%M")),
              as.POSIXct(as.character("22:00"), format=c("%H:%M")),5))
   axis(2)


 Thanks,

 Tim
  Tim Clark

 Marine Ecologist
 National Park of American Samoa








-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Slow reading multiple tick data files into list of dataframes

2010-10-11 Thread jim holtman

For 100,000 rows, it took about 2 seconds to read it in on my system:

> system.time(x <- read.table('/recv/test.txt', as.is=TRUE))
   user  system elapsed
   1.92    0.08    2.08
> str(x)
'data.frame':   196588 obs. of  7 variables:
 $ V1: int  1 2 3 4 1 2 3 1 2 3 ...
 $ V2: chr  "bid" "ask" "ask" "bid" ...
 $ V3: chr  "CON" "CON" "CON" "CON" ...
 $ V4: chr  "09:30:00.722" "09:30:00.782" "09:30:00.809" "09:30:00.783" ...
 $ V5: chr  "09:30:00.722" "09:30:00.810" "09:30:00.810" "09:30:00.810" ...
 $ V6: num  32.7 33.1 33.1 33.1 32.7 ...
 $ V7: int  98 300 414 200 98 300 414 98 300 414 ...
> object.size(x)
6291928 bytes


Given that you have about 85 files, I would guess that you would need
about 800MB if all were 300K lines long.  You might be getting memory
fragmentation.  You might try using gc() every so often in the loop.
What are you going to do with the data?  Are you going to make one big
file?  In this case you might want a 64-bit version, since you will
have a single instance of 800MB and will probably need 2-3X that much
memory if copies are being made during processing.  Objects might be
larger in 64-bit.

Maybe you need to follow Gabor's advice and read it into a database
and then process it from there.
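
Before moving to a database, it may also be worth telling read.csv the column types up front, which the read.table help page suggests for large files. A self-contained sketch: the two small files written here are stand-ins for the real tick files, and the column classes are guessed from the str() output above.

```r
# Write two tiny stand-in files so the sketch runs anywhere
paths <- file.path(tempdir(), c("A_nbbo.csv", "B_nbbo.csv"))
for (p in paths) {
  write.table(data.frame(1:3, c("bid", "ask", "ask"), "CON",
                         "09:30:00.722", "09:30:00.810",
                         c(32.71, 33.14, 33.14), c(98L, 300L, 414L)),
              file = p, sep = ",", row.names = FALSE, col.names = FALSE)
}

# Declaring colClasses spares read.csv from guessing the type of every column
cls <- c("integer", "character", "character", "character",
         "character", "numeric", "integer")
nbbo.list <- lapply(paths, read.csv, header = FALSE, colClasses = cls)
names(nbbo.list) <- basename(paths)
```

Whether this helps enough for 85 files of 100,000 to 300,000 rows would need to be timed on the real data.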

On Mon, Oct 11, 2010 at 5:48 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Mon, Oct 11, 2010 at 5:39 PM, rivercode aqua...@gmail.com wrote:

 Hi,

 I am trying to find the best way to read 85 tick data files of format:

 head(nbbo)
 1 bid  CON  09:30:00.722    09:30:00.722  32.71   98
 2 ask  CON  09:30:00.782    09:30:00.810  33.14  300
 3 ask  CON  09:30:00.809    09:30:00.810  33.14  414
 4 bid  CON  09:30:00.783    09:30:00.810  33.06  200

 Each file has between 100,000 to 300,300 rows.

 Currently doing   nbbo.list <- lapply(filePath, read.csv)    to create list
 with 85 data.frame objects...but it is taking minutes to read in the data
 and afterwards I get the following message on the console when taking
 further actions (though it does then stop):

    The R Engine is busy. Please wait, and try your command again later.

 filePath in the above example is a vector of filenames:
 head(filePath)
 [1] "C:/work/A/A_2010-10-07_nbbo.csv"
 [2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
 [3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
 [4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"

 Is there a better/quicker or more R way of doing this ?


 You could try (possibly with suitable additional arguments):

 library(sqldf)
 lapply(filePath, read.csv.sql)

 --
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] function using values separated by a comma

2010-10-11 Thread burgundy


Hi

Just used this function on my real data - several enormous files (80 rows 
by 200 columns...) and it worked perfectly! Thanks again for your help, saved 
me a lot of time!

A last quick query, I have several other similar problems to deal with in my 
data - do you know a useful book or online course that would be helpful for 
learning these sorts of data handling functions?

Thanks again!


--- On Fri, 8/10/10, Jeffrey Spies-2 [via R] 
ml-node+2968583-620301009-75...@n4.nabble.com wrote:

From: Jeffrey Spies-2 [via R] ml-node+2968583-620301009-75...@n4.nabble.com
Subject: Re: function using values separated by a comma
To: burgundy saub...@yahoo.com
Date: Friday, 8 October, 2010, 16:48



Here's another method without using any external regular expression libraries:

dat <- read.table(tc <- textConnection(
'0,1 1,3 40,10 0,0
20,5 4,2 10,40 10,0
0,11 1,2 120,10 0,0'), sep="")

mat <- apply(dat, c(1,2), function(x){
        temp <- as.numeric(unlist(strsplit(x, ',')))
        min(temp)/sum(temp)
})

For mat[2,4], I get 0 (as did the other solutions), and you get 1, so
check on that. If you want the divide-by-0 NaNs to be 0, you can check
that by replacing

min(temp)/sum(temp)

with:

ifelse(is.nan(val <- min(temp)/sum(temp)), 0, val)

This has an advantage over:

mat[is.na(mat)] <- 0

in that you might have true missingness in your data and is.na won't
be able to distinguish it.

Cheers,

Jeff.


On Fri, Oct 8, 2010 at 1:19 AM, burgundy [hidden email] wrote:



 Hello,

 I have a dataframe (tab separated file) which looks like the example below -
 two values separated by a comma, and tab separation between each of these.

      [,1]  [,2]  [,3]   [,4]
 [1,] 0,1   1,3   40,10  0,0
 [2,] 20,5  4,2   10,40  10,0
 [3,] 0,11  1,2   120,10 0,0

 I would like to calculate the percentage of the smallest number separated by
 the comma by:
 1) summing the values e.g. for [1,3] where 40,10, 40+10 = 50
 2) taking the first value and dividing it by the total e.g. for [1,3], 40/50
 = 0.8
 3) where the value generated by 2) is > 0.5, print 1-value, otherwise, leave
 value e.g. for [1,3], where value is 0.8, print 1-0.8 = 0.2

 plan to generate file like:

      [,1]  [,2]  [,3]  [,4]
 [1,] 1     0.25  0.2   0
 [2,] 0.2   0.33  0.2   1
 [3,] 1     0.33  0.08  0

 Apologies, I know this is very complex. Any help, even just some pointers on
 how to write a general function where values are separated by a comma, is
 really very much appreciated!

 Thank you



 --

 View this message in context: 
 http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2967870.html
 Sent from the R help mailing list archive at Nabble.com.






-- 
View this message in context: 
http://r.789695.n4.nabble.com/function-using-values-separated-by-a-comma-tp2967870p2990966.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] importing numeric types via sqlQuery

2010-10-11 Thread jim holtman

Must be your database interface:

> require(sqldf)
> # don't have MySql, but will use sqlite as example
> myData <- data.frame(cat = c('abc', 'def'), num=c(123.456, 7890.1234))
> myData
  cat      num
1 abc  123.456
2 def 7890.123
> sqldf('select cat, num from myData')  # now make sql request
  cat      num
1 abc  123.456
2 def 7890.123


# works fine here
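
One way to check whether the decimals were really dropped on import or are merely not printed (a sketch; x stands in for a value such as data$num[1], which is an assumption about how the imported data are named):

```r
x <- 54469517.307692307692

shown_default <- format(x, digits = 7)    # what print() shows with default options
shown_more    <- format(x, digits = 15)   # more digits requested explicitly

# A non-zero fractional part means the decimals are stored, whatever is printed
frac <- x - floor(x)
```

If frac comes back as exactly 0 for the imported column, the truncation happened before the data reached R, which would point at the database interface or driver rather than at printing.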

On Mon, Oct 11, 2010 at 4:51 PM, E C mmmraspberr...@hotmail.com wrote:
 Thanks for the quick reply! Hmm, I did not know about the options default.
 However, after I set options, it seems like it's still not displaying
 correctly. I've tried an even simpler example table with only 6 digits (much
 fewer than 20):
 category num
 abc 123.456
 def 456.789
 Then in R:
 options(digits = 20)
 data <- sqlQuery(channel, "select category, num from temp;")
 But data looks like this:
 category num
 abc 123
 def 456
 I suspect it's something with sqlQuery that chops off the digits and
 wondering if there's a way of turning it off. Thanks!


 Date: Mon, 11 Oct 2010 16:28:25 -0400
 Subject: Re: [R] importing numeric types via sqlQuery
 From: jholt...@gmail.com
 To: mmmraspberr...@hotmail.com
 CC: r-help@r-project.org

 I would assume that the digits are not being chopped off. It is just
 that R will typically print data to 7 significant digits:

 > x <- 54469517.307692307692
 > x
 [1] 54469517
 > options(digits=20)
 > x
 [1] 54469517.3076923
 >

 Your data is there and you can set 'options' to show it if you want
 to. Also with floating point, you will only get about 15 digits of
 accuracy (see FAQ 7.31).


 On Mon, Oct 11, 2010 at 4:19 PM, E C mmmraspberr...@hotmail.com wrote:
 
  Hi everyone,
  I am using the sqlQuery function (in RODBC library) to import data from
  a database into R. My table (called temp) in the database looks like this:

  category   num
  abc        54469517.307692307692
  def        36428860.230769230769

  I used the following R code to pull data into R:

  data <- sqlQuery(channel, "select category, num from temp;")

  However, the result is that num gets all its decimal places chopped
  off, so data looks like this instead in R:

  category   num
  abc        54469517
  def        36428860

  I've tried various alternative approaches, but none have fixed the
  problem. When I cast the variable to a numeric type like this (data
  <- sqlQuery(channel, "select category, num::numeric from temp;")), it still
  gave me the same result. Casting to a real type like this (data
  <- sqlQuery(channel, "select category, num::real from temp;")) resulted in
  scientific notation that also rounded the numbers.
  Any suggestions? Much appreciated!
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 646 9390

 What is the problem that you are trying to solve?




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Time OffSet From GMT - Losing it

2010-10-11 Thread rivercode


That is embarrassing...thanks for pointing out my mistake.

Chris
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Time-OffSet-From-GMT-Losing-it-tp2968940p2990987.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] MATLAB vrs. R

2010-10-11 Thread Craig O'Connell


Daniel,
 
That's it!  Thanks.  Your help is very much appreciated.  I'm hoping to
nail down the code conversion from MATLAB to R, but it seems to be a bit more
difficult than I had anticipated.
 
Craig
 
 From: djnordl...@frontier.com
 To: djnordl...@frontier.com; r-help@r-project.org
 Date: Mon, 11 Oct 2010 16:12:48 -0700
 Subject: Re: [R] MATLAB vrs. R
 
 I apologize for the noise. I didn't clean up the code enough. See below.
 
 snip
  
  Craig,
  
  I haven't seen an answer to this yet, so let me jump in. You seem to have
  some stuff still leftover from MATLAB. Here is some cleaned up code that
  produces the result you expect. I don't think the value of dx was being
  correctly computed in your code. I did not change the assignment operator
  you used (=), but in R the preferred operator is "<-" (without the
  quotes).
  
  myquadrature <- function(f,a,b){
  npts = length(f)
  nint = npts-1
  if(npts <= 1) error('need at least two points to integrate')
  if(b <= a) error('something wrong with the interval, b should be greater than a') else dx=b/nint
 
 The 2 'if' statements above should have been
 
 if(npts <= 1) stop('need at least two points to integrate')
 if(b <= a) stop('something wrong with the interval, b should be greater than a') else dx=b/nint
 
  sum(f[-npts]+f[-1])/2*dx
  }
  
  #Call my quadrature
  x = seq(0,2000,10)
  h = 10*(cos(((2*pi)/2000)*(x-mean(x)))+1)
  u = (cos(((2*pi)/2000)*(x-mean(x)))+1)
  a = x[1]
  b = x[length(x)]
  plot(x,-h)
  a = x[1];
  b = x[length(x)]
  
  #call your quadrature function. Hint, the answer should be 3.
  f = u*h
  result = myquadrature(f,a,b)
  result
  
  Hope this is helpful,
  
  Dan
  
 Daniel Nordlund
 Bothell, WA USA
 
 

[R] Create DataSet with MCAR type

2010-10-11 Thread Jumlong Vongprasert

Dear all
I want to create dataset with MCAR type from my dataset.
I have my dataset with 100 records, and I want to create dataset from this
dataset to missing 5 records.
How I can do it.
THX
Jumlong

-- 
Jumlong Vongprasert
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND
34000


[R] Comparison of two files with multiple arguments

2010-10-11 Thread burgundy


Hello,

I have an example file which can be generated using:

dat <- read.table(tc <- textConnection(
'T T,G G T
C NA G G
A,T A A NA'), sep="")

I also have a reference file with the same number of rows, for example:
G
C
A

I would like to transform the file to numerical values using the following
arguments:
1) Where data points have two letters separated by a comma, e.g. T,G,
replace with a 2
2) Where single letter data points match the data point in the corresponding
row of the reference file, replace with a 0
3) Where single letter data points do not match the reference file, replace
with a 1
4) NA is left as NA

In the example, the output file would look like:

1 2 0 1
0 NA 1 1
2 0 0 NA

Any advice very much appreciated. Also, if you know of any good books or
online courses that can help me to learn how to deal with these sorts of
data handling queries, that is also great!

Thank you

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Comparison-of-two-files-with-multiple-arguments-tp2991043p2991043.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Create DataSet with MCAR type

2010-10-11 Thread Michael Bedward

Hello Jumlong,

I'm not sure whether by '100 records' you mean a vector of 100 values
or a matrix / data.frame of 100 rows.

For a vector or matrix X you can do this:

X[ sample( length(X), 5 ) ] <- NA

For a data.frame X you could do this:

X[ sample( nrow(X), 5 ), sample( ncol(X), 5) ] <- NA
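
As a fuller sketch of the same idea, assuming the 100 records are rows of a data frame and that "missing 5 records" means 5 randomly chosen values of one variable become NA (the data and the column names id and y are made up):

```r
set.seed(42)                                   # only to make the sketch reproducible
X <- data.frame(id = 1:100, y = rnorm(100))    # hypothetical complete data

# MCAR: the rows to blank out are drawn without reference to any data values
miss <- sample(nrow(X), 5)

X_mcar <- X                                    # keep the complete data untouched
X_mcar$y[miss] <- NA

n_missing <- sum(is.na(X_mcar$y))              # number of values set to missing
```

Keeping the original X alongside X_mcar makes it easy to compare results on complete and incomplete data later.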

Hope this helps,

Michael


On 12 October 2010 13:14, Jumlong Vongprasert jumlong.u...@gmail.com wrote:
 Dear all
 I want to create dataset with MCAR type from my dataset.
 I have my dataset with 100 records, and I want to create dataset from this
 dataset to missing 5 records.
 How I can do it.
 THX
 Jumlong

 --
 Jumlong Vongprasert
 Institute of Research and Development
 Ubon Ratchathani Rajabhat University
 Ubon Ratchathani
 THAILAND
 34000




Re: [R] Comparison of two files with multiple arguments

2010-10-11 Thread Michael Bedward

Hello,

Here's one way to do it. It assumes dat has character values, not factors.

dat2 <- matrix(0, nrow(dat), ncol(dat))
dat2[ is.na(dat) ] <- NA
dat2[ apply(dat, 2, function(x) grepl(",", x)) ] <- 2
dat2[ apply(dat, 2, function(x) x != ref) ] <- 1
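
With this style of recoding the order of the assignments matters, because later assignments overwrite earlier ones (a cell such as "T,G" also differs from the reference letter). A self-contained sketch that applies the mismatch rule first; the vector ref and the colClasses setting are assumptions, not part of the original post:

```r
dat <- read.table(textConnection(
'T T,G G T
C NA G G
A,T A A NA'), sep = "", colClasses = "character")
ref <- c("G", "C", "A")              # reference letters, one per row (assumed)

m   <- as.matrix(dat)                # character matrix; "NA" was read as missing
out <- matrix(0, nrow(m), ncol(m))   # start from 0 = matches the reference

out[m != ref]      <- 1              # ref is recycled down each column, i.e. by row
out[grepl(",", m)] <- 2              # two letters separated by a comma
out[is.na(m)]      <- NA             # missing stays missing
out
```

For the example data this reproduces the requested table row by row.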

Michael


On 12 October 2010 13:24, burgundy saub...@yahoo.com wrote:

 Hello,

 I have an example file which can be generated using:

 dat <- read.table(tc <- textConnection(
 'T T,G G T
 C NA G G
 A,T A A NA'), sep="")

 I also have a reference file with the same number of rows, for example:
 G
 C
 A

 I would like to transform the file to numerical values using the following
 arguments:
 1) Where data points have two letters separated by a comma, e.g. T,G,
 replace with a 2
 2) Where single letter data points match the data point in the corresponding
 row of the reference file, replace with a 0
 3) Where single letter data points do not match the reference file, replace
 with a 1
 4) NA is left as NA

 In the example, the output file would look like:

 1 2 0 1
 0 NA 1 1
 2 0 0 NA

 Any advice very much appreciated. Also, if you know of any good books or
 online courses that can help me to learn how to deal with these sorts of
 data handling queries, that is also great!

 Thank you

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Comparison-of-two-files-with-multiple-arguments-tp2991043p2991043.html
 Sent from the R help mailing list archive at Nabble.com.




[R] Help with function writing

2010-10-11 Thread Tim Elwell-Sutton

Hello all
I have what seems like a simple question but have not been able to find an
answer on the forum. I'm trying to define a function which involves
regression models and a large number of covariates. 

I would like the function to accept any number of covariates and, ideally, I
would like to be able to enter the covariates in a group (e.g. as a list)
rather than individually. Is there any way of doing this? 

Example:

#define function involving regression model with several covariates 
custom <- function(outcome, exposure, covar1, covar2, covar3){
  model <- lm(outcome ~ exposure + covar1 +  covar2 + covar3)
  expected <- predict(model)
  summary(expected)
}

library(MASS)
attach(birthwt)

custom(bwt, lwt, low, age, race) #Works when 3 covariates are specified

custom(bwt,lwt,low,age) # Does not work with < or > 3 covariates

varlist <- list(low,age,race)
custom(bwt,lwt, varlist) #Does not work if covariates are included as a list



Thanks very much for your help

Tim

--
Tim Elwell-Sutton
University of Hong Kong


Re: [R] Running R on a server

2010-10-11 Thread jthetzel


Sachin,

I apologize if I'm over-simplifying your question.  I mostly run R on an
Ubuntu server via a Windows laptop.  I log in to the remote server via SSH
(via PuTTY on Windows), and then open an interactive R session through the
usual ways (typing 'R' at the Linux command line).  When creating figures,
I'll usually just output the figures to pdfs, via pdf().  However, if I need
a more interactive experience with the figures, I'll ask PuTTY to initiate
'X11 forwarding', which, on Windows, also requires an X server, such as
Xming.  This causes plots to appear in new windows just like you were
running R on your local machine.
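
As a minimal sketch of the pdf() approach for a session without a display: the output path and the use of the built-in cars data set are assumptions made for illustration only.

```r
# Write a plot to a PDF file instead of an on-screen window.
# The file name below is hypothetical; any writable path on the server works.
out_file <- file.path(tempdir(), "example-plot.pdf")

pdf(out_file, width = 7, height = 5)   # open the PDF graphics device
plot(cars, main = "Stopping distance vs. speed")
dev.off()                              # close the device so the file is written

file.exists(out_file)
```

The resulting file can then be copied to the local machine (for example with scp) and viewed there.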

If you are interested in running non-interactive batch R scripts, reference
the following:
http://stat.ethz.ch/R-manual/R-devel/library/utils/html/BATCH.html .

Another note on running R over SSH: if you lose your SSH connection, the R
process will stop.  I get around this by using the 'screen' command in Linux
(http://en.wikipedia.org/wiki/GNU_Screen).  See man screen for details. 
While 'screen' does many things, relevant to this thread it creates new
remote terminals that persist after SSH disconnects.  After SSHing into my
server, I type 'screen' and [return], then 'R'.  R starts up and I start the
analysis.  I can manually 'detach' the screen by hitting the 'control' and
'a' keys together, and then hitting the 'd' key.  The R process (and any
other processes started in that screen session) will continue to run.  One
can start many screens.  Typing 'screen -ls' shows the currently running
screens.  If only one screen is running, typing 'screen -r' will attach that
screen, and one can continue on one's analysis in R.  If multiple screen
sessions are open, one will need to specify the screen name after the
'screen -r' command.  Sometimes after an abrupt disconnect, the screen will
remain attached, even though the SSH connection is lost.  To get back to the
screen session, one must first 'detach' and then 're-attach' the screen by
typing 'screen -dr'.

Let me know if you have more specific questions.

Cheers,
Jeremy

Jeremy Hetzel
Boston University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Running-R-on-a-server-tp2967748p2991084.html
Sent from the R help mailing list archive at Nabble.com.


[R] Extracting data subset for plot

2010-10-11 Thread elaine kuo

Dear list,



I want to make a plot with the plot() command, based on the following
information:

variable A for the x axis: temperature (range: -20 degrees to 40 degrees)

variable B for the y axis: altitude (range: 50 m to 2500 m)



For now, I would like to leave out the data where the x variable is below 0
degrees.

Please advise on the command to extract the data ranging from 0 degrees
to 40 degrees.

Thank you.



Elaine



Re: [R] Help with function writing

2010-10-11 Thread Michael Bedward

Hello Tim,

This function will do it where the covariates are provided as separate
arguments. It would be easy to modify this to handle a list too.

function(outcome, ...) {
  arg.names <- as.character(match.call())[-1]
  nargs <- length(arg.names)
  f <- as.formula(paste(arg.names[1], "~",
                        paste(arg.names[2:nargs], collapse="+")))

  model <- lm(f)

  # rest of your code here
}
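
One possible way to handle the grouped case is sketched below. It is a hypothetical variant, not the function above: the name custom_list, the choice to pass covariates as a character vector of column names, and the explicit data argument (instead of attach()) are all assumptions made for illustration.

```r
library(MASS)  # provides the birthwt data used in the question

# Hypothetical variant: the outcome and the covariates are given as column
# names, and the data frame is passed explicitly rather than attached.
custom_list <- function(outcome, covariates, data) {
  # reformulate() builds a formula such as bwt ~ lwt + low + age
  f <- reformulate(covariates, response = outcome)
  model <- lm(f, data = data)
  expected <- predict(model)
  summary(expected)
}

custom_list("bwt", c("lwt", "low", "age", "race"), birthwt)
custom_list("bwt", c("lwt", "low"), birthwt)  # any number of covariates
```

Passing names rather than the vectors themselves keeps the formula readable in the model output and avoids relying on attach().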

Hope that helps you get started.

Michael


On 12 October 2010 14:35, Tim Elwell-Sutton <tesut...@hku.hk> wrote:
 Hello all
 I have what seems like a simple question but have not been able to find an
 answer on the forum. I'm trying to define a function which involves
 regression models and a large number of covariates.

 I would like the function to accept any number of covariates and, ideally, I
 would like to be able to enter the covariates in a group (e.g. as a list)
 rather than individually. Is there any way of doing this?

 Example:

 #define function involving regression model with several covariates
 custom <- function(outcome, exposure, covar1, covar2, covar3){
  model <- lm(outcome ~ exposure + covar1 +  covar2 + covar3)
  expected <- predict(model)
  summary(expected)
 }

 library(MASS)
 attach(birthwt)

 custom(bwt, lwt, low, age, race) #Works when 3 covariates are specified

 custom(bwt,lwt,low,age) # Does not work with < or > 3 covariates

 varlist <- list(low,age,race)
 custom(bwt,lwt, varlist) #Does not work if covariates are included as a list



 Thanks very much for your help

 Tim

 --
 Tim Elwell-Sutton
 University of Hong Kong



