Re: [R] MATLAB vrs. R
On 10/11/2010 07:46 AM, Craig O'Connell wrote:
> I need to find the area under a trapezoid for a research-related project.
> I was able to find the area under the trapezoid in MATLAB using the code:
>
>   function [int] = myquadrature(f,a,b)
>   % user-defined quadrature function
>   % integrate data f from x=a to x=b assuming f is equally spaced over the interval
>   % use type
>   % determine number of data points
>   npts = prod(size(f));
>   nint = npts -1; %number of intervals
>   if(npts <=1)
>     error('need at least two points to integrate')
>   end;
>   % set the grid spacing
>   if(b <=a)
>     error('something wrong with the interval, b should be greater than a')
>   else
>     dx = b/real(nint);
>   end;
>   npts = prod(size(f));
>   % trapezoidal rule
>   % can code in line, hint: sum of f is sum(f)
>   % last value of f is f(end), first value is f(1)
>   % code below
>   int=0;
>   for i=1:(nint)
>     %F(i)=dx*((f(i)+f(i+1))/2);
>     int=int+dx*((f(i)+f(i+1))/2);
>   end
>   %int=sum(F);
>
> Then to call myquadrature I did:
>
>   % example function call test the user-defined myquadrature function
>   % setup some data
>   % velocity profile across a channel
>   % remember to use ? for help, e.g. ?seq
>   x = 0:10:2000;
>   % you can access one element of a list of values using brackets
>   % x(1) is the first x value, x(2), the 2nd, etc.
>   % if you want the last value, a trick is x(end)
>   % the function cos is cosine and mean gives the mean value
>   % pi is 3.1415, or pi
>   % another hint, if you want to multiply two series of numbers together
>   % for example c = a*b where c(1) = a(1)*b(1), c(2) = a(2)*b(2), etc.
>   % you must tell Matlab you want element by element multiplication
>   % e.g.: c = a.*b
>   % note the .
>   %
>   h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1); %bathymetry
>   u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1); %vertically-averaged cross-transect velocity
>   plot(x,-h)
>   % set begin and end points for the integration
>   a = x(1);
>   b = x(end);
>   % call your quadrature function. Hint, the answer should be 3.
>   f=u.*h;
>   val = myquadrature(f,a,b);
>   fprintf('the solution is %f\n',val);
>
> This is great, I got the expected answer of 3.
>
> NOW THE ISSUE IS, I HAVE NO IDEA HOW THIS CODE TRANSLATES TO R. Here is
> what I attempted to do, and with error messages, I can tell I'm doing
> something wrong:
>
>   myquadrature<-function(f,a,b){
>   npts=length(f)
>   nint=npts-1
>   if(npts<=1)
>   error('need at least two points to integrate')
>   end;
>   if(b<=a)
>   error('something wrong with the interval, b should be greater than a')
>   else
>   dx=b/real(nint)
>   end;
>   npts=length(f)
>   _(below this line, I cannot code)
>   int=0
>   for(i in 1:(npts-1))
>   sum(f)=((b-a)/(2*length(f)))*(0.5*f[i]+f[i+1]+f[length(f)])}
>   %F(i)=dx*((f(i)+f(i+1))/2);
>   int=int+dx*((f(i)+f(i+1))/2);
>   end
>   %int=sum(F);

For a literal translation, just pay a little more attention to detail:

  for(i in 1:(npts-1)) int <- int+dx*(f[i]+f[i+1])/2

However, a more R-ish way is to drop the loop and vectorize:

  int <- sum(f[-npts]+f[-1])/2*dx

(or int <- (sum(f) - (f[1]+f[npts])/2)*dx, by a well-known rewrite of the trapezoidal rule).

> Thank you and any potential suggestions would be greatly appreciated.
> Dr. Argese.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
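As a rough sketch of the two suggestions above (loop and vectorized), here is a self-contained version. The function names trapz_loop and trapz_vec are made up for illustration. Two things differ from the code in the thread and are assumptions on my part: stop() stands in for MATLAB's error(), and the spacing is taken as (b - a)/nint, which agrees with b/nint only when a is 0.

```r
# Trapezoidal rule for equally spaced samples f on [a, b].
# trapz_loop / trapz_vec are illustrative names, not from the thread.
check_args <- function(f, a, b) {
  if (length(f) <= 1) stop("need at least two points to integrate")
  if (b <= a) stop("b should be greater than a")
}

trapz_loop <- function(f, a, b) {
  check_args(f, a, b)
  npts <- length(f)
  dx <- (b - a) / (npts - 1)      # grid spacing (assumption: not b/nint)
  int <- 0
  for (i in 1:(npts - 1)) {
    int <- int + dx * (f[i] + f[i + 1]) / 2
  }
  int
}

trapz_vec <- function(f, a, b) {
  check_args(f, a, b)
  npts <- length(f)
  dx <- (b - a) / (npts - 1)
  # f[-npts] drops the last value, f[-1] drops the first
  sum(f[-npts] + f[-1]) / 2 * dx
}

x <- seq(0, 1, by = 0.1)
f <- 2 * x                        # a straight line, so the rule is exact
trapz_loop(f, 0, 1)
trapz_vec(f, 0, 1)
```

Both calls should agree with each other, and for a straight line they should match the exact area under it.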
Re: [R] MATLAB vrs. R
Thank you Peter. That is very helpful. If you don't mind, I continued running the code to attempt to get my answer and I continue to get inf inf inf... (printed around 100 times). Any assistance with this issue would be appreciated. Here is my code (including your corrections):

  myquadrature<-function(f,a,b){
  npts=length(f)
  nint=npts-1
  if(npts<=1)
  error('need at least two points to integrate')
  end;
  if(b<=a)
  error('something wrong with the interval, b should be greater than a')
  else
  dx=b/real(nint)
  end;
  npts=length(f)
  int=0
  int <- sum(f[-npts]+f[-1])/2*dx
  }

  #Call my quadrature
  x=seq(0,2000,10)
  h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
  u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
  a = x[1]
  b = x[length(x)]
  plot(x,-h)
  a = x[1];
  b = x[length(x)];
  #call your quadrature function. Hint, the answer should be 3.
  f=u*h;
  val = myquadrature(f,a,b);   ___This is where issue arises.
  result=myquadrature(val,0,2000)
  print(result)

Thanks again,
Phil
Re: [R] MATLAB vrs. R
Hi,

The first argument of myquadrature in result shouldn't be val but f, I guess. At least it works for me:

  result=myquadrature(f,0,2000)
  print(result)
  [1] 3

Regards,
Alain

On 11-Oct-10 09:37, Craig O'Connell wrote:
> Thank you Peter. That is very much helpful. If you don't mind, I continued
> running the code to attempt to get my answer and I continue to get
> inf inf inf... (printed around 100 times). Any assistance with this issue.
> Here is my code (including your corrections):
>
>   myquadrature<-function(f,a,b){
>   npts=length(f)
>   nint=npts-1
>   if(npts<=1)
>   error('need at least two points to integrate')
>   end;
>   if(b<=a)
>   error('something wrong with the interval, b should be greater than a')
>   else
>   dx=b/real(nint)
>   end;
>   npts=length(f)
>   int=0
>   int<- sum(f[-npts]+f[-1])/2*dx
>   }
>
>   #Call my quadrature
>   x=seq(0,2000,10)
>   h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
>   u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1)
>   a = x[1]
>   b = x[length(x)]
>   plot(x,-h)
>   a = x[1];
>   b = x[length(x)];
>   #call your quadrature function. Hint, the answer should be 3.
>   f=u*h;
>   val = myquadrature(f,a,b);   ___This is where issue arises.
>   result=myquadrature(val,0,2000)
>   print(result)
>
> Thanks again,
> Phil

--
Alain Guillet
Statistician and Computer Scientist
SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium
tel: +32 10 47 30 50
[R] mvtnorm and noncentrality parameters
Hi

I'm trying to calculate densities for the multivariate noncentral t distribution. For the avoidance of doubt, and because there do seem to be at least two definitions for a noncentral t distribution, this is a noncentral t distribution of the sort described in McNeil, Frey and Embrechts (2005) and elsewhere that can be constructed as a normal mean-variance mixture. It is not simply a shifted t distribution.

The univariate noncentral t distribution density can be calculated using the function dt in the stats package (see ?TDist). The univariate version seems to do what I expect, i.e. give a distribution with a different shape.

The obvious approach for calculating a multivariate result seems to be to use dmvt in the mvtnorm package. However, the noncentrality parameters here appear actually to be means. In other words, they seem to simply shift the distribution rather than calculating a noncentral multivariate distribution as described by McNeil et al.

So, a few questions:

1. Is my understanding of what dmvt is doing correct?
2. Is this what it is supposed to be doing?
3. Is there any way of deriving the densities that I want in R (short of writing my own function...)?

Thanks
Paul
Re: [R] Number of occurences of a character in a string
How about:

  sum(unlist(strsplit(b,NULL))==";")
  # [1] 5

(More transparent, at least to me ... ). See '?strsplit', and note what is said under Value.

Ted.

On 11-Oct-10 04:35:43, Michael Sumner wrote:
> Literally:
>
>   length(gregexpr(";", b)[[1]])
>
> But more generally, in case b has more than one element:
>
>   sapply(gregexpr(";", b), length)
>
> ?gregexpr
>
> On Mon, Oct 11, 2010 at 3:18 PM, Santosh Srinivas
> santosh.srini...@gmail.com wrote:
>> New to R ... which is a function to most effectively search the number
>> of occurrences of a character in a string?
>>
>>   b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k")
>>
>> I want the number of semi-colons ";" in b?
>> Thanks.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 11-Oct-10 Time: 09:20:36
-- XFMail --
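A small sketch comparing the approaches from this thread on more than one string. One caveat that I believe holds: gregexpr() returns -1 (a vector of length 1) when there is no match, so length() alone reports 1 for a string without any semicolon. The helper name count_char and the second test string are made up for illustration.

```r
b <- c("jkhrikujhj345hi5hiklfjsdkljfksdio324j';;'lfd;g'lkfit34'5;435l;43'5k",
       "no semicolons here")

# count_char is a hypothetical helper: regmatches() returns character(0)
# for a string without a match, so the count is 0 in that case.
count_char <- function(x, ch) {
  m <- gregexpr(ch, x, fixed = TRUE)
  vapply(regmatches(x, m), length, integer(1))
}
count_char(b, ";")

# strsplit() approach, applied per element
sapply(strsplit(b, NULL), function(chars) sum(chars == ";"))

# length() of the raw gregexpr() result: note the value for the second string
sapply(gregexpr(";", b, fixed = TRUE), length)
```

The first two approaches agree; the third differs only for strings that contain no match at all.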
[R] Dataset Transformation
I need to transpose the following input dataset into an output dataset like below

Input

Date        TICKER  Price
11/10/2010  A       0.991642
11/10/2010  B       0.475023
11/10/2010  C       0.218642
11/10/2010  D       0.365135
12/10/2010  A       0.687873
12/10/2010  B       0.47006
12/10/2010  C       0.533542
12/10/2010  D       0.812439
13/10/2010  A       0.210848
13/10/2010  B       0.699799
13/10/2010  C       0.546003
13/10/2010  D       0.152316

Output needed

Date        A         B         C         D
11/10/2010  0.991642  0.475023  0.218642  0.365135
12/10/2010  0.687873  0.47006   0.533542  0.812439
13/10/2010  0.210848  0.699799  0.546003  0.152316

I tried using the aggregate function but not quite getting the method.
[R] Split rows depending on time frame
Hi,

I have the following data frame, where COL2 is a start date and COL3 an end date:

COL1  COL2   COL3
A     40462  40482
B     40462  40478

I would like to split the above timeframe of 3 weeks into weeks, like this:

COL1  COL2   COL3   COL4
A     40462  40468  1
A     40469  40475  1
A     40476  40482  1
B     40462  40468  1
B     40469  40475  1
B     40476  40478  0.428

Here COL4 indicates whether the timeframe between COL2 and COL3 is exactly 7 days or shorter. In the example above, for B the last split contains only 3 days, so the value in COL4 is 3/7.

I can't figure out how to do the above. Is there someone who can help me out?

Thx in advance,
Bert
Re: [R] Split rows depending on time frame
Dear Bert,

Use the plyr package to do the magic:

  library(plyr)
  dataset <- data.frame(COL1 = c("A", "B"), COL2 = 40462, COL3 = c(40482, 40478))
  tmp <- ddply(dataset, "COL1", function(x){
    delta <- with(x, 1 + COL3 - COL2)
    rows <- rep(1, delta %/% 7)
    if(delta %% 7 > 0){
      rows <- c(rows, (delta %% 7) / 7)
    }
    data.frame(COL4 = rows)
  })
  merge(dataset, tmp)

HTH,
Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Bert Jacobs
> Sent: Monday 11 October 2010 11:26
> To: r-help@r-project.org
> Subject: [R] Split rows depending on time frame
>
> Hi,
>
> I have the following data frame, where col2 is a startdate and col3 an enddate
>
> COL1  COL2   COL3
> A     40462  40482
> B     40462  40478
>
> The above timeframe of 3 weeks I would like to splits it in weeks like this
>
> COL1  COL2   COL3   COL4
> A     40462  40468  1
> A     40469  40475  1
> A     40476  40482  1
> B     40462  40468  1
> B     40469  40475  1
> B     40476  40478  0.428
>
> Where COL4 is an identifier if the timeframe between COL2 and COL3 is
> exactly 7 days or shorter. In the example above for B the last split
> contains only 3 days so the value in COL 4 is 3/7
>
> I can't figure out to do the above. Is there someone who can help me out?
>
> Thx in advance,
> Bert
Re: [R] Split rows depending on time frame
On Mon, Oct 11, 2010 at 5:25 AM, Bert Jacobs bert.jac...@figurestofacts.be wrote:
> Hi,
>
> I have the following data frame, where col2 is a startdate and col3 an enddate
>
> COL1  COL2   COL3
> A     40462  40482
> B     40462  40478
>
> The above timeframe of 3 weeks I would like to splits it in weeks like this
>
> COL1  COL2   COL3   COL4
> A     40462  40468  1
> A     40469  40475  1
> A     40476  40482  1
> B     40462  40468  1
> B     40469  40475  1
> B     40476  40478  0.428
>
> Where COL4 is an identifier if the timeframe between COL2 and COL3 is
> exactly 7 days or shorter. In the example above for B the last split
> contains only 3 days so the value in COL 4 is 3/7

Try this:

  DF <- data.frame(COL1 = c("A", "B"), COL2 = 40462, COL3 = c(40482, 40478))

  do.call(rbind, by(DF, DF$COL1, function(x) with(x, {
    COL2 <- seq(COL2, COL3, 7)
    COL3 <- pmin(COL2 + 6, COL3)
    COL4 <- (COL3 - COL2 + 1) / 7
    data.frame(COL1, COL2, COL3, COL4)
  })))

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
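For readers who find the by()/with() idiom dense, here is a sketch of the same weekly split written with a small helper and Map(). The name split_weeks is made up for illustration, and the sketch assumes one row per COL1 value, as in the example data.

```r
DF <- data.frame(COL1 = c("A", "B"), COL2 = 40462, COL3 = c(40482, 40478))

# split_weeks is a hypothetical helper: one output row per started week
split_weeks <- function(id, start, end) {
  from <- seq(start, end, by = 7)   # first day of each week
  to   <- pmin(from + 6, end)       # last day, capped at the end date
  data.frame(COL1 = id, COL2 = from, COL3 = to,
             COL4 = (to - from + 1) / 7)   # fraction of a full week
}

pieces <- Map(split_weeks, DF$COL1, DF$COL2, DF$COL3)
result <- do.call(rbind, unname(pieces))
rownames(result) <- NULL
result
```

The last row for B covers 40476 to 40478, i.e. 3 days, so its COL4 is 3/7.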
[R] plotting Zipf and Zipf-Mandelbrot curves in R
Using R, I plotted a log-log plot of the frequencies in the Brown Corpus using

  plot(sort(file.tfl$f, decreasing=TRUE), xlab="rank", ylab="frequency", log="xy")

However, I would also like to add lines showing the curves for a Zipfian distribution and for Zipf-Mandelbrot. I have seen these in many articles that used R in creating graphs.

Thank you!
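One possible way to overlay such curves with base graphics, offered as a sketch only. It assumes the usual forms f(r) = C / r^a for Zipf and f(r) = C / (r + b)^a for Zipf-Mandelbrot. The frequencies below are synthetic stand-ins for file.tfl$f, the Zipf parameters come from a simple least-squares line on the log-log scale, and the Zipf-Mandelbrot parameters are picked by hand rather than fitted.

```r
zipf            <- function(rank, C, a)    C / rank^a
zipf_mandelbrot <- function(rank, C, a, b) C / (rank + b)^a

# Synthetic frequencies (assumption: stand-in for the Brown Corpus counts)
rank <- 1:1000
freq <- round(zipf_mandelbrot(rank, C = 60000, a = 1.1, b = 2.7))

# Zipf: a straight line on the log-log scale, fitted by least squares
fit   <- lm(log(freq) ~ log(rank))
C_hat <- unname(exp(coef(fit)[1]))
a_hat <- unname(-coef(fit)[2])

if (interactive()) {
  plot(rank, freq, xlab = "rank", ylab = "frequency", log = "xy")
  lines(rank, zipf(rank, C_hat, a_hat), lty = 2)
  # Zipf-Mandelbrot with hand-picked (not fitted) parameters
  lines(rank, zipf_mandelbrot(rank, 60000, 1.1, 2.7), lty = 1)
  legend("topright", c("Zipf (lm fit)", "Zipf-Mandelbrot"), lty = c(2, 1))
}
```

For real data the Zipf-Mandelbrot parameters would need to be estimated, for example by nonlinear least squares or a dedicated package; that step is left out here.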
[R] clustering with cosine correlation
Dear All,

Do you know how to make a heatmap and use cosine correlation for clustering? This is what my colleague can do in gene-math and I want to do in R but I don't know how to.

Thanks a lot
Leila
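A minimal sketch with base R's heatmap(), assuming that "cosine correlation" here means cosine similarity between rows, turned into a distance as 1 - similarity. The function name cosine_dist and the random matrix are made up for illustration; heatmap() calls the distfun on the matrix for rows and on its transpose for columns.

```r
# cosine_dist is a hypothetical helper: 1 - cosine similarity between rows
cosine_dist <- function(m) {
  unit <- m / sqrt(rowSums(m^2))   # scale every row to unit length
  sim  <- unit %*% t(unit)         # cosine similarity matrix
  as.dist(1 - sim)
}

set.seed(42)
m <- matrix(rnorm(60), nrow = 10,
            dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))

if (interactive()) {
  heatmap(m,
          distfun   = cosine_dist,
          hclustfun = function(d) hclust(d, method = "average"))
}
```

Average linkage is just one choice; any hclust() method can be substituted.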
Re: [R] Mapping the coordinates!
Hi Mehdi,

Take a look at the spatial task view [1] and the r-sig-geo mailing list [2].

cheers,
Paul

[1] http://cran.r-project.org/web/views/Spatial.html
[2] https://stat.ethz.ch/mailman/listinfo/r-sig-geo

On 10/10/2010 06:11 AM, Mehdi Zarrei wrote:
> Hello,
>
> I have a series of coordinates (latitudes and longitudes) each
> one/several associated to a code (from 1 to 28). I used function points
> (latitude, longitudes) to transfer them to a pre-prepared map.
>
> 1- I wonder how I might be able to automatically add codes (1-28) to
> the map too?
> 2- Moreover, mostly there are a few codes from the identical
> coordinates. What is the function to avoid overlapping of codes on the map?
> 3- I want to draw closed line around some geographical areas to define
> the habitats.
>
> Your help in any way (introducing manuals, codes, etc) is appreciated.
>
> All the best,
> Mehdi

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 253 5773
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
[R] Boundary correction for kernel density estimation
Dear R-users,

I have the following problem: I would like to estimate the density curve for univariate data between 0 and 1. Unfortunately, the density function in the stats package can only cut the curve at a left-most and a right-most point. This truncation would lead to an underestimation. Letting the estimate spill over the bounded support is inappropriate as well.

Does anyone know a boundary correction method implemented in R? I did much research but the correction methods I found are for survival or spatial data.

Thanks a lot for any hint!

Cheers,
Katja
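One simple correction that needs nothing beyond stats::density() is the reflection method: mirror the sample at both boundaries, estimate on the augmented data, keep only [0, 1] and rescale. The sketch below is an illustration under that assumption, not a recommendation over dedicated packages. The function name reflect_density and the Beta(2, 5) sample are made up.

```r
# reflect_density is a hypothetical helper implementing boundary reflection
reflect_density <- function(x, lower = 0, upper = 1, n = 512) {
  bw  <- bw.nrd0(x)                            # bandwidth from the original data
  aug <- c(x, 2 * lower - x, 2 * upper - x)    # mirror at both boundaries
  d   <- density(aug, bw = bw, from = lower, to = upper, n = n)
  d$y <- 3 * d$y                               # augmented sample is 3 times larger
  d
}

set.seed(1)
x <- rbeta(200, 2, 5)                          # example data on (0, 1)
d <- reflect_density(x)

# Trapezoidal check that the corrected curve integrates to about 1 on [0, 1]
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area

if (interactive()) plot(d, main = "Reflection-corrected density")
```

Reflection forces a zero slope at the boundaries, which may or may not suit the data.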
[R] (senza oggetto)
I have the y-axis in a graph that has as extreme limits 0.00 and 1.50. plot gives me the interval 0.0, 0.5, 1.0, 1.5 but I want: 0.00, 0.15, 0.30 and so on with 2 decimals. How can I do this?

Thanks
Re: [R] (senza oggetto)
Hi,

Try this:

  plot(seq(0,1.5,0.1), yaxt="n")
  axis(2, at=seq(0.00,1.50, 0.15))

To understand, read ?par (especially the yaxt argument in that case, but I guess you need to know more about that) and ?axis

HTH,
Ivan

On 10/11/2010 14:07, barbara.r...@uniroma1.it wrote:
> I have the y-axe in a grafich that has as extreme limit 0.00 and 1.50.
> plot gives me the interval 0.0, 0.5,1.0,1.5 but I want: 0.00,0.15,0.30
> and so on with 2 decimals. How can I do?
> Thanks

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
[R] conditioning variables in dbRDA
I am using capscale() in vegan. Is it permissible to have more than one conditioning variable, thus:

  capscale(DIST~variable1+variable2+Condition(variable3+variable4), data=mydata)

many thanks
Nevil Amos
Re: [R] (senza oggetto)
On 11-Oct-10 12:07:43, barbara.r...@uniroma1.it wrote:
> I have the y-axe in a grafich that has as extreme limit 0.00 and 1.50.
> plot gives me the interval 0.0, 0.5,1.0,1.5 but I want: 0.00,0.15,0.30
> and so on with 2 decimals. How can I do?
> Thanks

The key to this is the plot() parameter yaxp [see below]. So, for instance,

  x <- (0:5)
  y <- 1.5*(x^2)/(5^2)  ## y-values range from 0 to 1.5
  plot(x, y, ylim=c(0,1.5), yaxp=c(0,1.5,10))
  ## And compare with:
  plot(x, y, ylim=c(0,1.5))

The explanation of 'yaxp' (and its x-friend 'xaxp') can be found in '?par'. The full details are under 'xaxp':

  'xaxp' A vector of the form 'c(x1, x2, n)' giving the coordinates
  of the extreme tick marks and the number of intervals between
  tick-marks when 'par("xlog")' is false. [...]

As explained under 'yaxp', this is constructed in the same way as 'xaxp'. So you have y ranging from 0 to 1.5 by steps of 0.15, hence a total of 10 intervals, therefore yaxp = c(0,1.5,10).

It is unfortunate that the documentation for the large number of parameters and features for graphics, even for the basic plot() function (which will be every beginner's starting point), is fragmented over many different documentation entries. If you start with '?plot', not yet knowing where you should be looking, it could take you several tries in different places before you find what you want!

Hoping this helps,
Ted.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 11-Oct-10 Time: 13:51:24
-- XFMail --
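Since the question also asked for two decimals on the labels, here is a sketch that combines tick placement with explicitly formatted labels via axis(). The sprintf() formatting and the las = 1 setting are my additions, not something suggested in the thread.

```r
ticks  <- seq(0, 1.5, by = 0.15)       # 10 intervals, 11 tick positions
labels <- sprintf("%.2f", ticks)       # always two decimals: "0.00", "0.15", ...

x <- 0:5
y <- 1.5 * x^2 / 5^2                   # y-values range from 0 to 1.5

if (interactive()) {
  plot(x, y, ylim = c(0, 1.5), yaxt = "n")     # suppress the default y-axis
  axis(2, at = ticks, labels = labels, las = 1) # draw it with formatted labels
}
```

If some labels are dropped because they would overlap, a smaller cex.axis or a taller plot usually helps.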
Re: [R] conditioning variables in dbRDA
On Mon, 2010-10-11 at 23:42 +1100, Nevil Amos wrote:
> I am using cascaple() in vegan, is it permissible to have more than one
> conditioning variable thus
>
>   capscale(DIST~varaible1+variable2+Conditon(varaible3+variable4), data=mydata)

Yes. Have you tried it and had problems doing it? Or is this just a request for clarification? It does work:

  > data(varespec)
  > capscale(varespec ~ Ca + Condition(pH + P), varechem)
  Call: capscale(formula = varespec ~ Ca + Condition(pH + P), data = varechem)

                Inertia Rank
  Total            1826
  Conditional       428    2
  Constrained       130    1
  Unconstrained    1268   20
  Inertia is mean squared Euclidean distance

  Eigenvalues for constrained axes:
   CAP1
  130.0

  Eigenvalues for unconstrained axes:
    MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8
  739.77 226.54  89.04  70.61  36.72  30.47  24.77  18.64
  (Showed only 8 of all 20 unconstrained eigenvalues)

HTH

G

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] question related to multiple regression
SNN <s.nancy1 at yahoo.com> writes:
> I am conducting an association analysis of genotype and a phenotype such
> as cholesterol level as an outcome and the genotype as a regressor using
> multiple linear regression. There are 3 possibilities for the genotype
> AA, AG, GG. There are 5 people with the AA genotype, 100 with the AG
> genotype and 900 with the GG genotype. I coded GG genotype as 1, AG as 2
> and AA as 3 and the p-value for the genotype is significant. Should I
> believe this p-value or not? My concern is that there are not may samples
> with the AA genotype and could that have effected the significance of the
> genotype in the model?

Make sure that R is treating genotype as a factor, not a continuous covariate -- for that reason it's better *not* to recode genotypes as integer codes, which increases the chance of this type of confusion.

Unless you really have reason to believe that the difference in expected cholesterol level is linearly related to the number of A alleles -- i.e.

  b_0      for GG
  b_0+d    for AG
  b_0+2*d  for AA

this seems like a fairly strong assumption to make ...
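To make the difference concrete, here is a sketch on simulated data (the group sizes follow the question, while the effect sizes and noise level are invented). It fits genotype once as a factor and once as an allele count, then compares the two nested models.

```r
set.seed(123)
genotype <- rep(c("GG", "AG", "AA"), times = c(900, 100, 5))
# Invented effects: +10 for AG, +15 for AA, so not linear in the A count
cholesterol <- 200 + 10 * (genotype == "AG") + 15 * (genotype == "AA") +
  rnorm(length(genotype), sd = 20)

d <- data.frame(cholesterol,
                genotype = factor(genotype, levels = c("GG", "AG", "AA")))

# Factor coding: one free mean per genotype (b_0, b_0 + d1, b_0 + d2)
fit_factor <- lm(cholesterol ~ genotype, data = d)

# Numeric coding: forces b_0, b_0 + d, b_0 + 2*d
d$n_A <- as.integer(d$genotype) - 1     # 0, 1 or 2 copies of A
fit_linear <- lm(cholesterol ~ n_A, data = d)

coef(fit_factor)
coef(fit_linear)
anova(fit_linear, fit_factor)           # tests the linearity restriction
```

With only 5 AA individuals the AA coefficient will have a wide confidence interval, which is worth inspecting with confint(fit_factor).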
Re: [R] conditioning variables in dbRDA
On 11/10/10 15:42 PM, Nevil Amos nevil.a...@gmail.com wrote:
> I am using cascaple() in vegan, is it permissible to have more than one
> conditioning variable thus
>
>   capscale(DIST~varaible1+variable2+Conditon(varaible3+variable4), data=mydata)

Nevil,

Yes, it is permissible.

Cheers,
Jari Oksanen
Re: [R] Dataset Transformation
Repost .. since the previous msg had problems

I need to transpose the following input dataset into an output dataset like below

Input

Date        TICKER  Price
11/10/2010  A       0.991642
11/10/2010  B       0.475023
11/10/2010  C       0.218642
11/10/2010  D       0.365135
12/10/2010  A       0.687873
12/10/2010  B       0.47006
12/10/2010  C       0.533542
12/10/2010  D       0.812439
13/10/2010  A       0.210848
13/10/2010  B       0.699799
13/10/2010  C       0.546003
13/10/2010  D       0.152316

Output needed

Date        A         B         C         D
11/10/2010  0.991642  0.475023  0.218642  0.365135
12/10/2010  0.687873  0.47006   0.533542  0.812439
13/10/2010  0.210848  0.699799  0.546003  0.152316

I tried using the aggregate function but not quite getting the method.
Re: [R] venneuler (java?) color palette 0 - 1
Hi Paul,

That's pretty much awesome. Thank you very much. And combined with the colorspace package functions, rainbow_hcl() and sequential_hcl(), it makes color selection easy.

One thing i was digging for was a function that yields a color palette *and* the hcl() call needed to produce it. This would help me better understand the hcl format. So where i can get the RGB codes like this:

  > rainbow_hcl(4)
  [1] "#E495A5" "#ABB065" "#39BEB1" "#ACA4E2"

which is fine for color specification, is there a palette function that might help obtain the hcl() call needed to produce a given palette? ie., the 'h', 'c' and 'l' (and 'alpha' if appropriate) values for a given color/shade??

Thanks again and in advance for any further pointers,

Karl

On 10/10/2010 10:41 PM, Paul Murrell wrote:
> Hi
>
> On 11/10/2010 9:01 a.m., Karl Brand wrote:
>> Dear UseRs and DevelopeRs
>>
>> It would be helpful to see the color palette available in the
>> venneuler() function.
>>
>> The relevant part of ?venneuler states:
>>
>> colors: colors of the circles as values between 0 and 1
>>
>> -which explains color specification, but from what palette? Short of
>> trial and error, i'd really appreciate if some one could help me locate
>> a 0 - 1 palette for this function to aid with color selection.
>
> The color spec stored in the VennDiagram object is multiplied by 360 to
> give the hue component of an hcl() colour specification. For example,
> 0.5 would mean the colour hcl(0.5*360, 130, 60)
>
> Alternatively, you can control the colours when you call plot, for
> example, ...
>
>   plot(ve, col=c("red", "green", "blue"))
>
> ... should work.
>
> Paul
>
>> FWIW, i tried the below code and received the displayed error. I failed
>> to turn up any solutions to this error...
>>
>> Any suggestions appreciated,
>>
>> Karl
>>
>>   library(venneuler)
>>   ve <- venneuler(c(A=1, B=2, C=3, AC=0.5, ABC=0.1))
>>   class(ve)
>>   [1] "VennDiagram"
>>   ve$colors <- c("red", "green", "blue")
>>   plot(ve)
>>   Error in col * 360 : non-numeric argument to binary operator

--
Karl Brand k.br...@erasmusmc.nl
Department of Genetics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
P +31 (0)10 704 3409 | F +31 (0)10 704 4743 | M +31 (0)642 777 268
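Taking Paul's description at face value (hue = value * 360, chroma 130, luminance 60; not checked against the venneuler source), the 0 to 1 values can be previewed with base R's hcl(). The function name venn_colour is made up for illustration.

```r
# venn_colour is a hypothetical helper based on the description quoted above:
# a value v in [0, 1] is turned into the hue of an hcl() colour.
venn_colour <- function(v) hcl(h = v * 360, c = 130, l = 60)

v    <- c(0, 0.5, 0.8)
cols <- venn_colour(v)
cols

# Preview a finer grid of values as colour swatches
if (interactive()) {
  grid <- seq(0, 1, by = 0.05)
  barplot(rep(1, length(grid)), col = venn_colour(grid),
          names.arg = grid, las = 2, yaxt = "n")
}
```

Colours that fall outside the displayable range are adjusted by hcl() (its fixup argument defaults to TRUE), so the hex codes are approximations of the requested coordinates.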
Re: [R] Dataset Transformation
try this:

  > library(reshape)   # provides melt() and cast()
  > x <- read.table(textConnection("Date TICKER Price
  + 11/10/2010 A 0.991642
  + 11/10/2010 B 0.475023
  + 11/10/2010 C 0.218642
  + 11/10/2010 D 0.365135
  + 12/10/2010 A 0.687873
  + 12/10/2010 B 0.47006
  + 12/10/2010 C 0.533542
  + 12/10/2010 D 0.812439
  + 13/10/2010 A 0.210848
  + 13/10/2010 B 0.699799
  + 13/10/2010 C 0.546003
  + 13/10/2010 D 0.152316"), header = TRUE, as.is = TRUE)
  > closeAllConnections()
  > x.m <- melt(x)
  Using Date, TICKER as id variables
  > cast(x.m, Date ~ TICKER)
          Date        A        B        C        D
  1 11/10/2010 0.991642 0.475023 0.218642 0.365135
  2 12/10/2010 0.687873 0.470060 0.533542 0.812439
  3 13/10/2010 0.210848 0.699799 0.546003 0.152316

On Mon, Oct 11, 2010 at 9:35 AM, Santosh Srinivas santosh.srini...@gmail.com wrote:
> Repost .. since the previous msg had problems
>
> I need to transpose the following input dataset into an output dataset
> like below
>
> Input
>
> Date        TICKER  Price
> 11/10/2010  A       0.991642
> 11/10/2010  B       0.475023
> 11/10/2010  C       0.218642
> 11/10/2010  D       0.365135
> 12/10/2010  A       0.687873
> 12/10/2010  B       0.47006
> 12/10/2010  C       0.533542
> 12/10/2010  D       0.812439
> 13/10/2010  A       0.210848
> 13/10/2010  B       0.699799
> 13/10/2010  C       0.546003
> 13/10/2010  D       0.152316
>
> Output needed
>
> Date        A         B         C         D
> 11/10/2010  0.991642  0.475023  0.218642  0.365135
> 12/10/2010  0.687873  0.47006   0.533542  0.812439
> 13/10/2010  0.210848  0.699799  0.546003  0.152316
>
> I tried using the aggregate function but not quite getting the method.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
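If installing an extra package is not an option, the same long-to-wide step can be sketched with base R's reshape(). The renaming at the end is only cosmetic: reshape() prefixes the new columns with "Price.".

```r
x <- data.frame(
  Date   = rep(c("11/10/2010", "12/10/2010", "13/10/2010"), each = 4),
  TICKER = rep(c("A", "B", "C", "D"), times = 3),
  Price  = c(0.991642, 0.475023, 0.218642, 0.365135,
             0.687873, 0.47006,  0.533542, 0.812439,
             0.210848, 0.699799, 0.546003, 0.152316),
  stringsAsFactors = FALSE
)

# One row per Date, one column per TICKER
wide <- reshape(x, idvar = "Date", timevar = "TICKER", direction = "wide")

names(wide) <- sub("^Price\\.", "", names(wide))  # "Price.A" becomes "A"
rownames(wide) <- NULL
wide
```

This assumes each Date/TICKER pair occurs once; duplicates would need aggregating first.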
Re: [R] Efficiency Question - Nested lapply or nested for loop
Thank you both for your advice. I ended up implementing both solutions and testing them on a real dataset of 10,000 rows and 50 inds. The results are very, very interesting. For some context, the original two approaches, nested lapply and nested for loops, performed at 1.501529 and 1.458963 mins, respectively. So the for loops were indeed a bit faster. Next, I tried the index solution to avoid doing the paste command each iteration. Strangely, this increased the time to 2.83 minutes. Here's how I implemented it: # create array of column idx v = vector(mode=character,length=nind*4) for (i in (0:(nind-1))) { v[(i*4+1):(i*4+4)] = c(paste(G_hat_0_,i,sep=), paste(G_hat_1_,i,sep=), paste(G_hat_2_,i,sep=), paste(G_,i,sep=)) } v = match(v,names(data)) for (row in (1:nrow(data))) { for (i in (0:(nind-1))) { Gmax = which.max(c( data[row,v[i*4+1]], data[row,v[i*4+2]], data[row,v[i*4+3]] )) Gtru = data[row,v[i*4+4]] + 1 # add 1 to match Gmax range cmat[Gmax,Gtru] = cmat[Gmax,Gtru] + 1 } } DAVID: Was this what you had in mind? I had trouble implementing the vector of indices as you had done. It generated a bunch of warnings. By far the best solution was that offered by Gabor. His technique finished the job in a whopping 9.8 SECONDS. It took me about 15 minutes to understand what it was doing, but the lesson is one I will never forget. I must admit, it was a wickedly clever solution. I implemented it virtually identically to Gabor's example. The only difference is that I used the 'v' vector to subset the data frame because in reality the data has many other unrelated columns. mat - matrix(t(data[v]), 4) table(Gmax = apply(mat[-4,], 2, which.max), Gtru = mat[4,] + 1) -- View this message in context: http://r.789695.n4.nabble.com/Efficiency-Question-Nested-lapply-or-nested-for-loop-tp2968553p2989822.html Sent from the R help mailing list archive at Nabble.com. 
Re: [R] venneuler (java?) color palette 0 - 1
On Mon, 11 Oct 2010, Karl Brand wrote: Hi Paul, That's pretty much awesome. Thank you very much. And combined with the colorspace package functions- rainbow_hcl() and sequential_hcl() -make color selection easy. One thing i was digging for was a function that yields a color palette *and* the hcl() call needed to produce it. This would help me better understand the hcl format. So where i can get the RGB codes like this- rainbow_hcl(4) [1] #E495A5 #ABB065 #39BEB1 #ACA4E2 - which is fine for color specification, is there a palette function that might help obtain the hcl() call needed to produce a given palette? ie., the 'h', 'c' and 'l' (and 'alpha' if appropriate) values for a given color/shade?? The ideas underlying rainbow_hcl(), sequential_hcl(), and diverge_hcl() are described in the following paper Achim Zeileis, Kurt Hornik, Paul Murrell (2009). Escaping RGBland: Selecting Colors for Statistical Graphics. Computational Statistics Data Analysis, 53(9), 3259-3270. doi:10.1016/j.csda.2008.11.033 A preprint PDF version of it is also available for download on my webpage. In the paper you see how the HCL coordinates for the different palettes are constructed. The functions rainbow_hcl(), sequential_hcl(), and diverge_hcl() are all direct translations of this, consisting just of a few lines of code. What may be somewhat confusing is that the functions call hex(polarLUV(L, C, H, ...)) instead of hcl(H, C, L, ...) which may yield slightly different results. The reason for this is that the polarLUV() implementation in colorspace predates the base R implementation in hcl(). hth, Z Thanks again and in advance for any further pointers, Karl On 10/10/2010 10:41 PM, Paul Murrell wrote: Hi On 11/10/2010 9:01 a.m., Karl Brand wrote: Dear UseRs and DevelopeRs It would be helpful to see the color palette available in the venneuler() function. 
The relevant par of ?venneuler states: colors: colors of the circles as values between 0 and 1 -which explains color specification, but from what pallette? Short of trial and error, i'd really appreciate if some one could help me locate a 0 - 1 pallette for this function to aid with color selection. The color spec stored in the VennDiagram object is multiplied by 360 to give the hue component of an hcl() colour specification. For example, 0.5 would mean the colour hcl(0.5*360, 130, 60) Alternatively, you can control the colours when you call plot, for example, ... plot(ve, col=c(red, green, blue)) ... should work. Paul FWIW, i tried the below code and received the displayed error. I failed to turn up any solutions to this error... Any suggestions appreciated, Karl library(venneuler) ve- venneuler(c(A=1, B=2, C=3, AC=0.5, ABC=0.1)) class(ve) [1] VennDiagram ve$colors- c(red, green, blue) plot(ve) Error in col * 360 : non-numeric argument to binary operator -- Karl Brand k.br...@erasmusmc.nl Department of Genetics Erasmus MC Dr Molewaterplein 50 3015 GE Rotterdam P +31 (0)10 704 3409 | F +31 (0)10 704 4743 | M +31 (0)642 777 268 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
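As a small illustration of the mapping Paul describes, the sketch below converts values between 0 and 1 into colours with base R's hcl(), using the chroma (130) and luminance (60) quoted in his reply. The helper name `venn_colour` is hypothetical and is not part of venneuler.

```r
# Hypothetical helper (not part of venneuler): map a value in [0, 1]
# to the colour hcl(value * 360, 130, 60) described in the reply above.
venn_colour <- function(x) hcl(h = x * 360, c = 130, l = 60)

cols <- venn_colour(c(0, 0.25, 0.5, 0.75))
print(cols)

# To preview the hues on screen:
# barplot(rep(1, length(cols)), col = cols, names.arg = c(0, 0.25, 0.5, 0.75))
```

Scanning a sequence such as `seq(0, 1, by = 0.05)` this way gives a rough picture of the 0 - 1 "palette" without trial and error inside venneuler itself.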
[R] filled.contour: colour key decoupled from main plot?
Dear R colleagues, I am trying to plot some geophysical data as a filled contour on a continent map and so far the guidance from the R-help archives has been invaluable. The only bit that still eludes me is the colour key (legend) coming with filled.contour: I prefer to generate my own colour palette, mainly based on the quantiles of tenths of the data in order to capture the whole range (of rainfall for example), including the more extreme values both sides. In the colour key this results in uneven distribution of the colour bars (and I understood why). Here is the code with simplistic data: xlon - seq(10, 60, len=10) ylat - seq(20, 50, len=10) prcp - abs(rnorm(length(xlon)*length(ylat)))*1000 zprcp - array(zprcp,c(length(xlon),length(ylat))) zprcp.colour -c(#EDFFD2,#00FFD2,#00F0FF,#00B4FF,#0078FF,#003CFF,#FF,#3C00FF,#7800FF,#B400FF,#FF0096) zprcp.quants - rev(quantile(zprcp,na.rm=T,probs=c(1,0.98,0.9,0.8,0.7,0.6,0.5,0.4,0.3,0.2,0.1))) zprcp.breaks -c(0,10*ceiling(zprcp.quants/10)) filled.contour(xlon,ylat,zprcp,ylim=c(20,50),xlim=c(10,60), asp=1.0, plot.axes=map('worldHires',xlim=c(10,60),ylim=c(20,50), border=0.9, add =TRUE),levels=zprcp.breaks,col=zprcp.colour,key.axes = axis(4,zprcp.breaks)) I would like the colour bars to be even (and the labels to represent the actual quantile values). I tried to modify the key.axes=axis(..) to force an evenly spaced colour key (and keeping the same colours) but it seems that this ultimately obeys the 'levels' and 'col' parameters already defined, which are also used for the main image. I have also tried to decouple the 'levels' and 'col' settings between the main plot and the legend by fiddling with the filled.contour function but without success yet. I would be grateful for any ideas, ideally based on the basic graphics package. 
Thanks, Panos --- Dr Panos Hadjinicolaou Energy Environment Water Research Center (EEWRC) The Cyprus Institute __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dataset Transformation
On Mon, Oct 11, 2010 at 9:35 AM, Santosh Srinivas santosh.srini...@gmail.com wrote:
> Repost .. since the previous msg had problems
>
> I need to transpose the following input dataset into an output dataset like below
>
> Input
> Date       TICKER Price
> 11/10/2010 A      0.991642
> 11/10/2010 B      0.475023
> 11/10/2010 C      0.218642
> 11/10/2010 D      0.365135
> 12/10/2010 A      0.687873
> 12/10/2010 B      0.47006
> 12/10/2010 C      0.533542
> 12/10/2010 D      0.812439
> 13/10/2010 A      0.210848
> 13/10/2010 B      0.699799
> 13/10/2010 C      0.546003
> 13/10/2010 D      0.152316
>
> Output needed
> Date       A        B        C        D
> 11/10/2010 0.991642 0.475023 0.218642 0.365135
> 12/10/2010 0.687873 0.47006  0.533542 0.812439
> 13/10/2010 0.210848 0.699799 0.546003 0.152316
>
> I tried using the aggregate function but not quite getting the method.

1. Try this:

Lines <- "Date TICKER Price
11/10/2010 A 0.991642
11/10/2010 B 0.475023
11/10/2010 C 0.218642
11/10/2010 D 0.365135
12/10/2010 A 0.687873
12/10/2010 B 0.47006
12/10/2010 C 0.533542
12/10/2010 D 0.812439
13/10/2010 A 0.210848
13/10/2010 B 0.699799
13/10/2010 C 0.546003
13/10/2010 D 0.152316"

DF <- read.table(textConnection(Lines), header = TRUE)
DF$Date <- as.Date(DF$Date, "%d/%m/%Y")
DFout <- reshape(DF, dir = "wide", timevar = "TICKER", idvar = "Date")
names(DFout) <- sub("Price.", "", names(DFout))

2. or using read.zoo in the zoo package we can read it in and reshape it all at once:

library(zoo)
z <- read.zoo(textConnection(Lines), header = TRUE, split = 2, format = "%d/%m/%Y")

At this point z is a zoo object in wide format:

z
                  A        B        C        D
2010-10-11 0.991642 0.475023 0.218642 0.365135
2010-10-12 0.687873 0.470060 0.533542 0.812439
2010-10-13 0.210848 0.699799 0.546003 0.152316

Since this is a multivariate time series you might want to just leave it as a zoo object, since you then get all of the facilities of zoo, e.g.

plot(z)              # multi-panel
plot(z, screen = 1)  # all in one panel

but if you want it as a data frame then convert it like this:

data.frame(Index = index(z), coredata(z))
       Index        A        B        C        D
1 2010-10-11 0.991642 0.475023 0.218642 0.365135
2 2010-10-12 0.687873 0.470060 0.533542 0.812439
3 2010-10-13 0.210848 0.699799 0.546003 0.152316

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
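For comparison, the same long-to-wide step can be sketched with base R's tapply(), which needs no extra packages. This is only an illustrative alternative, not something proposed in the thread, and it assumes there is exactly one price per Date/TICKER pair (with duplicates, `sum` would silently add them up).

```r
Lines <- "Date TICKER Price
11/10/2010 A 0.991642
11/10/2010 B 0.475023
11/10/2010 C 0.218642
11/10/2010 D 0.365135
12/10/2010 A 0.687873
12/10/2010 B 0.47006
12/10/2010 C 0.533542
12/10/2010 D 0.812439
13/10/2010 A 0.210848
13/10/2010 B 0.699799
13/10/2010 C 0.546003
13/10/2010 D 0.152316"

DF <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)
DF$Date <- as.Date(DF$Date, "%d/%m/%Y")  # so rows sort chronologically

# Cross-tabulate Price by Date (rows) and TICKER (columns)
wide <- tapply(DF$Price, list(Date = DF$Date, TICKER = DF$TICKER), sum)
print(wide)
```

The result is a numeric matrix with dates as row names; `as.data.frame(wide)` converts it if a data frame is needed.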
[R] Trouble accessing cov function from stats library
Dear all I am trying to use the cov function in the stats library. I have no problem using this function from the console. However, in my R script I received a function not found message. Then I called stats::cov(...) and received an error message that the function was not exported. Then I tried stats:::cov (three colons) and received the error Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'Cov' not found I am also importing the ltm library, though I'm not aware of a cov function in ltm that could be causing a conflict. Any suggestions? Thanks Barth PRIVILEGED AND CONFIDENTIAL INFORMATION This transmittal and any attachments may contain PRIVILEGED AND CONFIDENTIAL information and is intended only for the use of the addressee. If you are not the designated recipient, or an employee or agent authorized to deliver such transmittals to the designated recipient, you are hereby notified that any dissemination, copying or publication of this transmittal is strictly prohibited. If you have received this transmittal in error, please notify us immediately by replying to the sender and delete this copy from your system. You may also call us at (309) 827-6026 for assistance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help with simple but massive data transformation
I have data that looks like this:

start end value
1     4   2
5     8   1
9     10  0

I want to transform the data so that it becomes:

startend value
1        2
2        2
3        2
4        2
5        1
6        1
7        1
8        1
9        0
10       0

I've written a for loop that can do the transformation BUT I need to do this on very large datasets (millions of rows). Does anyone know of an R package that has a function that can do this transformation?

Any help is much appreciated! Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/help-with-simple-but-massive-data-transformation-tp2989850p2989850.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Trouble accessing cov function from stats library
On Oct 11, 2010, at 10:27 AM, Barth B. Riley wrote: Dear all I am trying to use the cov function in the stats library. I have no problem using this function from the console. However, in my R script I received a function not found message. Then I called stats::cov(...) and received an error message that the function was not exported. Then I tried stats:::cov (three colons) and received the error Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'Cov' not found You are misspelling it. R is case-sensitive. I am also importing the ltm library, though I'm not aware of a cov function in ltm that could be causing a conflict. Any suggestions? Thanks Barth David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
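To make the case-sensitivity point concrete, here is a short sketch with made-up vectors. It calls the exported function through `stats::cov` and then checks the stats namespace for both spellings; only the lowercase name is defined there.

```r
x <- 1:5
y <- 2 * x

# cov() is exported from stats, so the two-colon form is enough
print(stats::cov(x, y))

# Look up both spellings directly in the stats namespace
print(exists("cov", envir = asNamespace("stats"), inherits = FALSE))
print(exists("Cov", envir = asNamespace("stats"), inherits = FALSE))
```

The error text quoted above mentions object 'Cov', which suggests the script (rather than the console session) used the capitalised spelling.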
[R] How to get Mean rank for Kruskal-Wallis Test
Hello All, I want Ranks' Table in R as like in SPSS ouput in the given link. http://www.statisticssolutions.com/methods-chapter/statistical-tests/kruskal-wallis-test/ Is the code is already available? Please let me know. Thanks, Lawrence __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] OT: snow socket clusters with VirtualBox and VMware player (Linux host, Win guest)
Dear All, I am trying to create socket clusters (using snow and snowfall) with a Windows OS. I am running Windows inside VirtualBox and VMware player (i.e., Windows is guest) from a Debian Linux host system (I've tried in two different Linux systems, an AMD x86-64 workstation and an Intel i686 laptop). However, almost always seting up the cluster fails: either R will hang forever or I will get messages such as in socketConnect [...] port 10187 cannot be opened Error in sfInit [...] Starting of snow cluster failed! Error in sockectConnection In fact, a command such as socketConnection(port = 10187, server = TRUE) hangs forever. This happens with R-2.11.1 and the current R-devel. For both VirtualBox and VMware Player I am running the latest available versions. As far as I can tell, there is no firewall in the Windows machines, and the firewall in the Linux machines is definitely down now. From time to time, the cluster gets created, but I have no idea why it succeeds (as far as I can tell, there is nothing different). I guess this is some sort of strange interaction between Windows and the virtualization, and R probably has little to do in this. However, has anybody been able to run snow and/or snowfall with socket clusters in a similar setup? Best, R. -- Ramon Diaz-Uriarte Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz Phone: +34-91-732-8000 ext. 3019 Fax: +-34-91-224-6972 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help with simple but massive data transformation
This should be easy with apply():

do.call(rbind, apply(dataset, 1, function(x){
  data.frame(startend = x[1]:x[2], value = x[3])
}))

Untested!

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of clee
Sent: Monday 11 October 2010 16:17
To: r-help@r-project.org
Subject: [R] help with simple but massive data transformation

> I have data that looks like this:
>
> start end value
> 1     4   2
> 5     8   1
> 9     10  0
>
> I want to transform the data so that it becomes:
>
> startend value
> 1        2
> 2        2
> 3        2
> 4        2
> 5        1
> 6        1
> 7        1
> 8        1
> 9        0
> 10       0
>
> I've written a for loop that can do the transformation BUT I need to do this on very large datasets (millions of rows). Does anyone know of an R package that has a function that can do this transformation?
>
> Any help is much appreciated! Thanks!
>
> --
> View this message in context: http://r.789695.n4.nabble.com/help-with-simple-but-massive-data-transformation-tp2989850p2989850.html
> Sent from the R help mailing list archive at Nabble.com.
Re: [R] help with simple but massive data transformation
On Mon, Oct 11, 2010 at 10:16 AM, clee cheel...@gmail.com wrote: I have data that looks like this: start end value 1 4 2 5 8 1 9 10 0 I want to transform the data so that it becomes: startend value 1 2 2 2 3 2 4 2 5 1 6 1 7 1 8 1 9 0 10 0 I've written a for loop that can do the transformation BUT I need to do this on very large datasets (millions of rows). Does anyone know of an R package that has a function that can do this transformation? A very similar question was just asked recently. See this: https://stat.ethz.ch/pipermail/r-help/2010-October/255791.html -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help with simple but massive data transformation
On Oct 11, 2010, at 10:16 AM, clee wrote:
> I have data that looks like this:
>
> start end value
> 1     4   2
> 5     8   1
> 9     10  0
>
> I want to transform the data so that it becomes:
>
> startend value
> 1        2
> 2        2
> 3        2
> 4        2
> 5        1
> 6        1
> 7        1
> 8        1
> 9        0
> 10       0

do.call(rbind, apply(dta, 1, function(.r)
    matrix(c(seq(.r[1], .r[2]),
             vals = rep(.r[3], .r[2] - .r[1] + 1)),
           ncol = 2)))

      [,1] [,2]
 [1,]    1    2
 [2,]    2    2
 [3,]    3    2
 [4,]    4    2
 [5,]    5    1
 [6,]    6    1
 [7,]    7    1
 [8,]    8    1
 [9,]    9    0
[10,]   10    0

> I've written a for loop that can do the transformation BUT I need to do this on very large datasets (millions of rows). Does anyone know of an R package that has a function that can do this transformation?
>
> Any help is much appreciated! Thanks!
>
> --
> View this message in context: http://r.789695.n4.nabble.com/help-with-simple-but-massive-data-transformation-tp2989850p2989850.html
> Sent from the R help mailing list archive at Nabble.com.

David Winsemius, MD
West Hartford, CT
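Since the original question stresses millions of rows, a fully vectorised variant may be worth sketching. It is not taken from any reply in this thread; it relies only on base R's rep() and sequence(), and the data frame name `dta` and its column names are assumed from the example in the question.

```r
dta <- data.frame(start = c(1, 5, 9), end = c(4, 8, 10), value = c(2, 1, 0))

# Number of output rows contributed by each input row
n <- dta$end - dta$start + 1

out <- data.frame(
  # repeat each start n times, then add the within-range offsets 0, 1, 2, ...
  startend = rep(dta$start, n) + sequence(n) - 1,
  value    = rep(dta$value, n)
)
print(out)
```

Because no per-row function call or rbind is involved, this form avoids building one small object per input row, which is usually where the time goes on large inputs.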
Re: [R] Heatmap/Color Selection(Key)
Hi Rashid, you may have a look at the colorRampPalette-function, along with the at argument oh heatmap.2 x-matrix(runif(100,-6,6),nrow=10) heatmap.2(x,col=colorRampPalette(c(blue,lightblue,darkgray,darkgray,yellow,red),space=Lab),at=c(-6.01,6.01,51)) # or just using the colors you posted heatmap.2(x,col=c(blue,lightblue,darkgray,black,darkgray,yellow,red),at=-3:3*2) hth. Am 08.10.2010 17:03, schrieb rashid kazmi: Hi I made heatmap of QTL based on Lod score. Where I have traits in columns and marker data (rows). I can not cluster both column and rows as I need the right order for marker data. Can someone suggest me better way of generating heatmaps especially the colour key I want to select to visualize the results which are more interesting to look at. library(gplots) sample=read.csv(file.choose()) sample.names-sample[,1] sample.set-sample[,-1] sample.map - as.matrix(sample.set) ### have to order as i have markers on rows so just want denrogram on triats(column) ord - order(rowSums(abs(sample.map)),decreasing=T) heatmap.2(sample.map[ord,],Rowv=F,dendrogram=column,trace=none,col=greenred(10)) But I want to give colours more specifically as I want to show the QTL hotspots starting as fallow 1) -6 to -4 (blue) 2) -4 to -2 (light blue) 3) -2 to 0 (dark grey or black) 4) 0 to 2 (dark grey or black) 5) 2 to 4 (yellow) 6) 4 to 6 (red) Any help or some addition to the above mentioned R code would be appreciated. R Kazmi PhD The Netheralnd [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 
52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Heatmap/Color Selection(Key)
sorry, typo:

heatmap.2(x,col=colorRampPalette(c("blue","lightblue","darkgray","darkgray","yellow","red"),space="Lab"),at=c(-6.01,6.01,51))
heatmap.2(x,col=c("blue","lightblue","darkgray","black","darkgray","yellow","red"),at=-3:3*2)

should be read as

heatmap.2(x,col=colorRampPalette(c("blue","lightblue","darkgray","darkgray","yellow","red")),breaks=seq(-6.01,6.01,length.out=51))
heatmap.2(x,col=c("blue","lightblue","darkgray","darkgray","yellow","red"),breaks=-3:3*2)

The "at" argument from stats::heatmap became "breaks" in gplots::heatmap.2.
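To check which colour a given LOD score would receive under the six-class scheme, the breaks can be inspected without plotting. The sketch below uses only base R's cut(); the LOD values are invented, and it assumes intervals closed on the right, which may differ at the exact boundaries from how heatmap.2 bins values.

```r
breaks <- -3:3 * 2   # -6 -4 -2 0 2 4 6: seven breaks define six classes
cols   <- c("blue", "lightblue", "darkgray", "darkgray", "yellow", "red")

lod <- c(-5.2, -3, -0.4, 1.7, 2.5, 5.9)   # made-up LOD scores

# Index of the interval each score falls into, then the matching colour
class_id <- cut(lod, breaks, labels = FALSE, include.lowest = TRUE)
print(data.frame(lod, class_id, colour = cols[class_id]))
```

Note that the number of colours must always be one less than the number of breaks, which is why the corrected call above drops the seventh colour.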
Re: [R] How to get Mean rank for Kruskal-Wallis Test
On Oct 11, 2010, at 9:43 AM, Lawrence wrote: Hello All, I want Ranks' Table in R as like in SPSS ouput in the given link. http://www.statisticssolutions.com/methods-chapter/statistical-tests/kruskal-wallis-test/ Is the code is already available? Please let me know. Yes. All code is available: ??Kruskal Wallis methods(kruskal.test) getAnywhere(kruskal.test.default) If you want to extract the table, then looking at the bottom of that function, you see that the variables r, g,and x have been created and you would need to modify that code and substitute the returned table you desire. The table might be a bit more complicated than that simplistic offering since ties need to be properly accounted for if you intend to replicate the results by hand. Thanks, Lawrence -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
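A table of group sizes and mean ranks can also be built directly, without editing the kruskal.test source. The sketch below uses made-up data; rank() averages tied ranks by default, which is the tie handling the test itself uses, but the table layout is only an approximation of the SPSS output and is not taken from this thread.

```r
x <- c(2.1, 3.4, 1.9, 5.6, 4.8, 6.2, 3.4)       # made-up observations
g <- factor(c("a", "a", "a", "b", "b", "b", "b"))  # made-up groups

r <- rank(x)   # ties receive the average of their ranks

ranks_table <- data.frame(
  N         = as.vector(table(g)),
  mean_rank = as.vector(tapply(r, g, mean)),
  row.names = levels(g)
)
print(ranks_table)

# The test itself, for reference
print(kruskal.test(x, g))
```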
Re: [R] MATLAB vrs. R
alain, Perhaps i'm still entering the code wrong. I tried using your result=myquadrature(f,0,2000) print(result) Instead of my: val = myquadrature(f,a,b) result=myquadrature(val,0,2000) print(result) ...and I am still getting an inf inf inf inf inf... Did you change any of the previous syntax in addition to changing the result statement? Thank you so much and I think my brain is fried! Happy Holiday. Craig Date: Mon, 11 Oct 2010 09:59:17 +0200 From: alain.guil...@uclouvain.be To: craigpoconn...@hotmail.com CC: pda...@gmail.com; r-help@r-project.org Subject: Re: [R] MATLAB vrs. R Hi, The first argument of myquadrature in result shouldn't be val but f I guess. At least it works for me result=myquadrature(f,0,2000) print(result) [1] 3 Regards, Alain On 11-Oct-10 09:37, Craig O'Connell wrote: Thank you Peter. That is very much helpful. If you don't mind, I continued running the code to attempt to get my answer and I continue to get inf inf inf... (printed around 100 times). Any assistance with this issue. Here is my code (including your corrections): myquadrature-function(f,a,b){ npts=length(f) nint=npts-1 if(npts=1) error('need at least two points to integrate') end; if(b=a) error('something wrong with the interval, b should be greater than a') else dx=b/real(nint) end; npts=length(f) int=0 int- sum(f[-npts]+f[-1])/2*dx } #Call my quadrature x=seq(0,2000,10) h = 10.*(cos(((2*pi)/2000)*(x-mean(x)))+1) u = 1.*(cos(((2*pi)/2000)*(x-mean(x)))+1) a = x[1] b = x[length(x)] plot(x,-h) a = x[1]; b = x[length(x)]; #call your quadrature function. Hint, the answer should be 3. f=u*h; val = myquadrature(f,a,b); ? ___This is where issue arises. result=myquadrature(val,0,2000) ? print(result) ? Thanks again, Phil [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Alain Guillet Statistician and Computer Scientist SMCS - IMMAQ - Université catholique de Louvain Bureau c.316 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
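For reference, here is one way the quadrature function might look once the remaining MATLAB idioms (error(), end, real(), `=` used as a comparison) are replaced by R equivalents. It is a sketch rather than the code either poster ran, and it departs from the original in one labelled respect: the spacing is computed as (b - a)/nint instead of b/nint, which gives the same value when a = 0.

```r
myquadrature <- function(f, a, b) {
  # Trapezoidal rule for values f equally spaced on [a, b]
  npts <- length(f)
  nint <- npts - 1                 # number of intervals
  if (npts <= 1) stop("need at least two points to integrate")
  if (b <= a) stop("something wrong with the interval, b should be greater than a")
  dx <- (b - a) / nint             # assumption: original used b / nint (same when a = 0)
  sum(f[-npts] + f[-1]) / 2 * dx
}

# Check on a case with a known answer: integral of x from 0 to 1 is 0.5
print(myquadrature(c(0, 0.5, 1), 0, 1))

# The example from the thread: pass the data vector f, not the scalar result
x <- seq(0, 2000, by = 10)
h <- 10 * (cos((2 * pi / 2000) * (x - mean(x))) + 1)
u <- cos((2 * pi / 2000) * (x - mean(x))) + 1
f <- u * h
val <- myquadrature(f, x[1], x[length(x)])
print(val)
```

One plausible source of the Inf values is the second call in the quoted code, which passes the scalar `val`: a length-one input makes nint zero, so b/nint is 2000/0. The guard at the top of this version stops with an error in that case instead.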
[R] Nonlinear Regression Parameter Shared Across Multiple Data Sets
I'm working with 3 different data sets and applying this non-linear regression formula to each of them. nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers, start=list(upper=max(std_no_outliers$Y),LOGEC50=-8.5)) Previously, all of the regressions were calculated in Prism, but I'd like to be able to automate the calculation process in a script, which is why I'm trying to move to R. The issue I'm running into is that previously, in Prism, I was able to calculate a shared value for a constraint so that all three data sets shared the same value, but have other constraints calculated separately. So Prism would figure out what single value for the constraint in question would work best across all three data sets. For my formula, each data set needs it's own LOGEC50 value, but the upper value should be the same across the 3 sets. Is there a way to do this within R, or with a package I'm not aware of, or will I need to write my own nls function to work with multiple data sets, because I've got no idea where to start with that. Thanks, Jared [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
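One approach that may fit here is to stack the three data sets into a single data frame with a grouping factor and let nls() index LOGEC50 by that factor while `upper` stays a single shared parameter. The sketch below runs on simulated data: the factor name `set`, the noise level, and the true parameter values are all invented, and convergence on real data will depend on the starting values.

```r
set.seed(1)

# Simulated stand-in for three stacked data sets (values are invented)
X   <- rep(seq(-11, -6, by = 0.5), times = 3)
set <- factor(rep(c("s1", "s2", "s3"), each = 11))
true_ec50 <- c(-9, -8.5, -8)
Y   <- 100 / (1 + 10^(X - true_ec50[set])) + rnorm(length(X), sd = 1)
dat <- data.frame(X, Y, set)

# upper is shared; LOGEC50 is a vector with one element per level of 'set'
fit <- nls(Y ~ upper / (1 + 10^(X - LOGEC50[set])),
           data  = dat,
           start = list(upper = max(dat$Y), LOGEC50 = rep(-8.5, 3)))
print(coef(fit))
```

With real data, something like `rbind(cbind(d1, set = "s1"), cbind(d2, set = "s2"), cbind(d3, set = "s3"))` followed by `factor()` on the new column would build the stacked frame, where d1 to d3 are placeholder names for the three data sets.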
[R] how can i do anova
Hi, I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of R bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044 [1,] 69 46 43 54 54 41 [2,] 68 22 39 31 31 22 [3,] 91 54 57 63 63 50 Thank you. Moon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hausman test for endogeneity
... and, in fact, simply googling on R Package Hausmann finds two Hausmann test functions in 2 different packages within the first half dozen hits. -- Bert On Sat, Oct 9, 2010 at 11:06 AM, Liviu Andronic landronim...@gmail.com wrote: Hello On Sat, Oct 9, 2010 at 2:37 PM, Holger Steinmetz holger.steinm...@web.de wrote: can anybody point me in the right direction on how to conduct a hausman test for endogeneity in simultanous equation models? Try install.packages('sos') require(sos) findFn('hausman') Here I get these results: findFn('hausman') found 22 matches; retrieving 2 pages 2 Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how can i do anova
Type ?anova on your R command line for the basic function, and links to related functions. Also, try a google search of something like doing anova in R and you should find multiple tutorials or examples. Andrew Miles On Oct 11, 2010, at 11:33 AM, Mauluda Akhtar wrote: Hi, I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of R bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044 [1,] 69 46 43 54 54 41 [2,] 68 22 39 31 31 22 [3,] 91 54 57 63 63 50 Thank you. Moon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help with Cairo
Dear users, As an alternative to RSvgDevice::devSVG, I have tried using Cairo and cairoDevice. When opening the svg file from Cairo::CairoSVG() as well as from cairoDevice::Cairo_svg() in Illustrator, I got a warning message (which is damn hard to translate since I don't understand it), something like: clipping (?) will be lost at reexportation to format 'Tiny'. I then have a huge black square and some huge black numbers that I can remove. But if I do so, the axes labels are gone (I guess these huge numbers are the labels...). After having copied all necessary dll in cairoDevice\libs folder (I hope) to make it to load, I don't know what to do. Any ideas of what the problem could be? Thanks in advance! Ivan sessionInfo() R version 2.10.1 (2009-12-14) i386-pc-mingw32 locale: [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C [5] LC_TIME=French_France.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] cairoDevice_2.14 -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how can i do anova
Hi Moon,

Here is something to get you started.

# Read Data into R
dat <- read.table(textConnection("
bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
[1,] 69 46 43 54 54 41
[2,] 68 22 39 31 31 22
[3,] 91 54 57 63 63 50"), header = TRUE)
closeAllConnections()

# Load the reshape package
library(reshape)

# Generally it is easier to use data in 'long' format
# that is one column with values, another indicating what those are
dat.long <- melt(dat)

# Look at the data
dat.long

# Another useful function that shows the structure of an object
# You should see that 'variable' is a factor with six levels (one for each column)
# and value is integer class, this is good
str(dat.long)

# Now to fit your model, we can use the aov() function
# The formula specifies the DV on the left and the IVs on the right
# or the outcome and predictors if you think of it that way
# the data = argument tells aov() where to find the data
# in this case in the dat.long variable
model.aov <- aov(value ~ variable, data = dat.long)

# Now you can just print it as is
model.aov

# But you may also like the results of summary()
summary(model.aov)

# If you're thinking about this from a regression point of view
# Fit linear regression model (much like aov())
model.lm <- lm(value ~ variable, data = dat.long)

# Look at your model
model.lm
summary(model.lm)

# It may seem very different at first, but now if you use anova()
# (Mind that it is a slightly different function than aov())
# You can get the ANOVA source table from the regression model
anova(model.lm)

Hope that helps,

Josh

On Mon, Oct 11, 2010 at 8:33 AM, Mauluda Akhtar maulud...@gmail.com wrote:
> Hi,
>
> I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of R
>
>      bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044
> [1,]          69          46          43          54          54          41
> [2,]          68          22          39          31          31          22
> [3,]          91          54          57          63          63          50
>
> Thank you.
>
> Moon
>
> [[alternative HTML version deleted]]

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Re: [R] Boundary correction for kernel density estimation
Look at the logspline package. It uses a different method from what density() does, but it can take boundaries into account.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Katja Hebestreit
> Sent: Monday, October 11, 2010 3:04 AM
> To: r-help@r-project.org
> Subject: [R] Boundary correction for kernel density estimation
>
> Dear R-users,
>
> I have the following problem: I would like to estimate the density
> curve for univariate data between 0 and 1. Unfortunately, the density
> function in the stats package can only cut the curve at a leftmost and
> a rightmost point. This truncation would lead to an underestimation.
> Letting the estimate spill over the bounded support is inappropriate
> as well.
>
> Does anyone know of a boundary correction method implemented in R? I
> did much research, but the correction methods I found are for survival
> or spatial data.
>
> Thanks a lot for any hint!
> Cheers,
> Katja
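For readers who want to see what a boundary correction can look like with base R only, here is a minimal sketch. It is not the logspline approach Greg suggests; it swaps in the simpler reflection technique, in which the data are mirrored at both ends of [0, 1] before calling density(). The simulated Beta data, the variable names and the choice of bandwidth (taken from the unreflected data) are assumptions made for illustration.

```r
# Reflection-based boundary correction for a density on [0, 1] (base R only).
# Assumption: simulated Beta(2, 5) data stand in for the real observations.
set.seed(1)
x <- rbeta(500, shape1 = 2, shape2 = 5)

# Choose the bandwidth from the original data, not the reflected data
bw <- bw.nrd0(x)

# Mirror the sample at 0 and at 1, then estimate on [0, 1] only
reflected <- c(-x, x, 2 - x)
d <- density(reflected, bw = bw, from = 0, to = 1, n = 512)

# The reflected sample has three times as many points, so rescale by 3
d$y <- 3 * d$y

# Trapezoidal check that the corrected estimate integrates to about 1 on [0, 1]
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)

plot(d, main = "Reflection-corrected density on [0, 1]")
```

Reflection forces the estimate to be flat at the boundaries, which may or may not suit the data; that is one reason a method such as logspline can be preferable.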
Re: [R] how can i do anova
Hello On Mon, Oct 11, 2010 at 5:33 PM, Mauluda Akhtar maulud...@gmail.com wrote: Hi, I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of R Try to do this with either Rcmdr or Deducer, both GUIs to R. Regards Liviu bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044 [1,] 69 46 43 54 54 41 [2,] 68 22 39 31 31 22 [3,] 91 54 57 63 63 50 Thank you. Moon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multiple comparison correction
Dear list,

I just found this post in the archive:

On 23-Apr-05 Bill.Venables at csiro.au wrote:
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> > michael watson (IAH-C)
> > [...]
> > I have a highly significant interaction term. In the context
> > of the experiment, this makes sense. I can visualise the data
> > graphically, and sure enough I can see that both factors have
> > different effects on the data DEPENDING on what the value of
> > the other factor is.
> >
> > I explain this all to my colleague - and she asks "but which
> > ones are different?" This is best illustrated with an example.
> > We have either infected | uninfected, and vaccinated | unvaccinated
> > (the two factors).
> > We're measuring expression of a gene. Graphically, in the
> > infected group, vaccination makes expression go up. In the
> > uninfected group, vaccination makes expression go down. In
> > both the vaccinated and unvaccinated groups, infection makes
> > expression go down, but it goes down further in unvaccinated
> > than it does in vaccinated.
> >
> > So from a statistical point of view, I can see exactly why
> > the interaction term is significant, but what my colleague
> > wants to know is that WITHIN the vaccinated group, does
> > infection decrease expression significantly? And within
> > the unvaccinated group, does infection decrease expression
> > significantly? Etc etc etc. Can I get this information from
> > the output of the ANOVA, or do I carry out a separate
> > test on e.g. just the vaccinated group? (seems a cop out to me)
>
> No, you can't get this kind of specific information out of the anova
> table and yes, anova tables *are* a bit of a cop out. (I sometimes
> think they should only be allowed between consenting adults in
> private.)

I think the "cop out" Michael Watson was referring to means going back to basics and doing a separate analysis on each group (though no doubt using the Res SS from the AoV table).

Not that I disagree with your comment: I sometimes think that anova tables are often passed round between adults in order to induce consent which might otherwise have been withheld.

> What you are asking for is a non-standard, but perfectly
> reasonable partition of the degrees of freedom between the
> classes of a single factor with four levels got by pairing
> up the levels of vaccination and inoculation. Of course you
> can get this information, but you have to do a bit of work
> for it.

It seems to me that this is a wrapper for "separate analysis on each group"!

> Before I give the example which I don't expect too many people
> to read entirely, let me issue a little challenge, namely to
> write tools to automate a generalized version of the procedure
> below.

[technical setup snipped]

> contrasts(dat$vac_inf) <- ginv(m)
> gm <- aov(y ~ vac_inf, dat)
> summary(gm)
>             Df  Sum Sq Mean Sq F value  Pr(>F)
> vac_inf      3 12.1294  4.0431   7.348 0.04190
> Residuals    4  2.2009  0.5502
>
> This doesn't tell us too much other than there are differences,
> probably. Now to specify the partition:
>
> summary(gm,
>         split = list(vac_inf = list("- vs +|N" = 1,
>                                     "- vs +|Y" = 2)))
>                     Df  Sum Sq Mean Sq F value  Pr(>F)
> vac_inf              3 12.1294  4.0431  7.3480 0.04190
>   vac_inf: - vs +|N  1  7.9928  7.9928 14.5262 0.01892
>   vac_inf: - vs +|Y  1  3.7863  3.7863  6.8813 0.05860
> Residuals            4  2.2009  0.5502

Wow, Bill! Dazzling. This is like watching a rabbit hop into a hat, and fly out as a dove. I must study this syntax. But where can I find out about the "split" argument to summary? I've found the *function* split, whose effect is similar, but I've wandered around the summary, summary.lm etc. forest for a while without locating the *argument*.

My naive (cop-out) approach would have been on the lines of (without setting up the contrast matrix):

summary(aov(y~vac*inf,data=dat))
            Df  Sum Sq Mean Sq F value  Pr(>F)
vac          1  0.3502  0.3502  0.6364 0.46968
inf          1 11.3908 11.3908 20.7017 0.01042 *
vac:inf      1  0.3884  0.3884  0.7058 0.44812
Residuals    4  2.2009  0.5502

so we get the 2.2009 on 4 df SS for residuals with mean SS 0.5502. Then I would do:

mNp <- mean(y[(vac=="N") & (inf=="+")])
mNm <- mean(y[(vac=="N") & (inf=="-")])
mYp <- mean(y[(vac=="Y") & (inf=="+")])
mYm <- mean(y[(vac=="Y") & (inf=="-")])
c( mYp, mYm, mNp, mNm )
##[1]  2.4990492  0.5532018  2.5212655 -0.3058972
c(mYp-mYm, mNp-mNm )
##[1] 1.945847 2.827163

after which:

1-pt(((mYp-mYm)/sqrt(0.5502)),4)
##[1] 0.02929801
1-pt(((mNp-mNm)/sqrt(0.5502)),4)
##[1] 0.009458266

give you 1-sided t-tests, and
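The split argument asked about above is documented under ?summary.aov. Since the technical setup of the quoted example was snipped, the following is only a self-contained sketch with made-up data: the factor name g, the level labels and the contrast matrix m are all assumptions, and, unlike the quoted post, the contrast matrix is used directly rather than through ginv(), which is reasonable here only because the design is balanced and the contrasts are orthogonal.

```r
# Sketch of summary.aov(split = ...) on simulated data (all names hypothetical).
set.seed(1)
dat <- data.frame(
  g = gl(4, 3, labels = c("N-", "N+", "Y-", "Y+")),  # vaccination x infection cells
  y = rnorm(12)
)

# Orthogonal contrasts: column 1 = "- vs +" within N,
# column 2 = "- vs +" within Y, column 3 = N vs Y
m <- cbind(c(-1, 1, 0, 0),
           c(0, 0, -1, 1),
           c(-1, -1, 1, 1))
contrasts(dat$g) <- m

fit <- aov(y ~ g, data = dat)

# Each list entry names a row of the output and gives the contrast column(s)
# whose sums of squares are to be reported separately
s <- summary(fit, split = list(g = list("- vs +|N" = 1,
                                        "- vs +|Y" = 2,
                                        "N vs Y"   = 3)))
print(s)

# Sums of squares: overall term, the three single-df pieces, then residuals
ss <- s[[1]][["Sum Sq"]]
```

The three single-degree-of-freedom rows add up to the 3 df sum of squares for g, which is the sense in which split partitions the term.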
[R] Is there a regression surface demo?
Hi All,

Does anyone know of a function to plot a regression surface for two predictors? RSiteSearch()s and findFn()s have not turned up what I was looking for. I was thinking of something along the lines of: http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif

I like the rgl package because showing the surface from different angles is nice for demonstrations. I started to write my own, but it has some issues (non-functioning code starts below), and I figured that before I tried to work out the kinks, I would ask for the list's feedback.

Any comments or suggestions (about functions or preferred idioms for what I tried below, or...) are greatly appreciated.

Josh

RegSurfaceDemo <- function(formula, data, xlim, ylim, zlim, resolution = 100) {
  require(rgl)
  ## This cannot be the proper way to extract variable names from formula
  vars <- rownames(attr(terms(formula), "factors"))

  ## if no limits set, make them nearest integer to
  ## .75 the lowest value and 1.25 the highest
  ranger <- function(x) {
    as.integer(range(x) * c(.75, 1.25))
  }
  if(is.null(xlim)) {xlim <- ranger(data[, vars[2]])}
  if(is.null(ylim)) {ylim <- ranger(data[, vars[3]])}
  if(is.null(zlim)) {zlim <- ranger(data[, vars[1]])}

  ## This does not actually work because the data frame
  ## does not get named properly (actually it throws an error)
  ##f <- function (x, y) {
  ##  predict(my.model, newdata = data.frame(vars[2] = x, vars[3] = y))
  ##}

  ## Fit model
  my.model <- lm(formula = formula, data = data)

  ## Create X, Y, and Z grids
  X <- seq(from = xlim[1], to = xlim[2], length.out = resolution)
  Y <- seq(from = ylim[1], to = ylim[2], length.out = resolution)
  Z <- outer(X, Y, f)

  ## Create 3d scatter plot and add the regression surface
  open3d()
  with(data = data,
       plot3d(x = vars[2], y = vars[3], z = vars[1],
              xlim = xlim, ylim = ylim, zlim = zlim))
  par3d(ignoreExtent = TRUE)
  surface3d(X, Y, Z, col = "blue", alpha = .6)
  par3d(ignoreExtent = FALSE)
  return(summary(my.model))
}

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
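One way around the data frame naming problem noted in the commented-out f() is to build the prediction grid with expand.grid() and assign the predictor names afterwards. The sketch below is not a repair of RegSurfaceDemo: it assumes a formula with exactly two predictors, uses base graphics (persp) instead of rgl so that it runs without an OpenGL device, and the helper name reg_surface is hypothetical.

```r
# Sketch: fitted regression surface for two predictors, base R only.
# Assumes a formula of the form response ~ predictor1 + predictor2.
reg_surface <- function(formula, data, resolution = 30) {
  fit  <- lm(formula, data = data)
  vars <- all.vars(formula)   # response first, then the two predictors

  x <- seq(min(data[[vars[2]]]), max(data[[vars[2]]]), length.out = resolution)
  y <- seq(min(data[[vars[3]]]), max(data[[vars[3]]]), length.out = resolution)

  # Build the grid first, then name its columns after the predictors
  grid <- expand.grid(x, y)
  names(grid) <- vars[2:3]

  # expand.grid varies x fastest, so filling a matrix column-wise gives z[i, j]
  # as the prediction at (x[i], y[j])
  z <- matrix(predict(fit, newdata = grid), nrow = resolution)
  list(fit = fit, x = x, y = y, z = z)
}

s <- reg_surface(area ~ peri + shape, data = rock)
persp(s$x, s$y, s$z, theta = 30, phi = 20,
      xlab = "peri", ylab = "shape", zlab = "area", col = "lightblue")
```

The same x, y and z components could presumably be handed to rgl's surface3d() for an interactive view, though that is not tested here.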
[R] Question
Hello.

I would be very grateful if you could help me with R. I need the R commands for generating pseudo-random values and quasi-random values. I found the commands qnorm and pnorm, but I am not sure that they are what I am looking for.

Looking forward to hearing from you.

Thank you
Margaret
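As a hedged pointer for this question: qnorm and pnorm compute quantiles and probabilities of the normal distribution; they do not generate random values. In base R the pseudo-random generators are functions such as runif() and rnorm(). Base R has no quasi-random generator that I am certain of, so the sketch below hand-codes a van der Corput sequence (the one-dimensional building block of Halton sequences); the function name van_der_corput is an assumption made for illustration.

```r
# Pseudo-random values: reproducible once a seed is set
set.seed(42)
u <- runif(5)   # uniform on (0, 1)
z <- rnorm(5)   # standard normal

# Quasi-random values: a hand-written van der Corput sequence.
# The digits of i in the given base are mirrored around the radix point.
van_der_corput <- function(n, base = 2) {
  sapply(seq_len(n), function(i) {
    q <- 0
    f <- 1 / base
    while (i > 0) {
      q <- q + (i %% base) * f
      i <- i %/% base
      f <- f / base
    }
    q
  })
}

v <- van_der_corput(4)
print(v)  # 0.500 0.250 0.750 0.125
```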
[R] running own function in Java?
How do I run my own unique function in eclipse? -- View this message in context: http://r.789695.n4.nabble.com/running-own-function-in-Java-tp2990420p2990420.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] LDA fuction
Hello,

I wonder what analysis I should use to evaluate which environmental variables are most closely related to the grouping that I have. I have 38 streams that were grouped on the basis of eight environmental variables, and I would like to know how these variables relate to the groups. For example: which of the variables (pH, dissolved oxygen, altitude, ...) are associated with group one, which with group two, and so on.

My email is vasc...@gamil.com

Thank you all.

--
View this message in context: http://r.789695.n4.nabble.com/LDA-fuction-tp2990198p2990198.html
Sent from the R help mailing list archive at Nabble.com.
[R] grep triggering error on unicode character
Colleagues,

[R 2.11; OS X]

I am processing a file on the fly that contains the following text:

XXXáá

[email clients may display this differently -- the string is three X's followed by two instances of the letter a with an acute accent]

I read the file with:

X <- readLines(FILENAME)

In this instance, the text of interest is on line 213. When I examine line 213, it reads:

XXX\xe1\xe1

This makes sense because the unicode mapping for á [a-acute] is U+00E1.

The problem arises when I attempt to manipulate the text in the file. For example:

grep("XXX", X[213])
integer(0)
Warning message:
In grep("XXX", X[213]) : input string 1 is invalid in this locale

Worse yet:

tolower(X[213])
Error in tolower(X[213]) : invalid multibyte string 1

I am focussing on resolving the first problem, i.e., identifying a line containing XXX. If I can do so, I can remove the offending lines before I execute the tolower command. However, I am stumped as to how to resolve either problem.

Any help would be appreciated. Thanks.

Dennis

Dennis Fisher MD
P (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
[R] dot plot by group
Hi all,

I have the following data table

%%
Type    BATCH   RESPONSE
SHORT   A       22
SHORT   A       3
SHORT   A       16
SHORT   A       14
SHORT   A       8
SHORT   A       27
SHORT   A       11
SHORT   A       17
SHORT   B       12
SHORT   B       17
SHORT   B       11
SHORT   B       10
SHORT   B       16
SHORT   B       18
SHORT   B       15
SHORT   B       13
SHORT   B       9
SHORT   B       20
SHORT   C       4
SHORT   C       16
SHORT   C       32
SHORT   C       11
SHORT   C       9
SHORT   C       25
SHORT   C       27
SHORT   C       12
SHORT   C       26
SHORT   C       7
SHORT   C       14
LONG    A       12
LONG    A       7
LONG    A       19
LONG    A       19
LONG    A       11
LONG    A       33
LONG    A       20
LONG    A       25
LONG    B       24
LONG    B       6
LONG    B       39
LONG    B       14
LONG    B       17
LONG    B       10
LONG    B       22
LONG    B       35
LONG    B       33
LONG    B       21
LONG    C       15
LONG    C       11
LONG    C       17
LONG    C       8
LONG    C       2
LONG    C       10
LONG    C       16
LONG    C       21
LONG    C       9
LONG    C       19
LONG    C       23
%%

This is read into object 'd'.

I produce the dot plot by

library(lattice)
dotplot(BATCH~RESPONSE,data=d,groups=Type)

How do I plot them separately by 'Type'?

I have tried using

dotplot(BATCH~RESPONSE,data=d,groups=Type=="SHORT")
dotplot(BATCH~RESPONSE,data=d$Type=='SHORT')

etc.

Thanks.

Casper

--
View this message in context: http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990469.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] LDA fuction
On Mon, 2010-10-11 at 10:18 -0700, vascomc wrote: Hello, I wonder what analysis i have to use to evaluate which environmental variables most closely related to the grouping that I have. I has 38 streams are grouped based on eight environmental variables, but I wonder how these variables relate to these groups. Example.: PH, dissolved oxygen and altitude over which these variables relate to a group one, and two . . . Understood . my email is vasc...@gamil.com Thank you all. If you clustered the 38 streams on the basis of these eight env variables it seems a bit perverse to then ask how well these variables then separate the groups. For such a small sample, you might be best off computing summary statistics for the 8 variables conditioned on the groups (mean of Var1 in groups A,B,C,... etc), accompanied by some graphical plotting of the Variables (e.g box plots of VarX conditioned on group). Such an approach would presume you aren't trying to predict group membership from the 8 variables. HTH G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
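The per-group summaries and box plots suggested in the reply above could be sketched as follows. The data are simulated, and the column names (group, pH, oxygen, altitude) and group sizes are assumptions standing in for the 38 streams and their eight variables.

```r
# Sketch: summary statistics and box plots of environmental variables by group.
# Simulated data; names and group sizes are hypothetical.
set.seed(123)
env <- data.frame(
  group    = factor(rep(c("A", "B", "C"), times = c(12, 13, 13))),
  pH       = rnorm(38, mean = 7, sd = 0.5),
  oxygen   = rnorm(38, mean = 8, sd = 1),
  altitude = runif(38, min = 100, max = 900)
)

# Mean of each variable conditioned on group
group_means <- aggregate(cbind(pH, oxygen, altitude) ~ group,
                         data = env, FUN = mean)
print(group_means)

# Box plot of one variable conditioned on group
boxplot(pH ~ group, data = env, ylab = "pH")
```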
Re: [R] dot plot by group
Casper - I think you want dotplot(BATCH~RESPONSE,data=d,subset=Type=='SHORT') or dotplot(BATCH~RESPONSE,data=subset(d,Type=='SHORT')) - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Mon, 11 Oct 2010, casperyc wrote: Hi all, I have the folloing data table %% TypeBATCH RESPONSE SHORT A 22 SHORT A 3 SHORT A 16 SHORT A 14 SHORT A 8 SHORT A 27 SHORT A 11 SHORT A 17 SHORT B 12 SHORT B 17 SHORT B 11 SHORT B 10 SHORT B 16 SHORT B 18 SHORT B 15 SHORT B 13 SHORT B 9 SHORT B 20 SHORT C 4 SHORT C 16 SHORT C 32 SHORT C 11 SHORT C 9 SHORT C 25 SHORT C 27 SHORT C 12 SHORT C 26 SHORT C 7 SHORT C 14 LONGA 12 LONGA 7 LONGA 19 LONGA 19 LONGA 11 LONGA 33 LONGA 20 LONGA 25 LONGB 24 LONGB 6 LONGB 39 LONGB 14 LONGB 17 LONGB 10 LONGB 22 LONGB 35 LONGB 33 LONGB 21 LONGC 15 LONGC 11 LONGC 17 LONGC 8 LONGC 2 LONGC 10 LONGC 16 LONGC 21 LONGC 9 LONGC 19 LONGC 23 %% This is read into object 'd'. I produce the dot plot by, library(lattice) dotplot(BATCH~RESPONSE,data=d,groups=Type) How do I seperately plot them by 'Type'? I have tried using dotplot(BATCH~RESPONSE,data=d,groups=Type==SHORT) dotplot(BATCH~RESPONSE,data=d$Type=='SHORT') ect Thanks. Casper -- View this message in context: http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990469.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
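Besides subsetting, lattice can draw one panel per level of Type through a conditioning term in the formula. The sketch below uses a small made-up extract of the data (two values per batch) rather than the full table, and stores the columns as factors.

```r
library(lattice)

# Small illustrative extract of the data; values beyond the first few rows
# of the original table are not reproduced here.
d <- data.frame(
  Type     = rep(c("SHORT", "LONG"), each = 6),
  BATCH    = rep(rep(c("A", "B", "C"), each = 2), times = 2),
  RESPONSE = c(22, 3, 12, 17, 4, 16, 12, 7, 24, 6, 15, 11),
  stringsAsFactors = TRUE
)

# One panel per Type, via the conditioning operator |
p_panels <- dotplot(BATCH ~ RESPONSE | Type, data = d)

# A single Type only, via the subset argument
p_short <- dotplot(BATCH ~ RESPONSE, data = d, subset = Type == "SHORT")

print(p_panels)
```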
Re: [R] Is there a regression surface demo?
Dear Josh, On Mon, Oct 11, 2010 at 3:15 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: Hi All, Does anyone know of a function to plot a regression surface for two predictors? RSiteSearch()s and findFn()s have not turned up what I was looking for. I was thinking something along the lines of: http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif I like the rgl package because showing it from different angles is nice for demonstrations. I started to write my own, but it has some issues (non functioning code start below), and I figured before I tried to work out the kinks, I would ask for the list's feedback. Any comments or suggestions (about functions or preferred idioms for what I tried below, or...) are greatly appreciated. Josh [snip] I haven't tried to debug your code, but wanted to mention that the Rcmdr:::scatter3d function does 3-d scatterplots (with the rgl package) and adds a regression surface, one of 4 or 5 different types. If nothing else, it might be a good place to start for making your own. A person can play around with the different types in the Rcmdr under the Graphs menu. Or, from the command line: library(Rcmdr) with(rock, scatter3d(area, peri, shape)) I hope that this helps, Jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dot plot by group
Hi Spector, Yes, that is exactly what I was aiming for. Thanks. Casper -- View this message in context: http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990495.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dot plot by group
And now I just wonder why the ' bty='n' ' won't work? I did dotplot(BATCH~RESPONSE,data=d,subset=Type=='SHORT',bty='n') and tried other bty parameters, none is working Casper -- View this message in context: http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990500.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grep triggering error on unicode character
On 11/10/2010 3:36 PM, Dennis Fisher wrote:
> Colleagues,
>
> [R 2.11; OS X]
>
> I am processing a file on the fly that contains the following text:
> XXXáá
> [email clients may display this differently -- the string is three X's
> followed by two instances of the letter a with an acute accent]
>
> I read the file with:
> X <- readLines(FILENAME)
>
> In this instance, the text of interest is on line 213. When I examine
> line 213, it reads:
> XXX\xe1\xe1
> This makes sense because the unicode mapping for á [a-acute] is U+00E1.

That's not what it's saying: it's saying you have three X's followed by two unrecognized characters with hex codes E1. I imagine the original file is encoded using Latin1, because that's how á is encoded there.

> The problem arises when I attempt to manipulate the text in the file.
> For example:
> grep("XXX", X[213])
> integer(0)
> Warning message:
> In grep("XXX", X[213]) : input string 1 is invalid in this locale
>
> Worse yet:
> tolower(X[213])
> Error in tolower(X[213]) : invalid multibyte string 1
>
> I am focussing on resolving the first problem, i.e., identifying a line
> containing XXX. If I can do so, I can remove the offending lines before
> I execute the tolower command. However, I am stumped as to how to
> resolve either problem.
>
> Any help would be appreciated.

You need to declare the encoding of the file when you read it if it's not in the default encoding for your locale, or re-encode it. See ?readLines.

Duncan Murdoch

> Thanks.
>
> Dennis
>
> Dennis Fisher MD
> P (The P Less Than Company)
> Phone: 1-866-PLessThan (1-866-753-7784)
> Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
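A minimal sketch of the re-encoding route mentioned above, assuming (as Duncan guesses) that the file is Latin-1. The temporary file written here is a stand-in for the real FILENAME, and the choice of iconv() to convert to UTF-8 is one option among several; declaring the encoding on the connection is another.

```r
# Write a small Latin-1 file: "XXX" followed by two 0xE1 bytes (a-acute) and a newline
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x58, 0x58, 0x58, 0xe1, 0xe1, 0x0a)), tmp)

# Read the raw bytes as they are, then convert from Latin-1 to UTF-8
X <- readLines(tmp)
X_utf8 <- iconv(X, from = "latin1", to = "UTF-8")

# The pattern match now works on a validly encoded string
hit <- grepl("XXX", X_utf8, fixed = TRUE)
print(hit)

# Case conversion no longer fails with "invalid multibyte string"
lower <- tolower(X_utf8)

unlink(tmp)
```

If only the lines containing XXX need to be found, grep() with useBytes = TRUE may also avoid the warning, at the cost of treating the text as plain bytes.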
Re: [R] how can i do anova
Dear Andrew Miles, Thanks a lot. Moon On Mon, Oct 11, 2010 at 9:54 PM, Andrew Miles rstuff.mi...@gmail.comwrote: Type ?anova on your R command line for the basic function, and links to related functions. Also, try a google search of something like doing anova in R and you should find multiple tutorials or examples. Andrew Miles On Oct 11, 2010, at 11:33 AM, Mauluda Akhtar wrote: Hi, I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of R bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044 [1,] 69 46 43 54 54 41 [2,] 68 22 39 31 31 22 [3,] 91 54 57 63 63 50 Thank you. Moon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how can i do anova
Dear Liviu, Thanks a lot. moon On Tue, Oct 12, 2010 at 1:02 AM, Liviu Andronic landronim...@gmail.comwrote: Hello On Mon, Oct 11, 2010 at 5:33 PM, Mauluda Akhtar maulud...@gmail.com wrote: Hi, I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of R Try to do this with either Rcmdr or Deducer, both GUIs to R. Regards Liviu bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044 [1,] 69 46 43 54 54 41 [2,] 68 22 39 31 31 22 [3,] 91 54 57 63 63 50 Thank you. Moon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mailhttp://garbl.home.comcast.net/%7Egarbl/stylemanual/e.htm#e-mail [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] importing numeric types via sqlQuery
Hi everyone,

I am using the sqlQuery function (in the RODBC library) to import data from a database into R. My table (called temp) in the database looks like this:

category  num
abc       54469517.307692307692
def       36428860.230769230769

I used the following R code to pull data into R:

data <- sqlQuery(channel, "select category, num from temp;")

However, the result is that num gets all its decimal places chopped off, so data looks like this instead in R:

category  num
abc       54469517
def       36428860

I've tried various alternative approaches, but none have fixed the problem. When I cast the variable to a numeric type like this

data <- sqlQuery(channel, "select category, num::numeric from temp;")

it still gave me the same result. Casting to a real type like this

data <- sqlQuery(channel, "select category, num::real from temp;")

resulted in scientific notation that also rounded the numbers.

Any suggestions? Much appreciated!
[R] Spencer 15-point weighted moving average
I am trying to apply Spencer's 15-point weighted moving average filter to the time series shampoo, using the filter command, but I am not sure if I am using the filter correctly:

library(fma)
sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231, .209, .144, .066, .009, -.016, -.019, -.009)
(s1 <- filter(shampoo, sma15))

This result does not match the spence.15 command from package locfit:

library(locfit)
spence.15(shampoo)

Any help understanding why these are different (or what I am doing wrong with filter) would be appreciated.

Thanks,

Sam Thomas
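Two things may contribute to the mismatch, offered here as possibilities rather than a diagnosis. First, the weights above are rounded to three decimals and sum to 0.999; the exact Spencer weights are (-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3)/320. Second, stats::filter() returns NA for the first and last seven points, whereas a graduation routine may treat the ends differently (I have not checked how spence.15 does this). The sketch below uses a made-up cubic series instead of shampoo so that it needs no extra packages.

```r
# Exact Spencer 15-point weights
spencer15 <- c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3) / 320

# The rounded weights from the post, for comparison
sma15 <- c(-.009, -.019, -.016, .009, .066, .144, .209, .231,
           .209, .144, .066, .009, -.016, -.019, -.009)
sum(spencer15)  # the exact weights sum to 1
sum(sma15)      # the rounded weights sum to 0.999

# Spencer's filter reproduces cubic polynomials exactly in the interior
tt <- 1:40
y  <- tt^3 - 2 * tt^2 + 5
s  <- stats::filter(y, spencer15)   # centred moving average (sides = 2)

# First and last 7 values are NA because the window does not fit there
head(s, 8)
```

Calling stats::filter() with the namespace prefix guards against another attached package masking filter.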
Re: [R] importing numeric types via sqlQuery
I would assume that the digits are not being chopped off. It is just that R will typically print data to 7 significant digits:

x <- 54469517.307692307692
x
[1] 54469517
options(digits=20)
x
[1] 54469517.3076923

Your data is there and you can set 'options' to show it if you want to. Also, with floating point you will only get about 15 digits of accuracy (see FAQ 7.31).

On Mon, Oct 11, 2010 at 4:19 PM, E C mmmraspberr...@hotmail.com wrote:
> Hi everyone,
> I am using the sqlQuery function (in the RODBC library) to import data
> from a database into R. My table (called temp) in the database looks
> like this:
>
> category  num
> abc       54469517.307692307692
> def       36428860.230769230769
>
> I used the following R code to pull data into R:
> data <- sqlQuery(channel, "select category, num from temp;")
> However, the result is that num gets all its decimal places chopped
> off, so data looks like this instead in R:
>
> category  num
> abc       54469517
> def       36428860
>
> I've tried various alternative approaches, but none have fixed the
> problem. When I cast the variable to a numeric type like this
> (data <- sqlQuery(channel, "select category, num::numeric from temp;")),
> it still gave me the same result. Casting to a real type like this
> (data <- sqlQuery(channel, "select category, num::real from temp;"))
> resulted in scientific notation that also rounded the numbers.
> Any suggestions? Much appreciated!

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
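The distinction between the stored value and the printed value can be checked without a database. This is a small sketch using only base R; the particular formatting calls are illustrative choices, not the only way to display more digits.

```r
x <- 54469517.307692307692

# Default printing uses 7 significant digits, so the decimals are not shown
shown_default <- format(x, digits = 7)
print(shown_default)

# The stored value still carries the fractional part
frac <- x - 54469517
print(frac)

# Ask for more digits explicitly, without changing global options
print(x, digits = 15)
shown_fixed <- sprintf("%.4f", x)
print(shown_fixed)
```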
Re: [R] Is there a regression surface demo?
There is also wireframe() in lattice and bplot in rms. -Ista On Mon, Oct 11, 2010 at 3:49 PM, G. Jay Kerns gke...@ysu.edu wrote: Dear Josh, On Mon, Oct 11, 2010 at 3:15 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: Hi All, Does anyone know of a function to plot a regression surface for two predictors? RSiteSearch()s and findFn()s have not turned up what I was looking for. I was thinking something along the lines of: http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif I like the rgl package because showing it from different angles is nice for demonstrations. I started to write my own, but it has some issues (non functioning code start below), and I figured before I tried to work out the kinks, I would ask for the list's feedback. Any comments or suggestions (about functions or preferred idioms for what I tried below, or...) are greatly appreciated. Josh [snip] I haven't tried to debug your code, but wanted to mention that the Rcmdr:::scatter3d function does 3-d scatterplots (with the rgl package) and adds a regression surface, one of 4 or 5 different types. If nothing else, it might be a good place to start for making your own. A person can play around with the different types in the Rcmdr under the Graphs menu. Or, from the command line: library(Rcmdr) with(rock, scatter3d(area, peri, shape)) I hope that this helps, Jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dot plot by group
On Oct 11, 2010, at 3:55 PM, casperyc wrote:
> And now I just wonder why the ' bty='n' ' won't work?

Left open is the answer to the question ... work ... how? Because dotplot is a lattice function? ... and bty is a base graphics parameter? You could try to give par.settings a list that consisted of bty='n'. (That failed, and since it fails you should look at the axis parameters in the lattice.options section of the help page.) You probably need to print out the lattice settings and then work toward reconfiguring them for your plot with trellis.par.set:

print(trellis.par.get())

This does work (and suggests I misread the intent of Sarkar in his help page), but it remains unclear how you _wanted_ it to work:

dotplot(variety ~ yield | year * site, data=barley, trellis.par.set(bty='n'))

> I did
> dotplot(BATCH~RESPONSE,data=d,subset=Type=='SHORT',bty='n')
> and tried other bty parameters, none is working
> Casper
> -- View this message in context: http://r.789695.n4.nabble.com/dot-plot-by-group-tp2990469p2990500.html Sent from the R help mailing list archive at Nabble.com.

David Winsemius, MD West Hartford, CT
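In lattice, the panel border is drawn using the 'axis.line' trellis setting rather than the base-graphics 'bty' parameter. The following is a hedged sketch, not taken from the thread, which assumes the lattice package and its barley data are installed. Making 'axis.line' transparent also hides the tick marks, so it may not be exactly what was wanted.

```r
library(lattice)

# Build the plot object; par.settings applies to this plot only.
p <- dotplot(variety ~ yield | site, data = barley,
             par.settings = list(axis.line = list(col = "transparent")))

# Lattice objects are drawn when printed; draw only in an interactive session.
if (interactive()) print(p)
```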
[R] topicmodels error
I am trying to fit an LDA model to a TermDocumentMatrix with the topicmodels package, but R says:

Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED, : x is of class “TermDocumentMatrix”“simple_triplet_matrix”

> class(TDM)
[1] "TermDocumentMatrix" "simple_triplet_matrix"

I tried to use a matrix instead, but that doesn't work either:

MAT <- as.matrix(TDM)
Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED, : x is of class “matrix”

The help page says the correct input is a DocumentTermMatrix:

Arguments
x    Object of class "DocumentTermMatrix"

Can anyone help me? Thanks
Re: [R] topicmodels error
I don't know the answer, but let me point out that:

"DocumentTermMatrix" %in% class(TDM)

should return FALSE since:

"DocumentTermMatrix" != "TermDocumentMatrix"

-- David.

On Oct 11, 2010, at 4:45 PM, Dario Solari wrote:
> I try to fit a LDA model to a TermDocumentMatrix with the topicmodels package... but R says:
> Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED, : x is of class “TermDocumentMatrix”“simple_triplet_matrix”
> class(TDM)
> [1] "TermDocumentMatrix" "simple_triplet_matrix"
> I try to use a matrix... but don't work:
> MAT <- as.matrix(TDM)
> Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED, : x is of class “matrix”
> The help say is correct to use a DocumentTermMatrix:
> Arguments x Object of class "DocumentTermMatrix"
> Can anyone help me? Thanks

David Winsemius, MD West Hartford, CT
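The class check can be tried without installing tm or topicmodels. This is a small sketch using a mock object that only carries the class vector reported in the thread. The mock is an assumption for illustration; a real TermDocumentMatrix would come from the tm package.

```r
# Stand-in object with the same class vector as the TDM in the thread.
TDM <- structure(list(), class = c("TermDocumentMatrix", "simple_triplet_matrix"))

# inherits() is the usual way to test for a class; the two names differ.
is_dtm <- inherits(TDM, "DocumentTermMatrix")
is_tdm <- inherits(TDM, "TermDocumentMatrix")
print(c(DocumentTermMatrix = is_dtm, TermDocumentMatrix = is_tdm))

# With tm loaded, transposing the real object with t(TDM) is expected to give
# the documents-by-terms form; that step is an assumption and is not run here.
```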
Re: [R] importing numeric types via sqlQuery
Thanks for the quick reply! Hmm, I did not know about the options default. However, after I set options, it seems like it's still not displaying correctly. I've tried an even simpler example table with only 6 digits (much fewer than 20): categorynum\nabc123.456\ndef456.789\n Then in R:options(digits = 20)data-sqlQuery(channel, select category, num from temp;)But data looks like this: categorynum\nabc123\ndef456\n I suspect it's something with sqlQuery that chops off the digits and wondering if there's a way of turning it off. Thanks! Date: Mon, 11 Oct 2010 16:28:25 -0400 Subject: Re: [R] importing numeric types via sqlQuery From: jholt...@gmail.com To: mmmraspberr...@hotmail.com CC: r-help@r-project.org I would assume that the digitis are not being chopped off. It is just that R will typically print data to 7 significant digits: x - 54469517.307692307692 x [1] 54469517 options(digits=20) x [1] 54469517.3076923 Your data it there and you can set 'options' to show it if you want to. Also with floating point, you will only get about 15 digits of accuracy (see FAQ 7.31). On Mon, Oct 11, 2010 at 4:19 PM, E C mmmraspberr...@hotmail.com wrote: Hi everyone, I am using the sqlQuery function (in RODBC library) to import data from a database into R. My table (called temp) in the database looks like this: categorynumabc 54469517.307692307692def 36428860.230769230769 I used the following R code to pull data into R:data -sqlQuery(channel, select category, num from temp;) However, the result is that num gets all its decimal places chopped off, so data looks like this instead in R:category numabc 54469517def 36428860 I've tried various alternative approaches, but none have fixed the problem. When I cast the variable to a numeric type like this (data -sqlQuery(channel, select category, num::numeric from temp;), it still gave me the same result. 
Casting to a real type like this (data -sqlQuery(channel, select category, num::real from temp;) resulted in scientific notation that also rounded the numbers. Any suggestions? Much appreciated! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] topicmodels error
Excuse me... when I re-read my e-mail I saw my mistake! I used a TermDocumentMatrix instead of a DocumentTermMatrix...

On 11 Oct, 22:45, Dario Solari dario.sol...@gmail.com wrote:
> I try to fit a LDA model to a TermDocumentMatrix with the topicmodels package... but R says:
> Error in LDA(TDM, k = k, method = "Gibbs", control = list(seed = SEED, : x is of class “TermDocumentMatrix”“simple_triplet_matrix”
> class(TDM)
> [1] "TermDocumentMatrix" "simple_triplet_matrix"
> I try to use a matrix... but don't work:
> MAT <- as.matrix(TDM)
> Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED, : x is of class “matrix”
> The help say is correct to use a DocumentTermMatrix:
> Arguments x Object of class "DocumentTermMatrix"
> Can anyone help me? Thanks
Re: [R] Is there a regression surface demo?
Thanks for everyone's responses. Just to follow up, here is a working version of my original. The code is not pretty, but it functions. Assuming you have the 'rgl' package installed and have sourced this function, here are some examples:

RegSurfaceDemo(mpg ~ vs + wt, data = mtcars)
RegSurfaceDemo(mpg ~ vs * wt, data = mtcars)
RegSurfaceDemo(qsec ~ disp * hp, data = mtcars)

It cannot handle factors, and the axes labels are hideous...I'll get to that eventually. Thanks for all your help and suggestions.

Josh

RegSurfaceDemo <- function(formula, data, xlim = NULL, ylim = NULL, zlim = NULL, resolution = 10) {
  require(rgl)
  ## This cannot be the proper way to extract variable names from formula
  vars <- rownames(attr(terms(formula), "factors"))
  ## if no limits set, make them nearest integer to
  ## .75 the lowest value and 1.25 the highest
  ranger <- function(x) {
    as.integer(range(x) * c(.75, 1.25))
  }
  if(is.null(xlim)) {xlim <- ranger(data[, vars[2]])}
  if(is.null(ylim)) {ylim <- ranger(data[, vars[3]])}
  if(is.null(zlim)) {zlim <- ranger(data[, vars[1]])}
  ## This does not actually work because the data frame
  ## does not get named properly (actually it throws an error)
  f <- function (x, y) {
    newdat <- data.frame(x, y)
    colnames(newdat) <- c(vars[2], vars[3])
    predict(my.model, newdata = newdat)
  }
  ## Fit model
  my.model <- lm(formula = formula, data = data)
  ## Create X, Y, and Z grids
  X <- seq(from = xlim[1], to = xlim[2], length.out = resolution)
  Y <- seq(from = ylim[1], to = ylim[2], length.out = resolution)
  Z <- outer(X, Y, f)
  ## Create 3d scatter plot and add the regression surface
  open3d()
  with(data = data, plot3d(x = get(vars[2]), y = get(vars[3]), z = get(vars[1]), xlim = xlim, ylim = ylim, zlim = zlim))
  par3d(ignoreExtent = TRUE)
  surface3d(X, Y, Z, col = "blue", alpha = .6)
  par3d(ignoreExtent = FALSE)
  return(summary(my.model))
}

On Mon, Oct 11, 2010 at 1:28 PM, Ista Zahn iz...@psych.rochester.edu wrote: There is also wireframe() in lattice and bplot in rms.
-Ista On Mon, Oct 11, 2010 at 3:49 PM, G. Jay Kerns gke...@ysu.edu wrote: Dear Josh, On Mon, Oct 11, 2010 at 3:15 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: Hi All, Does anyone know of a function to plot a regression surface for two predictors? RSiteSearch()s and findFn()s have not turned up what I was looking for. I was thinking something along the lines of: http://mallit.fr.umn.edu/fr5218/reg_refresh/images/fig9.gif I like the rgl package because showing it from different angles is nice for demonstrations. I started to write my own, but it has some issues (non functioning code start below), and I figured before I tried to work out the kinks, I would ask for the list's feedback. Any comments or suggestions (about functions or preferred idioms for what I tried below, or...) are greatly appreciated. Josh [snip] I haven't tried to debug your code, but wanted to mention that the Rcmdr:::scatter3d function does 3-d scatterplots (with the rgl package) and adds a regression surface, one of 4 or 5 different types. If nothing else, it might be a good place to start for making your own. A person can play around with the different types in the Rcmdr under the Graphs menu. Or, from the command line: library(Rcmdr) with(rock, scatter3d(area, peri, shape)) I hope that this helps, Jay __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
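The grid of fitted values behind such a surface can also be built with expand.grid() and predict() instead of outer(). The following is a base-R sketch under that assumption. It uses persp() for a static view rather than rgl, and the choice of wt and hp as predictors is only for illustration.

```r
# Fit a two-predictor model on a built-in data set.
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Evenly spaced values spanning each predictor's observed range.
wt_seq <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)
hp_seq <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 10)

# expand.grid varies its first column fastest, so filling a matrix
# column-wise puts wt along the rows and hp along the columns.
grid <- expand.grid(wt = wt_seq, hp = hp_seq)
z <- matrix(predict(fit, newdata = grid), nrow = length(wt_seq))

# Static perspective plot of the fitted surface (interactive sessions only).
if (interactive()) {
  persp(wt_seq, hp_seq, z, theta = 30, phi = 20,
        xlab = "wt", ylab = "hp", zlab = "predicted mpg")
}
```

Because the new data frame is built with the model's own variable names, no renaming step is needed before predict().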
[R] support vector machine for right censored data
Hi, Does anybody know how to fit a support vector machine regression with right censored time-to-event response to select the best subset among several predictor variables? Thanks in advance.
[R] Slow reading multiple tick data files into list of dataframes
Hi,

I am trying to find the best way to read 85 tick data files of the format:

> head(nbbo)
1 bid CON 09:30:00.722 09:30:00.722 32.71 98
2 ask CON 09:30:00.782 09:30:00.810 33.14 300
3 ask CON 09:30:00.809 09:30:00.810 33.14 414
4 bid CON 09:30:00.783 09:30:00.810 33.06 200

Each file has between 100,000 and 300,300 rows. Currently I am doing

nbbo.list <- lapply(filePath, read.csv)

to create a list with 85 data.frame objects, but it is taking minutes to read in the data, and afterwards I get the following message on the console when taking further actions (though it does then stop):

The R Engine is busy. Please wait, and try your command again later.

filePath in the above example is a vector of filenames:

> head(filePath)
[1] "C:/work/A/A_2010-10-07_nbbo.csv"
[2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
[3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
[4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"

Is there a better/quicker or more R way of doing this?

Thanks, Chris

-- View this message in context: http://r.789695.n4.nabble.com/Slow-reading-multiple-tick-data-files-into-list-of-dataframes-tp2990723p2990723.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Slow reading multiple tick data files into list of dataframes
On Mon, Oct 11, 2010 at 5:39 PM, rivercode aqua...@gmail.com wrote:
> Hi, I am trying to find the best way to read 85 tick data files of format:
> head(nbbo)
> 1 bid CON 09:30:00.722 09:30:00.722 32.71 98
> 2 ask CON 09:30:00.782 09:30:00.810 33.14 300
> 3 ask CON 09:30:00.809 09:30:00.810 33.14 414
> 4 bid CON 09:30:00.783 09:30:00.810 33.06 200
> Each file has between 100,000 to 300,300 rows. Currently doing nbbo.list <- lapply(filePath, read.csv) to create list with 85 data.frame objects...but it is taking minutes to read in the data and afterwards I get the following message on the console when taking further actions (though it does then stop):
> The R Engine is busy. Please wait, and try your command again later.
> filePath in the above example is a vector of filenames:
> head(filePath)
> [1] "C:/work/A/A_2010-10-07_nbbo.csv"
> [2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
> [3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
> [4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"
> Is there a better/quicker or more R way of doing this ?

You could try (possibly with suitable additional arguments):

library(sqldf)
lapply(filePath, read.csv.sql)

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] Slow reading multiple tick data files into list of dataframes
Date: Mon, 11 Oct 2010 14:39:54 -0700 From: aqua...@gmail.com To: r-help@r-project.org Subject: [R] Slow reading multiple tick data files into list of dataframes [...] Is there a better/quicker or more R way of doing this ? While there may be an obvious R-related answer, usually it helps if you can determine where the bottleneck is in terms of resources on your platform- often on older machines you run out of real memory and then all the time is spent reading the file onto VM back on disk. Can you tell if you are CPU or memory limited by using task manager? It could in fact be that the best solution involves not trying to hold your entire data set in memory at once, hard to know without knowing your platform etc. In the past, I've found that actually sorting data, a slow process itself, can speed things up a lot due to less thrashing of memory hierarchy during the later analysis. I doubt if that helps your immediate problem but it does point to one possible non-obvious optimization depending on what is slowing you down. Thanks, Chris -- View this message in context: http://r.789695.n4.nabble.com/Slow-reading-multiple-tick-data-files-into-list-of-dataframes-tp2990723p2990723.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
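A further option that stays within base R, and is not one proposed in the thread, is to tell read.csv the column types in advance so it does not have to guess them. The sketch below builds a few tiny files first so that it is self-contained. The column names and types are assumptions about the tick data, not taken from the original files.

```r
# Create three small CSV files so the sketch runs on its own.
tmp_dir <- tempfile("ticks")
dir.create(tmp_dir)
symbols <- c("A", "AAPL", "ADBE")
for (s in symbols) {
  ticks <- data.frame(side = c("bid", "ask"), cond = "CON",
                      time = c("09:30:00.722", "09:30:00.782"),
                      price = c(32.71, 33.14), size = c(98L, 300L))
  write.csv(ticks, file.path(tmp_dir, paste0(s, "_nbbo.csv")), row.names = FALSE)
}

filePath <- list.files(tmp_dir, pattern = "_nbbo\\.csv$", full.names = TRUE)

# Declaring column types up front spares read.csv from inferring them.
col_types <- c("character", "character", "character", "numeric", "integer")
nbbo.list <- lapply(filePath, read.csv, colClasses = col_types)

# Name each list element after its ticker for easier lookup.
names(nbbo.list) <- sub("_nbbo\\.csv$", "", basename(filePath))
str(nbbo.list[["AAPL"]])
```

Whether this helps depends on where the time actually goes; if the machine is swapping, as suggested above, column types will make little difference.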
[R] compare histograms
Hello, How can I compare two statistical histograms? How can I know whether these histograms are equivalent or not? Regards
Re: [R] Trouble accessing cov function from stats library
Note that R is case sensitive, so cov and Cov are different.

From: Barth B. Riley bbri...@chestnut.org
To: r-help@r-project.org
Date: 12/Oct/2010 3:31a
Subject: [R] Trouble accessing cov function from stats library

> Dear all
> I am trying to use the cov function in the stats library. I have no problem using this function from the console. However, in my R script I received a function not found message. Then I called stats::cov(...) and received an error message that the function was not exported. Then I tried stats:::cov (three colons) and received the error
> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object 'Cov' not found
> I am also importing the ltm library, though I'm not aware of a cov function in ltm that could be causing a conflict. Any suggestions?
> Thanks
> Barth
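The case-sensitivity point can be checked against the stats namespace itself. This is a short base-R sketch; the use of mtcars is only for illustration.

```r
# 'cov' is defined in stats, 'Cov' is not.
stats_ns <- asNamespace("stats")
has_cov <- exists("cov", envir = stats_ns, inherits = FALSE)
has_Cov <- exists("Cov", envir = stats_ns, inherits = FALSE)
print(c(cov = has_cov, Cov = has_Cov))

# The fully qualified call works regardless of what else is attached.
m <- stats::cov(mtcars$mpg, mtcars$wt)
print(m)
```

The error text in the question mentions object 'Cov', which suggests the script spelled the name with a capital C somewhere.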
Re: [R] MATLAB vrs. R
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Craig O'Connell
Sent: Monday, October 11, 2010 8:10 AM
To: alain.guil...@uclouvain.be
Cc: r-help@r-project.org; pda...@gmail.com
Subject: Re: [R] MATLAB vrs. R

> alain,
> Perhaps I'm still entering the code wrong. I tried using your
> result=myquadrature(f,0,2000)
> print(result)
> instead of my:
> val = myquadrature(f,a,b)
> result=myquadrature(val,0,2000)
> print(result)
> ...and I am still getting an inf inf inf inf inf... Did you change any of the previous syntax in addition to changing the result statement? Thank you so much and I think my brain is fried! Happy Holiday.
> Craig

Craig,

I haven't seen an answer to this yet, so let me jump in. You seem to have some stuff still left over from MATLAB. Here is some cleaned-up code that produces the result you expect. I don't think the value of dx was being correctly computed in your code. I did not change the assignment operator you used (=), but in R the preferred operator is '<-' (without the quotes).

myquadrature <- function(f,a,b){
  npts = length(f)
  nint = npts-1
  if(npts <= 1) error('need at least two points to integrate')
  if(b <= a) error('something wrong with the interval, b should be greater than a') else dx=b/nint
  sum(f[-npts]+f[-1])/2*dx
}

#Call my quadrature
x = seq(0,2000,10)
h = 10*(cos(((2*pi)/2000)*(x-mean(x)))+1)
u = (cos(((2*pi)/2000)*(x-mean(x)))+1)
a = x[1]
b = x[length(x)]
plot(x,-h)
a = x[1]; b = x[length(x)]
#call your quadrature function. Hint, the answer should be 3.
f = u*h
result = myquadrature(f,a,b)
result

Hope this is helpful,
Dan

Daniel Nordlund
Bothell, WA USA
[R] Revolutions Blog: September Roundup
I write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. In case you missed them, here are some articles related to R from the month of September: http://bit.ly/cuFNat presented a profile of Hadley Wickham, author of many popular R packages including ggplot2 and reshape. http://bit.ly/bS71Ld riffed the design of the new Twitter website into a discussion on calculating the Golden Mean with R. Several readers contributed 1-liners based on the Fibonacci sequence: http://bit.ly/dvpemK . http://bit.ly/bunYJE linked to some elegant code for calculating the Mandelbrot set in R, and a beautiful animation of the results. http://bit.ly/ahIZzo linked to a blog post by JD Long on simulating multivariate random variables using copulas. http://bit.ly/a8mjZm announced a ggplot2 data visualization competition. http://bit.ly/cRKEZs linked to a discussion about the merits of dot charts versus bar charts. http://bit.ly/cavSLB announced the availability of Revolution R Enterprise 4.0, available free to academics. http://bit.ly/aBuFEt posted updated statistics on the growth in R packages, and asked what other languages can learn from R's package system. http://bit.ly/cYujCF noted updates to the plyr and reshape packages, featuring improved performance and parallel processing. http://bit.ly/afhkSt noted that R 2.12 is scheduled for release on October 15. http://bit.ly/bw6ylo announced RevoDeployR, Web Services integration for R included in Revolution R Enterprise. You can download slides and a replay of the webinar introducing RevoDeployR here: http://bit.ly/aRUrPh . http://bit.ly/afwmwf linked to a feature article about R in Tech Target: R's time is now. http://bit.ly/bB2MVC reviewed the state of running R on the iPhone and iPad. 
http://bit.ly/bBRHB2 noted that RHIPE creator Saptarshi Guha is presenting at the Hadoop World conference, and linked to an interview with him. (There's also a new profile of Saptarshi at: http://bit.ly/9k7ABg .) http://bit.ly/aPcxBP linked to a collection of guidelines for efficient R programming by Martin Morgan. http://bit.ly/aez046 relayed the Call for Papers for the R/Finance 2011 conference in Chicago. http://bit.ly/bz8eX8 had guest blogger Joseph Rickert's thoughts on the relationship between Map-Reduce/Hadoop and R. http://bit.ly/bYQrt4 linked to some hints for the R beginner by Patrick Burns. http://bit.ly/cU9BzF linked to Dirk Eddelbuettel's review of the contributions to R resulting from this year's Google Summer of Code. There are new R user groups in New Jersey (http://bit.ly/9JnRcg), Brisbane, QLD (http://bit.ly/cVXHdp) and Toronto (http://bit.ly/bWhJyw). Other non-R-related stories in the past month included one about mono-monostatic bodies (http://bit.ly/cr79bo), and (on a lighter note), how statisticians and scientists (fail to) communicate (http://bit.ly/aYgEEa), and funny airline safety videos (http://bit.ly/bol1ZO). The R Community Calendar has also been updated at: http://blog.revolutionanalytics.com/calendar.html If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. Join the Revolution mailing list at http://revolutionanalytics.com/newsletter to be alerted to new articles on a monthly basis. As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com . Don't forget you can also follow the blog using an RSS reader like Google Reader, or by following me on Twitter (I'm @revodavid). 
Cheers, # David -- David M Smith da...@revolutionanalytics.com VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] MATLAB vrs. R
I apologize for the noise. I didn't clean up the code enough. See below.

[snip]

Craig,

I haven't seen an answer to this yet, so let me jump in. You seem to have some stuff still left over from MATLAB. Here is some cleaned-up code that produces the result you expect. I don't think the value of dx was being correctly computed in your code. I did not change the assignment operator you used (=), but in R the preferred operator is '<-' (without the quotes).

myquadrature <- function(f,a,b){
  npts = length(f)
  nint = npts-1
  if(npts <= 1) error('need at least two points to integrate')
  if(b <= a) error('something wrong with the interval, b should be greater than a') else dx=b/nint

The 2 'if' statements above should have been

  if(npts <= 1) stop('need at least two points to integrate')
  if(b <= a) stop('something wrong with the interval, b should be greater than a') else dx=b/nint

  sum(f[-npts]+f[-1])/2*dx
}

#Call my quadrature
x = seq(0,2000,10)
h = 10*(cos(((2*pi)/2000)*(x-mean(x)))+1)
u = (cos(((2*pi)/2000)*(x-mean(x)))+1)
a = x[1]
b = x[length(x)]
plot(x,-h)
a = x[1]; b = x[length(x)]
#call your quadrature function. Hint, the answer should be 3.
f = u*h
result = myquadrature(f,a,b)
result

Hope this is helpful,
Dan

Daniel Nordlund
Bothell, WA USA
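For reference, here is a compact restatement of the corrected function using '<-' and stop(). It is a sketch that makes one deliberate change from the posted code: dx is computed as (b - a)/nint, which agrees with b/nint when a is 0 but also covers intervals that do not start at zero. With the u and h defined in the thread the integrand is 10*(cos + 1)^2 over one full period of length 2000, whose exact integral is 30000, so that is the value these particular definitions give rather than the hinted 3.

```r
# Trapezoidal rule for values f sampled at equal spacing on [a, b].
myquadrature <- function(f, a, b) {
  npts <- length(f)
  if (npts < 2) stop("need at least two points to integrate")
  if (b <= a) stop("b should be greater than a")
  dx <- (b - a) / (npts - 1)       # width of each interval
  dx * sum(f[-npts] + f[-1]) / 2   # sum of the trapezoid areas
}

# Same test data as in the thread.
x <- seq(0, 2000, by = 10)
h <- 10 * (cos((2 * pi / 2000) * (x - mean(x))) + 1)
u <- cos((2 * pi / 2000) * (x - mean(x))) + 1

result <- myquadrature(u * h, x[1], x[length(x)])
print(result)
```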
[R] running R script on linux server
Hi R-users,

I have a problem running my R code on a Linux cluster. What I did was write a .pbs file to instruct the cluster on what to do and how:

#!/bin/sh
#PBS -m ae
#PBS -M uqlca...@uq.edu.au
#PBS -A uq-CSER
#PBS -N job1_lollo
#PBS -l select=1:ncpus=1:NodeType=fast:mem=8GB
#PBS -l walltime=999:00:00
cd $PBS_O_WORKDIR
source /usr/share/modules/init/bash
module load R/2.11.1
/home/uqlcatta/script/diag.sh

The .pbs file calls a .sh file, which is located in my home directory on the cluster and which contains the R script (enclosed in "") to run:

#!/bin/bash
echo "mat <- matrix(1:12,nrow=3,ncol=4)
diagonal <- diag(mat)
write.csv(diagonal, file = "diagonal.csv")" > R_tmp
echo 'source("R_tmp")' | R --vanilla --slave
rm R_tmp

However, the cluster sends back an error message saying:

Error in write.table(diagonal, file = diagonal.csv, col.names = NA, sep = ",", : object 'diagonal.csv' not found
Calls: source ... write.csv -> eval.parent -> eval -> eval -> write.table
Execution halted

The write.csv command worked in the R console on my computer, so I don't know what the problem is here. Thanks in advance for your help.

Lorenzo
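One likely cause, offered as a guess rather than a confirmed diagnosis: if the R code is wrapped in a double-quoted echo string, the shell consumes the inner double quotes, so R receives file = diagonal.csv without quotes and looks for an object of that name. Keeping the R code in its own file and running it with Rscript avoids the nested quoting. A sketch of such a file follows; the name diag.R and the tempdir() output location are assumptions for illustration.

```r
# Contents of a hypothetical diag.R, run on the cluster as:
#   Rscript --vanilla diag.R
mat <- matrix(1:12, nrow = 3, ncol = 4)
diagonal <- diag(mat)

# tempdir() keeps this sketch self-contained; on the cluster a plain
# "diagonal.csv" would write into the job's working directory instead.
out_file <- file.path(tempdir(), "diagonal.csv")
write.csv(diagonal, file = out_file)

# Read the file back to confirm what was written.
print(read.csv(out_file))
```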
[R] (no subject)
Dear List,

I am trying to plot date vs. time, but am having problems getting my y-axis labels how I want them. When left on its own, R plots time at 6-hour intervals from 03:00 to 23:00. I want 6-hour intervals from 2:00 to 22:00. I realize yaxp doesn't work in plot(), so I am trying to get it to work in par(). However, now I get the ticks where I want them, but the time is output as a very big number (serial time?). I have also tried axis() using at= and also get serial time numbers. Any suggestions on how to format time on an axis?

mydat$Date <- as.POSIXct(as.character(mydat$Date), format=c("%m/%d/%Y"))
mydat$Time <- as.POSIXct(as.character(mydat$Time), format=c("%H:%M:%S"))
plot(mydat$DateTime, mydat$Time, xlab=c("Date"), ylab=c("Time"), xlim=c(min(mydat$DateTime),max(mydat$DateTime)), ylim=c(min(mydat$Time),max(mydat$Time)), yaxt="n", yaxs="i", pch=19, cex=.4, type="n")
par(yaxp=c(as.POSIXct(as.character("02:00"), format=c("%H:%M")), as.POSIXct(as.character("22:00"), format=c("%H:%M")), 5))
axis(2)

Thanks, Tim

Tim Clark
Marine Ecologist
National Park of American Samoa
[R] expression() problem !
Hello everyone ... I have a problem when I try to mix expressions using the function expression() with variables coming from my code. Has anyone faced such a problem? -- View this message in context: http://r.789695.n4.nabble.com/expression-problem-tp2990891p2990891.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] expression() problem !
On Oct 11, 2010, at 7:27 PM, michel.mas wrote:
> Hello everyone ... I have a problem when I try to mix expressions using the function expression () with variables coming from my code. Has anyone faced such a problem?

Many times:

?bquote # instead of expression

> -- View this message in context: http://r.789695.n4.nabble.com/expression-problem-tp2990891p2990891.html Sent from the R help mailing list archive at Nabble.com.

-- David Winsemius, MD West Hartford, CT
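As a brief illustration of the bquote() suggestion, here is a minimal sketch in which the variable name r2 and its value are made up for the example:

```r
# A value computed elsewhere in the code (illustrative).
r2 <- 0.87

# .() marks the parts to be replaced by their current values;
# everything else stays as unevaluated plotmath.
lab <- bquote(R^2 == .(r2))
print(lab)

# Typical use: pass the result as a title or axis label.
if (interactive()) plot(1:10, main = lab)
```

With expression(R^2 == r2) the label would show the literal name r2 instead of its value, which is the usual symptom behind this question.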
Re: [R] (no subject)
Two things I see. First is that par needs to be called _before_ the plot (although its effects will persist if you need to keep hacking away) and the second is that yaxp is expecting numeric arguments (which you are offering) but in your case these will need to be the numeric values in a Date-aligned version of those DateTimes used in your mydat arguments. POSIXct is the number of seconds since the origin. -- David. On Oct 11, 2010, at 6:56 PM, Tim Clark wrote: Dear List, I am trying to plot date vs. time, but am having problems getting my y-axis labels how I want them. When left on its own R plots time at 6 hour intervals from 03:00 to 23:00. I am wanting 6 hour intervals from 2:00 to 22:00. I realize yaxp doesn't work in plot(), so I am trying to get it to work in par(). However, now I get the ticks where I want them but the time is output as a very big number (serial time?). I have also tried axis() using at= and also get seriel time numbers. Any suggestions on how to format time on an axis? mydat$Date-as.POSIXct(as.character(mydat$Date), format=c(%m/%d/ %Y)) mydat$Time-as.POSIXct(as.character(mydat$Time), format=c(%H:%M: %S)) plot(mydat$DateTime,mydat$Time, xlab=c(Date), ylab=c(Time), xlim=c(min(mydat$DateTime),max(mydat$DateTime)), ylim=c(min(mydat$Time),max(mydat$Time)), yaxt=n, yaxs=i, pch=19, cex=.4, type=n) par(yaxp=c(as.POSIXct(as.character(02:00), format=c(%H:%M)),as.POSIXct(as.character(22:00), format=c(%H: %M)),5)) axis(2) Thanks, Tim Tim Clark Marine Ecologist National Park of American Samoa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
David Winsemius, MD West Hartford, CT
Re: [R] (no subject)
Use 'axis' with 'at=' and 'labels=' to put your own labels on the axis. I have to guess at your data since you did not provide a reproducible example:

    x <- seq(as.POSIXct('2010-10-11 00:00'), as.POSIXct('2010-10-12 00:00'), length = 20)
    plot(x, x, type = 'o', yaxt = 'n')
    axis.POSIXct(2, at = seq(as.POSIXct('2010-10-11 02:00'), as.POSIXct('2010-10-12 00:00'), by = '6 hours'))

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
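The 'at=' plus 'labels=' route mentioned above can be sketched as follows. This assumes made-up times on a single day in UTC; the names times, ticks and tick_labels are illustrative only, and format() is used to turn the tick positions into clock-time text instead of raw seconds.

```r
day   <- as.POSIXct("2010-10-11 00:00", tz = "UTC")
times <- seq(day, by = "45 min", length.out = 30)        # hypothetical observation times

# Tick positions every 6 hours starting at 02:00
ticks       <- seq(day + 2 * 3600, by = "6 hours", length.out = 4)
tick_labels <- format(ticks, "%H:%M")

pdf(NULL)                                                # null device for a non-interactive run
plot(seq_along(times), times, yaxt = "n", xlab = "Observation", ylab = "Time")
axis(2, at = as.numeric(ticks), labels = tick_labels)    # positions in seconds, labels as text
invisible(dev.off())
```

Passing the labels explicitly is what avoids the "very big number" on the axis: the positions stay numeric, while the text shown is whatever format() produced.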
Re: [R] Slow reading multiple tick data files into list of dataframes
For 100,000 rows, it took about 2 seconds to read it in on my system:

    > system.time(x <- read.table('/recv/test.txt', as.is=TRUE))
       user  system elapsed
       1.92    0.08    2.08
    > str(x)
    'data.frame': 196588 obs. of 7 variables:
     $ V1: int 1 2 3 4 1 2 3 1 2 3 ...
     $ V2: chr "bid" "ask" "ask" "bid" ...
     $ V3: chr "CON" "CON" "CON" "CON" ...
     $ V4: chr "09:30:00.722" "09:30:00.782" "09:30:00.809" "09:30:00.783" ...
     $ V5: chr "09:30:00.722" "09:30:00.810" "09:30:00.810" "09:30:00.810" ...
     $ V6: num 32.7 33.1 33.1 33.1 32.7 ...
     $ V7: int 98 300 414 200 98 300 414 98 300 414 ...
    > object.size(x)
    6291928 bytes

Given that you have about 85 files, I would guess that you would need about 800MB if all were 300K lines long. You might be getting memory fragmentation. You might try using gc() every so often in the loop.

What are you going to do with the data? Are you going to make one big file? In this case you might want a 64-bit version, since you will have a single instance of 800MB and will probably need 2-3X that much memory if copies are being made during processing. Objects might be larger in 64-bit. Maybe you need to follow Gabor's advice and read it into a database and then process it from there.

On Mon, Oct 11, 2010 at 5:48 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Mon, Oct 11, 2010 at 5:39 PM, rivercode aqua...@gmail.com wrote:

Hi, I am trying to find the best way to read 85 tick data files of format:

    > head(nbbo)
    1 bid CON 09:30:00.722 09:30:00.722 32.71  98
    2 ask CON 09:30:00.782 09:30:00.810 33.14 300
    3 ask CON 09:30:00.809 09:30:00.810 33.14 414
    4 bid CON 09:30:00.783 09:30:00.810 33.06 200

Each file has between 100,000 to 300,300 rows. Currently doing

    nbbo.list <- lapply(filePath, read.csv)

to create a list with 85 data.frame objects... but it is taking minutes to read in the data, and afterwards I get the following message on the console when taking further actions (though it does then stop):

    The R Engine is busy. Please wait, and try your command again later.

filePath in the above example is a vector of filenames:

    > head(filePath)
    [1] "C:/work/A/A_2010-10-07_nbbo.csv"
    [2] "C:/work/AAPL/AAPL_2010-10-07_nbbo.csv"
    [3] "C:/work/ADBE/ADBE_2010-10-07_nbbo.csv"
    [4] "C:/work/ADI/ADI_2010-10-07_nbbo.csv"

Is there a better/quicker or more R way of doing this?

You could try (possibly with suitable additional arguments):

    library(sqldf)
    lapply(filePath, read.csv.sql)

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
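One further option, not suggested in the thread itself, is to tell read.table the column types up front so it does not have to guess them for every file. The sketch below is hedged: it assumes whitespace-separated files with no header and the seven columns shown in head(nbbo), and it writes two tiny stand-in files to a temporary directory so it can run on its own. The names file_paths, col_classes and read_ticks are hypothetical.

```r
# Stand-in data: two small files in a temporary directory
tmp_dir <- tempfile("ticks")
dir.create(tmp_dir)
demo_lines <- c("1 bid CON 09:30:00.722 09:30:00.722 32.71 98",
                "2 ask CON 09:30:00.782 09:30:00.810 33.14 300")
file_paths <- file.path(tmp_dir, c("A_nbbo.txt", "AAPL_nbbo.txt"))
for (p in file_paths) writeLines(demo_lines, p)

# Declaring column classes skips type guessing and avoids factor conversion
col_classes <- c("integer", "character", "character", "character",
                 "character", "numeric", "integer")
read_ticks <- function(path) {
  read.table(path, colClasses = col_classes, comment.char = "")
}

nbbo_list <- lapply(file_paths, read_ticks)
names(nbbo_list) <- basename(file_paths)   # keep track of which file each frame came from
```

Whether this helps enough for 85 large files depends on the machine; the database route suggested above remains the alternative when memory is the limiting factor.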
Re: [R] function using values separated by a comma
Hi

Just used this function on my real data - several enormous files (80 rows by 200 columns...) and it worked perfectly! Thanks again for your help, saved me a lot of time!

A last quick query, I have several other similar problems to deal with in my data - do you know a useful book or online course that would be helpful for learning these sorts of data handling functions?

Thanks again!

--- On Fri, 8/10/10, Jeffrey Spies-2 [via R] ml-node+2968583-620301009-75...@n4.nabble.com wrote:

From: Jeffrey Spies-2 [via R] ml-node+2968583-620301009-75...@n4.nabble.com Subject: Re: function using values separated by a comma To: burgundy saub...@yahoo.com Date: Friday, 8 October, 2010, 16:48

Here's another method without using any external regular expression libraries:

    dat <- read.table(tc <- textConnection(
    '0,1 1,3 40,10 0,0
    20,5 4,2 10,40 10,0
    0,11 1,2 120,10 0,0'), sep="")
    mat <- apply(dat, c(1,2), function(x){
        temp <- as.numeric(unlist(strsplit(x, ',')))
        min(temp)/sum(temp)
    })

For mat[2,4], I get 0 (as did the other solutions), and you get 1, so check on that. If you want the divide-by-0 NaNs to be 0, you can check that by replacing min(temp)/sum(temp) with:

    ifelse(is.nan(val <- min(temp)/sum(temp)), 0, val)

This has an advantage over:

    mat[is.na(mat)] <- 0

in that you might have true missingness in your data and is.na won't be able to distinguish it.

Cheers, Jeff.

On Fri, Oct 8, 2010 at 1:19 AM, burgundy [hidden email] wrote:

Hello, I have a dataframe (tab separated file) which looks like the example below - two values separated by a comma, and tab separation between each of these.

         [,1]  [,2]  [,3]    [,4]
    [1,] 0,1   1,3   40,10   0,0
    [2,] 20,5  4,2   10,40   10,0
    [3,] 0,11  1,2   120,10  0,0

I would like to calculate the percentage of the smallest number separated by the comma by:

1) summing the values e.g. for [1,3] where 40,10, 40+10 = 50
2) taking the first value and dividing it by the total e.g. for [1,3], 40/50 = 0.8
3) where the value generated by 2) is > 0.5, print 1-value, otherwise, leave value e.g. for [1,3], where value is 0.8, print 1-0.8 = 0.2

plan to generate file like:

         [,1]  [,2]  [,3]  [,4]
    [1,] 1     0.25  0.2   0
    [2,] 0.2   0.33  0.2   1
    [3,] 1     0.33  0.08  0

Apologies, I know this is very complex. Any help, even just some pointers on how to write a general function where values are separated by a comma, is really very much appreciated! Thank you
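The three numbered steps in the original question can also be written out literally, which may make it easier to check individual cells against the expected table. This is only a sketch: it assumes every cell holds two non-negative numbers separated by a comma, and the names pair_fraction and cells are invented for the example. For such pairs it gives the same values as the min/sum shortcut above.

```r
pair_fraction <- function(cell) {
  parts <- as.numeric(strsplit(cell, ",", fixed = TRUE)[[1]])
  total <- sum(parts)                   # step 1: sum the two values
  if (total == 0) return(NaN)           # "0,0" has no defined fraction
  frac <- parts[1] / total              # step 2: first value over the total
  if (frac > 0.5) 1 - frac else frac    # step 3: fold values above 0.5
}

cells <- matrix(c("0,1",  "1,3", "40,10",  "0,0",
                  "20,5", "4,2", "10,40",  "10,0",
                  "0,11", "1,2", "120,10", "0,0"),
                nrow = 3, byrow = TRUE)

result <- apply(cells, c(1, 2), pair_fraction)
round(result, 2)
```

Returning NaN for "0,0" keeps those cells distinguishable from a genuine 0, which is the concern raised above about blanket replacement with is.na.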
Re: [R] importing numeric types via sqlQuery
Must be your database interface:

    require(sqldf)  # don't have MySql, but will use sqlite as example
    myData <- data.frame(cat = c('abc', 'def'), num=c(123.456, 7890.1234))
    myData
      cat      num
    1 abc  123.456
    2 def 7890.123
    sqldf('select cat, num from myData')  # now make sql request
      cat      num
    1 abc  123.456
    2 def 7890.123
    # works fine here

On Mon, Oct 11, 2010 at 4:51 PM, E C mmmraspberr...@hotmail.com wrote:

Thanks for the quick reply! Hmm, I did not know about the options default. However, after I set options, it seems like it's still not displaying correctly. I've tried an even simpler example table with only 6 digits (much fewer than 20):

    category num
    abc      123.456
    def      456.789

Then in R:

    options(digits = 20)
    data <- sqlQuery(channel, "select category, num from temp;")

But data looks like this:

    category num
    abc      123
    def      456

I suspect it's something with sqlQuery that chops off the digits and wondering if there's a way of turning it off. Thanks!

Date: Mon, 11 Oct 2010 16:28:25 -0400 Subject: Re: [R] importing numeric types via sqlQuery From: jholt...@gmail.com To: mmmraspberr...@hotmail.com CC: r-help@r-project.org

I would assume that the digits are not being chopped off. It is just that R will typically print data to 7 significant digits:

    > x <- 54469517.307692307692
    > x
    [1] 54469517
    > options(digits=20)
    > x
    [1] 54469517.3076923

Your data is there and you can set 'options' to show it if you want to. Also with floating point, you will only get about 15 digits of accuracy (see FAQ 7.31).

On Mon, Oct 11, 2010 at 4:19 PM, E C mmmraspberr...@hotmail.com wrote:

Hi everyone, I am using the sqlQuery function (in RODBC library) to import data from a database into R. My table (called temp) in the database looks like this:

    category num
    abc      54469517.307692307692
    def      36428860.230769230769

I used the following R code to pull data into R:

    data <- sqlQuery(channel, "select category, num from temp;")

However, the result is that num gets all its decimal places chopped off, so data looks like this instead in R:

    category num
    abc      54469517
    def      36428860

I've tried various alternative approaches, but none have fixed the problem. When I cast the variable to a numeric type like this (data <- sqlQuery(channel, "select category, num::numeric from temp;")), it still gave me the same result. Casting to a real type like this (data <- sqlQuery(channel, "select category, num::real from temp;")) resulted in scientific notation that also rounded the numbers. Any suggestions? Much appreciated!

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
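A quick way to tell a display issue from a real loss of digits, without touching the database at all, is to look at the stored value directly. The sketch below uses only base R; x stands in for one value of the num column, and the variable names are illustrative.

```r
x <- 54469517.307692307692                # stand-in for one imported value

shown_default <- format(x, digits = 7)    # what printing at the default 7 digits shows
shown_full    <- format(x, digits = 15)   # about as many digits as a double can carry

frac <- x - floor(x)                      # non-zero if the decimals are really stored

shown_default
shown_full
frac
```

If frac comes back as exactly 0 for values pulled in by sqlQuery, the decimals were lost before they reached R (as in the 123.456 example above), and the interface or driver settings are the place to look.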
Re: [R] Time OffSet From GMT - Losing it
That is embarrassing. Thanks for pointing out my mistake.

Chris

View this message in context: http://r.789695.n4.nabble.com/Time-OffSet-From-GMT-Losing-it-tp2968940p2990987.html
Re: [R] MATLAB vrs. R
Daniel, That's it! Thanks. Your help is very much appreciated. I'm hoping to nail down the code conversion from MATLAB to R, but it seems to be a bit more difficult than I had anticipated.

Craig

From: djnordl...@frontier.com To: djnordl...@frontier.com; r-help@r-project.org Date: Mon, 11 Oct 2010 16:12:48 -0700 Subject: Re: [R] MATLAB vrs. R

I apologize for the noise. I didn't clean up the code enough. See below.

snip

Craig, I haven't seen an answer to this yet, so let me jump in. You seem to have some stuff still left over from MATLAB. Here is some cleaned-up code that produces the result you expect. I don't think the value of dx was being correctly computed in your code. I did not change the assignment operator you used (=), but in R the preferred operator is "<-" (without the quotes).

    myquadrature <- function(f,a,b){
      npts = length(f)
      nint = npts-1
      if(npts <= 1) error('need at least two points to integrate')
      if(b <= a) error('something wrong with the interval, b should be greater than a') else dx=b/nint

The 2 'if' statements above should have been

      if(npts <= 1) stop('need at least two points to integrate')
      if(b <= a) stop('something wrong with the interval, b should be greater than a') else dx=b/nint
      sum(f[-npts]+f[-1])/2*dx
    }

    #Call my quadrature
    x = seq(0,2000,10)
    h = 10*(cos(((2*pi)/2000)*(x-mean(x)))+1)
    u = (cos(((2*pi)/2000)*(x-mean(x)))+1)
    a = x[1]
    b = x[length(x)]
    plot(x,-h)
    a = x[1]; b = x[length(x)]
    #call your quadrature function. Hint, the answer should be 3.
    f = u*h
    result = myquadrature(f,a,b)
    result

Hope this is helpful, Dan

Daniel Nordlund Bothell, WA USA
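For reference, here is a compact restatement of the same trapezoidal rule. It is a sketch, not the code from the thread: it computes the spacing as (b - a)/nint rather than b/nint. The two agree whenever a is 0, as in the example above, but only the former stays correct for an interval that does not start at zero. The name trapezoid is made up.

```r
trapezoid <- function(f, a, b) {
  npts <- length(f)
  if (npts < 2) stop("need at least two points to integrate")
  if (b <= a) stop("something wrong with the interval, b should be greater than a")
  dx <- (b - a) / (npts - 1)            # width of each of the npts - 1 intervals
  dx * sum(f[-npts] + f[-1]) / 2        # sum of the trapezoid areas, vectorised
}

# Same data as in the thread
x <- seq(0, 2000, by = 10)
h <- 10 * (cos((2 * pi / 2000) * (x - mean(x))) + 1)
u <- cos((2 * pi / 2000) * (x - mean(x))) + 1
val <- trapezoid(u * h, x[1], x[length(x)])
val
```

The expression f[-npts] + f[-1] pairs each point with its right-hand neighbour, which is what the MATLAB loop over i = 1:nint did one element at a time.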
[R] Create DataSet with MCAR type
Dear all

I want to create a dataset with MCAR (missing completely at random) values from my dataset. I have a dataset with 100 records, and I want to create from it a dataset in which 5 records are missing. How can I do it?

THX Jumlong

-- Jumlong Vongprasert Institute of Research and Development Ubon Ratchathani Rajabhat University Ubon Ratchathani THAILAND 34000
[R] Comparison of two files with multiple arguments
Hello, I have an example file which can be generated using:

    dat <- read.table(tc <- textConnection(
    'T T,G G T
    C NA G G
    A,T A A NA'), sep="")

I also have a reference file with the same number of rows, for example:

    G
    C
    A

I would like to transform the file to numerical values using the following arguments:

1) Where data points have two letters separated by a comma, e.g. T,G, replace with a 2
2) Where single letter data points match the data point in the corresponding row of the reference file, replace with a 0
3) Where single letter data points do not match the reference file, replace with a 1
4) NA is left as NA

In the example, the output file would look like:

    1 2  0 1
    0 NA 1 1
    2 0  0 NA

Any advice very much appreciated. Also, if you know of any good books or online courses that can help me to learn how to deal with these sorts of data handling queries, that is also great! Thank you

View this message in context: http://r.789695.n4.nabble.com/Comparison-of-two-files-with-multiple-arguments-tp2991043p2991043.html
Re: [R] Create DataSet with MCAR type
Hello Jumlong, I'm not sure whether by '100 records' you mean a vector of 100 values or a matrix / data.frame of 100 rows.

For a vector or matrix X you can do this:

    X[ sample( length(X), 5 ) ] <- NA

For a data.frame X you could do this:

    X[ sample( nrow(X), 5 ), sample( ncol(X), 5) ] <- NA

Hope this helps, Michael

On 12 October 2010 13:14, Jumlong Vongprasert jumlong.u...@gmail.com wrote: Dear all I want to create dataset with MCAR type from my dataset. [...]
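A self-contained sketch of the same idea on made-up data follows. Note that the data.frame line above blanks a block of 5 rows by 5 randomly chosen columns, so it needs at least 5 columns; the sketch instead shows two hedged variants, one that blanks a single variable in 5 random records and one that blanks all measured variables of those records. The data frame records and its columns are invented for illustration.

```r
set.seed(1)                                   # only so the example is repeatable
records <- data.frame(id    = 1:100,
                      value = rnorm(100),
                      group = sample(letters[1:3], 100, replace = TRUE))

miss_rows <- sample(nrow(records), 5)         # 5 records chosen completely at random

# Variant 1: one variable missing in the chosen records
mcar_cells <- records
mcar_cells$value[miss_rows] <- NA

# Variant 2: every variable except the id missing in the chosen records
mcar_rows <- records
mcar_rows[miss_rows, -1] <- NA
```

Because the rows are drawn with sample() independently of the data values, the missingness does not depend on anything observed or unobserved, which is what MCAR requires.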
Re: [R] Comparison of two files with multiple arguments
Hello, Here's one way to do it. It assumes dat has character values, not factors.

    dat2 <- matrix(0, nrow(dat), ncol(dat))
    dat2[ is.na(dat) ] <- NA
    dat2[ apply(dat, 2, function(x) grepl(",", x)) ] <- 2
    dat2[ apply(dat, 2, function(x) x != ref) ] <- 1

Michael

On 12 October 2010 13:24, burgundy saub...@yahoo.com wrote: Hello, I have an example file which can be generated using: [...]
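One thing to watch in the approach above is the order of the assignments: a two-letter cell such as "T,G" also differs from the reference letter, so assigning 1 after 2 would overwrite it. The sketch below applies the mismatch rule first and the comma rule second. It builds the example as a character matrix directly rather than through read.table, and the name out is invented.

```r
dat <- matrix(c("T",   "T,G", "G", "T",
                "C",   NA,    "G", "G",
                "A,T", "A",   "A", NA),
              nrow = 3, byrow = TRUE)
ref <- c("G", "C", "A")                 # one reference letter per row

out <- matrix(0, nrow(dat), ncol(dat))  # rule 2: start from "matches the reference"
out[!is.na(dat) & dat != ref] <- 1      # rule 3: ref recycles down each column, i.e. by row
out[grepl(",", dat)] <- 2               # rule 1: comma cells, applied last so they win
out[is.na(dat)] <- NA                   # rule 4: keep missing values missing
out
```

The comparison dat != ref works row-wise here only because R recycles ref in column-major order and ref has exactly one entry per row.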
[R] Help with function writing
Hello all

I have what seems like a simple question but have not been able to find an answer on the forum. I'm trying to define a function which involves regression models and a large number of covariates. I would like the function to accept any number of covariates and, ideally, I would like to be able to enter the covariates in a group (e.g. as a list) rather than individually. Is there any way of doing this?

Example:

    #define function involving regression model with several covariates
    custom <- function(outcome, exposure, covar1, covar2, covar3){
      model <- lm(outcome ~ exposure + covar1 + covar2 + covar3)
      expected <- predict(model)
      summary(expected)
    }
    library(MASS)
    attach(birthwt)
    custom(bwt, lwt, low, age, race)  #Works when 3 covariates are specified
    custom(bwt, lwt, low, age)        #Does not work with < or > 3 covariates
    varlist <- list(low, age, race)
    custom(bwt, lwt, varlist)         #Does not work if covariates are included as a list

Thanks very much for your help

Tim

-- Tim Elwell-Sutton University of Hong Kong
Re: [R] Running R on a server
Sachin, I apologize if I'm over-simplifying your question. I mostly run R on an Ubuntu server via a Windows laptop. I log in to the remote server via SSH (via PuTTY on Windows), and then open an interactive R session through the usual ways (typing 'R' at the Linux command line). When creating figures, I'll usually just output the figures to pdfs, via pdf(). However, if I need a more interactive experience with the figures, I'll ask PuTTY to initiate 'X11 forwarding', which, on Windows, also requires an X server, such as Xming. This causes plots to appear in new windows just like you were running R on your local machine. If you are interested in running non-interactive batch R scripts, reference the following: http://stat.ethz.ch/R-manual/R-devel/library/utils/html/BATCH.html . Another note on running R over SSH: if you lose your SSH connection, the R process will stop. I get around this by using the 'screen' command in Linux (http://en.wikipedia.org/wiki/GNU_Screen). See man screen for details. While 'screen' does many things, relevant to this thread it creates new remote terminals that persist after SSH disconnects. After SSHing into my server, I type 'screen' and [return], then 'R'. R starts up and I start the analysis. I can manually 'detach' the screen by hitting the 'control' and 'a' keys together, and then hitting the 'd' key. The R process (and any other processes started in that screen session) will continue to run. One can start many screens. Typing 'screen -ls' shows the currently running screens. If only one screen is running, typing 'screen -r' will attach that screen, and one can continue on one's analysis in R. If multiple screen sessions are open, one will need to specify the screen name after the 'screen -r' command. Sometimes after an abrupt disconnect, the screen will remain attached, even though the SSH connection is lost. To get back to the screen session, one must first 'detach' and then 're-attach' the screen by typing 'screen -dr'. 
Let me know if you have more specific questions. Cheers, Jeremy

Jeremy Hetzel Boston University

View this message in context: http://r.789695.n4.nabble.com/Running-R-on-a-server-tp2967748p2991084.html
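The pdf() route mentioned above, for sessions without X11 forwarding, might look like the following sketch. The file name and the use of the built-in cars data are assumptions made only so the example runs anywhere.

```r
out_file <- file.path(tempdir(), "example_plot.pdf")   # hypothetical output location

pdf(out_file, width = 7, height = 5)    # open a file-based graphics device
plot(cars, main = "Stopping distance vs. speed")
invisible(dev.off())                    # close the device so the file is written out

file.exists(out_file)
```

The resulting file can then be copied from the server and viewed locally, which works the same way inside a detached screen session.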
[R] Extracting data subset for plot
Dear list,

I want to make a plot based on the following information, using the command plot.

variable A for x axis: temperature (range: -20 degrees to 40 degrees)
variable B for y axis: altitude (range: 50 m to 2500 m)

For now, I want to exclude the data where the x variable is below 0 degrees. Please kindly advise on the command to extract the data ranging from 0 degrees to 40 degrees.

Thank you. Elaine
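One possible way to do this, sketched on simulated data since the original data were not posted: the data frame obs and its column names temperature and altitude are assumptions.

```r
set.seed(42)                                   # only so the example is repeatable
obs <- data.frame(temperature = runif(200, -20, 40),
                  altitude    = runif(200, 50, 2500))

# Keep only the rows with temperature between 0 and 40 degrees
warm <- subset(obs, temperature >= 0 & temperature <= 40)

pdf(NULL)                                      # null device for a non-interactive run
plot(warm$temperature, warm$altitude,
     xlab = "Temperature (degrees)", ylab = "Altitude (m)")
invisible(dev.off())
```

The original data frame is left untouched, so the excluded rows can be brought back later simply by plotting obs instead of warm.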
Re: [R] Help with function writing
Hello Tim, This function will do it where the covariates are provided as separate arguments. It would be easy to modify this to handle a list too.

    function(outcome, ...) {
      arg.names <- as.character(match.call())[-1]
      nargs <- length(arg.names)
      f <- as.formula(paste(arg.names[1], "~", paste(arg.names[2:nargs], collapse="+")))
      model <- lm(f)
      # rest of your code here
    }

Hope that helps you get started. Michael

On 12 October 2010 14:35, Tim Elwell-Sutton tesut...@hku.hk wrote: Hello all I have what seems like a simple question but have not been able to find an answer on the forum. [...]
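For the "covariates as a group" part of the question, a hedged alternative is to pass the variable names as a character vector and let reformulate() build the formula. This is a different technique from the match.call() approach above, not a rewrite of it. The function name fit_with_covars is invented, and the built-in mtcars data stands in for birthwt so that nothing beyond base R is needed.

```r
fit_with_covars <- function(outcome, exposure, covars = character(0), data) {
  # Builds outcome ~ exposure + covar1 + covar2 + ... from the names given
  f <- reformulate(c(exposure, covars), response = outcome)
  lm(f, data = data)
}

m3 <- fit_with_covars("mpg", "wt", c("hp", "qsec", "cyl"), data = mtcars)
m2 <- fit_with_covars("mpg", "wt", c("hp", "qsec"), data = mtcars)

summary(predict(m3))      # same summary of fitted values as in the original function
```

Passing the data frame explicitly also removes the need for attach(), and the covariate vector can be any length, including empty.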