Re: [R] Translating lm.object to SQL, C, etc function
On Fri, 14 Feb 2003 08:31:58 +0100, Uwe Ligges [EMAIL PROTECTED] said:
> [EMAIL PROTECTED] wrote:
> > ...
> > So my question is, how do I export an lm.object in some form that I can then apply to prediction in C, SQL, or some other language? All I'm looking for is some well-structured textual or data frame output that I can then manipulate with appropriate tools, whether it be S itself, or something like Perl.
> > ...
>
> See ?dump

Thanks for the suggestion. After my last post I tried switching from SPLUS to R and discovered the useful xlevels attribute, which, when output with expression() and combined with the coefficients attribute, gives me the information I need. dump() also provides those things, although it includes a lot of other material that is not needed to build the prediction function.

I'll start coding something using this, but it won't be ideal. The two problems are:

- The variable name and level name are still concatenated with no delimiter in the coefficient names, so ambiguous names are possible.
- It feels rather clunky to rely on these attributes when I feel I should be adding methods directly to the class somehow...

In SPLUS I came across a useful attribute 'assign', which maps term names to variables; the same attribute in R doesn't appear to provide this information. Is this available somewhere?

What approaches are others using to apply their models to data sets where S is not available? Has anyone written any converters of models to other languages? Is it possible to compile an expression or model into a DLL or COM object and access it that way? I'm aware of the SOAP interface, but that doesn't really suit our needs in this case.

TIA,
Jeremy

__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
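For readers following the thread, here is a minimal sketch of one way to pull the pieces mentioned above into a plain table. It is not code from the thread: the model on the built-in PlantGrowth data and the names `export`, `term_labels` and `levels_used` are assumptions chosen for illustration. In R, the "assign" attribute of the model matrix gives, for each column, the index of the term it belongs to (0 for the intercept), which may serve the purpose of the S-PLUS 'assign' mapping.

```r
# Illustrative model on a built-in data set (an assumption, not Jeremy's data).
fit <- lm(weight ~ group, data = PlantGrowth)

# The model matrix carries an "assign" attribute: one term index per column.
X <- model.matrix(fit)
term_labels <- c("(Intercept)", attr(terms(fit), "term.labels"))

# One row per coefficient: its name, the term it came from, and its estimate.
export <- data.frame(
  coefficient = colnames(X),
  term        = term_labels[attr(X, "assign") + 1],
  estimate    = unname(coef(fit))
)

# Factor levels seen when fitting, stored separately from the coefficient names.
levels_used <- fit$xlevels

print(export)
print(levels_used)
```

A table like `export` can be written out with write.table() and parsed elsewhere; the level names in `levels_used` are what a script would need to separate a variable name from a level name in each coefficient name.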
Re: [R] Translating lm.object to SQL, C, etc function
The issue here is that coef() tells you the coefficients in R's internal parametrization of the model, and that is of no use to you unless you have a means of creating a model matrix in C, SQL or (heaven forbid) Perl. The information needed to re-create a model matrix is stored in the lm fit, but in ways that are going to be hard to use anywhere else (since they include R functions). This is not perverse: what R does is very general, *far* more so than SPSS. Formulae in lm can include poly() and ns() terms, for example.

The only practical solution, it seems to us, is to ask R to create the model matrix for new data. Then the things you are talking about are just the colnames of that matrix, and don't need to be interpreted.

You may want to read the sources to find out how R does it: that area is one of the most complex parts of the internals, and one in which bugs continue to emerge.

On Fri, 14 Feb 2003 [EMAIL PROTECTED] wrote:

> This is my first post to this list so I suppose a quick intro is in order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, and love S already. I'm reading MASS and also John Fox's book - both have been very useful. My background in stat software was mainly SPSS (which I've never much liked - thank heavens I've found S!), and Perl is my tool of choice for general-purpose programming (I chaired the perl6-language-data working group, responsible for improving the data analysis capabilities in Perl).
>
> I have just completed my first S project, and I now have 8 lm.objects. The models are all reasonably complex with multiple numeric and factor variables and some 2-way and 3-way interactions. I now need to use these models in other environments, such as C code, SQL functions (using CASE) and in Perl - I can not work out how to do this.
>
> The difficulty I am having is that the output of coef() is not really parsable, since there is no marker in the name of a coefficient to separate out the components. For instance, in SPSS the name of a coefficient might be:
>
>   var1=[a]*var2=[b]*var3
>
> ...which is easy to write a little script to pull that apart and turn it into a line of SQL, C, or whatever. In S however the name looks like:
>
>   var1avar2bvar3
>
> ...which provides no way to pull the bits apart.

I find that impossible to understand anyway, but doubt that it corresponds to SPSS. For a variable V, label Va does not mean V=[a] except in unusual special cases.

> So my question is, how do I export an lm.object in some form that I can then apply to prediction in C, SQL, or some other language? All I'm looking for is some well-structured textual or data frame output that I can then manipulate with appropriate tools, whether it be S itself, or something like Perl.
>
> Thanks in advance for any suggestions (and apologies in advance if this is well documented somewhere!),
> Jeremy

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
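The suggestion to let R create the model matrix for new data can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the model on the built-in PlantGrowth data and the `newdata` frame are made up, and the steps simply mirror what predict() does for an lm fit.

```r
# Illustrative fit and new data (assumptions, not the poster's models).
fit <- lm(weight ~ group, data = PlantGrowth)
newdata <- data.frame(group = c("trt1", "ctrl"))

# Terms without the response, so no 'weight' column is needed in newdata.
tt <- delete.response(terms(fit))

# Rebuild the model frame and model matrix using the factor levels and
# contrasts recorded in the fit.
mf <- model.frame(tt, newdata, xlev = fit$xlevels)
X  <- model.matrix(tt, mf, contrasts.arg = fit$contrasts)

# The column names line up with the coefficient names, so prediction is
# just a matrix product.
pred <- drop(X %*% coef(fit))

print(colnames(X))
print(pred)
```

Once `X` has been produced by R, its rows could be written out and multiplied by the coefficient vector in any other environment, which avoids interpreting the coefficient names at all.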
[R] How to keep two Vectors to be
Hi all,

I am a beginner with R and would like to ask for your help. I have two data.frame objects, s40 and s100. They have the same structure; each is essentially a two-dimensional array like:

V1V2 34 6768 234 36 65 60 .

s40 and s100 have almost the same values in V1, but each lacks some V1 values that the other has. What I want to do is expand them to the same length by inserting the missing values into V1 of s40 and s100, with the corresponding value in V2 set to 0 or to the mean of V2. Is there an easy way to solve this problem?

Any help will be appreciated very much!

Jia Yiyu
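One possible approach, sketched here with made-up data since the example values in the post are not fully legible: take the union of the V1 values and place each data frame's V2 values into a zero-filled copy. The helper name `expand_to_keys` and both sample data frames are hypothetical.

```r
# Made-up sample data (assumptions for illustration only).
s40  <- data.frame(V1 = c(34, 68, 36), V2 = c(67, 234, 65))
s100 <- data.frame(V1 = c(34, 36, 60), V2 = c(70, 61, 12))

# All V1 values that occur in either data frame.
keys <- sort(union(s40$V1, s100$V1))

# Hypothetical helper: one row per key, V2 set to 'fill' where the key
# was missing from the original data frame.
expand_to_keys <- function(df, keys, fill = 0) {
  out <- data.frame(V1 = keys, V2 = fill)
  out$V2[match(df$V1, keys)] <- df$V2
  out
}

s40x  <- expand_to_keys(s40, keys)
s100x <- expand_to_keys(s100, keys)

print(s40x)
print(s100x)
```

To fill with the mean of V2 instead of 0, the call could pass `fill = mean(s40$V2)`.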
[R] factorial function
Sorry for the stupid question, but is there a factorial function in R? I tried to find it using help.search('factorial') but got nothing appropriate.

Many thanks,
-Serge
[R] off topic: sharing of software in the life sciences
This might be of interest to some people in these lists. The latest issue of Science (vol 299, 14 Febr. 2003), on p. 990, mentions a recent report from the National Academy of Sciences that deals with some guidelines for the sharing of data and research materials in the life sciences. The NAS report can be accessed from http://bob.nap.edu/books/0309088593/html/ and the most relevant pages, regarding making code available, are pp. 20-23 and p. 27. -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] factorial function
Hi,

----- Original Message -----
From: Serge Boiko [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, February 14, 2003 11:36 PM
Subject: [R] factorial function

> Sorry for the stupid question, but is there a factorial function in R? I tried to find it using help.search('factorial') but got nothing appropriate.

There isn't. But there are at least four different ways to do this -- from the S Programming Workshop (by Dr. Ross Ihaka):

    # Iteration
    fac1 <- function(n) {
      ans <- 1
      for(i in seq(n)) ans <- ans * i
      ans
    }

    # Recursion
    fac2 <- function(n)
      if (n <= 0) 1 else n * fac2(n - 1)

    # Vectorised
    fac3 <- function(n) prod(seq(n))

    # Special Mathematical Function -- Gamma
    fac4 <- function(n) gamma(n+1)

Of these Gamma is probably the most efficient. Note that none of the above does any error checking; you probably want to add some.

Cheers,
Kevin

Ko-Kang Kevin Wang
Master of Science (MSc) Student
Department of Statistics
University of Auckland
New Zealand
www.stat.auckland.ac.nz/~kwan022
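A runnable sketch of the four variants follows, with one change that is not in the post: seq_len(n) is used instead of seq(n), because seq(0) yields c(1, 0) and would make the loop and the product return 0 for n = 0. Current versions of R also ship a built-in factorial(), which is used here only as a reference for comparison.

```r
# Iteration; seq_len(0) is empty, so fac1(0) returns 1.
fac1 <- function(n) {
  ans <- 1
  for (i in seq_len(n)) ans <- ans * i
  ans
}

# Recursion, calling itself by its own name.
fac2 <- function(n) if (n <= 0) 1 else n * fac2(n - 1)

# Vectorised; the product of an empty vector is 1.
fac3 <- function(n) prod(seq_len(n))

# Gamma function: n! equals gamma(n + 1).
fac4 <- function(n) gamma(n + 1)

# Compare all four against the built-in factorial() for small n.
for (n in 0:10) {
  vals <- c(fac1(n), fac2(n), fac3(n), fac4(n))
  stopifnot(isTRUE(all.equal(vals, rep(factorial(n), 4))))
}
```

The gamma version also accepts non-integer arguments, which the other three do not handle sensibly.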
Re: [R] factorial function
Dear Serge, For factorial of x, you can use gamma(x + 1). Alternatively, you can install the gregmisc package which has a factorial function that does that (if I recall correctly). Best, On Friday 14 February 2003 11:36, Serge Boiko wrote: Sorry for the stupid question, but is there the factorial function in R? I tried to find it using help.search('factorial') but got nothing appropriate. Many thanks, -Serge __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) http://bioinfo.cnio.es/~rdiaz __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Change array size
Is the following what you want?

    x <- rnorm(800)
    xt <- x[1:2^trunc(log(length(x),base=2))]
    length(xt)
    [1] 512

HTH,
Andy

-----Original Message-----
From: Poizot Emmanuel [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 14, 2003 4:52 AM
To: [EMAIL PROTECTED]
Subject: [R] Change array size

> Hi,
> I would like to know if there is a way to change a vector of arbitrary size to make it fits the nearest upper size multiple of a power of 2.
> --
> Cordially
> Emmanuel POIZOT
> Cnam/Intechmer
> Digue de Collignon
> 50110 Tourlaville
> Tél : (33)(0)2 33 88 73 42
> Fax : (33)(0)2 33 88 73 39
Re: [R] time series missing 0 counts
Yes, there is an easy way. Create the regular time series you want by something like

    x <- ts(0, start=c(2000,52), end=c(2003,9), frequency=52)

and fill in the time points you have data for by

    xYear <- trunc(time(x)); xWeek <- cycle(x)
    attach(mydata)
    x[(xYear==Year) & (xWeek==Week)] <- Count
    detach()

Easy!

On Fri, 14 Feb 2003, Schnitzler, Johannes wrote:

> I have several large data sets with counts per week. (Maximum week per year is 52. Counts from Week 53 are added to week 52.)
>
> A data set contains for example:
>
>   Year  Week  Count
>   2000  52    2
>   2001  1     5
>   2001  2     7
>   2001  5     4
>   2001  7     2
>   ...   ...   ...
>
> Weeks with 0 counts are not listed in the data set. I want to perform time series analysis (frequency 52).
>
> Is there an easy way to expand the data set to:
>
>   Year  Week  Count
>   2000  52    2
>   2001  1     5
>   2001  2     7
>   2001  3     0
>   2001  4     0
>   2001  5     4
>   2001  6     0
>   2001  7     2
>   ...   ...   ...
>
> or is there already a function in ts, which I have not found so far, to deal with this problem?
>
> Thank you very much.
>
> Johannes Schnitzler
> Germany
> Berlin

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
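A self-contained sketch of the same idea, computing each observation's position in the regular series directly from Year and Week rather than comparing against time(x). The data frame `mydata` (built from the rows shown in the question), the shortened end point, and the variable names are assumptions for illustration.

```r
# Sample rows taken from the question; the series end is shortened to
# week 7 of 2001 for the example.
mydata <- data.frame(
  Year  = c(2000, 2001, 2001, 2001, 2001),
  Week  = c(52, 1, 2, 5, 7),
  Count = c(2, 5, 7, 4, 2)
)

start_year <- 2000
start_week <- 52

# Regular weekly series filled with zeros.
x <- ts(0, start = c(start_year, start_week), end = c(2001, 7), frequency = 52)

# Position of each observed (Year, Week) pair within the regular series.
pos <- (mydata$Year - start_year) * 52 + (mydata$Week - start_week) + 1

# Overwrite the zeros where counts were observed.
x[pos] <- mydata$Count

print(x)
```

Working with integer positions avoids relying on exact floating-point values of the time index; weeks that never appear in `mydata` simply keep their initial 0.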
Re: [R] Translating lm.object to SQL, C, etc function
On Fri, 14 Feb 2003 16:37:42 +1100 [EMAIL PROTECTED] wrote: This is my first post to this list so I suppose a quick intro is in order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, and love S already. I'm reading MASS and also John Fox's book - both have been very useful. My background in stat software was mainly SPSS (which I've never much liked - thanks heavens I've found S!), and Perl is my tool of choice for general-purpose programming (I chaired the perl6-language-data working group, responsible for improving the data analysis capabilities in Perl). I have just completed my first S project, and I now have 8 lm.objects. The models are all reasonably complex with multiple numeric and factor variables and some 2-way and 3-way interactions. I now need to use these models in other environments, such as C code, SQL functions (using CASE) and in Perl - I can not work out how to do this. The difficulty I am having is that the output of coef() is not really parsable, since there is no marker in the name of an coefficient of separate out the components. For instance, in SPSS the name of a coefficient might be: var1=[a]*var2=[b]*var3 ...which is easy to write a little script to pull that apart and turn it into a line of SQL, C, or whatever. In S however the name looks like: var1avar2bvar3 ...which provides no way to pull the bits apart. So my question is, how do I export an lm.object in some form that I can then apply to prediction in C, SQL, or some other language? All I'm looking for is some well-structured textual or data frame output that I can then manipulate with appropriate tools, whether it be S itself, or something like Perl. 
Thanks in advance for any suggestions (and apologies in advance if this is well documented somewhere!), Jeremy Some functions that may give you some ideas, from the Design library (http://hesweb1.med.virginia.edu/biostat/s/Design.html).: Function(fit): generate S function to obtain predicted values from a regression fit that was done with Design in effect (i.e., fit with ols, cph, lrm, psm, glmD) latex(fit): generate LaTeX code for typesetting the model fit sascode(Function(fit)): translate formula to SAS notation What I think would be very useful would be a function like Function that instead symbolically creates the design matrix, and translating that function to SQL etc. This would allow computation of confidence limits. -- Frank E Harrell Jr Prof. of Biostatistics Statistics Div. of Biostatistics Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] FW: [Fwd: Re: [S] Exact p-values]
Dear Spencer Thank you for this extensive explanation of the problem. I was just curious. Best regards Christian == Christian Stratowa, PhD Boehringer Ingelheim Austria Dept NCE Lead Discovery - Bioinformatics Dr. Boehringergasse 5-11 A-1121 Vienna, Austria Tel.: ++43-1-80105-2470 Fax: ++43-1-80105-2683 email: [EMAIL PROTECTED] -Original Message- From: Spencer Graves [SMTP:[EMAIL PROTECTED]] Sent: Friday, February 14, 2003 1:29 PM To: Stratowa,Dr,Christian FEX BIG-AT-V Cc: [EMAIL PROTECTED]; David Smith Subject: Re: [R] FW: [Fwd: Re: [S] Exact p-values] To understand the correct answer, you need to understand the following: pbinom(1, 2, .5) [1] 0.75 This is the binomial cumulative distribution function. *** pbinom(0, 2, .5) = 0.25 *** pbinom(1, 2, .5) = 0.75 = 0.25 + 0.5 *** pbinom(2, 2, .5) = 1 However, pbinom(1e15, 2e15, .5) is a computational challenge. Standard numerical algorithms often fail in situations like this. The code should test for such cases and use more numerically stable approximations in place of the exact algorithms. The standard deviation for a binomial is sqrt(p*(1-p)/n) = 0.5/sqrt(2e15), which is roughly 1e-8 in your case. I get the following from both S-Plus and R: pbinom(1e5+c(-1, 0, 1), 2e5, .5) [1] 0.4991079 0.5008921 0.5026762 For the problem you cite, the correct answer should be 0.5 to about 8 significant digits. Instead, I get 1 from R (as you did) and the following from S-Plus: pbinom(1e15,2e15,0.5) [1] 0.7411209 Both give wrong answers without warning, though in this case, S-Plus is closer. Answer the question? Spencer Graves # [EMAIL PROTECTED] wrote: Dear all Just for fun, I have just downloaded the paper mentioned below and checked it with R-1.6.1. Everything is ok with exception of Table 2b, where I get always 1 instead of 0.5: pbinom(1e15,2e15,0.5) [1] 1 Which value should be correct? Best regards Christian Stratowa == Christian Stratowa, PhD Boehringer Ingelheim Austria Dept NCE Lead Discovery - Bioinformatics Dr. 
Boehringergasse 5-11 A-1121 Vienna, Austria Tel.: ++43-1-80105-2470 Fax: ++43-1-80105-2683 email: [EMAIL PROTECTED] Original Message Subject: Re: [S] Exact p-values Date: Thu, 13 Feb 2003 18:31:38 +0100 From: Rau, Roland [EMAIL PROTECTED] To: 'Spencer Graves' [EMAIL PROTECTED], Jose María Fedriani Laffitte [EMAIL PROTECTED] CC: [EMAIL PROTECTED] Dear all, in relation to your question, the following working paper of Leo Knuesel, University of Munich, might be of interest: On the Accuracy of Statistical Distributions in S-Plus for Windows (1999) You can download the paper from (pdf-Format, 45k): http://www.stat.uni-muenchen.de/~knuesel/elv/accuracy.html Best, Roland -Original Message- From:Spencer Graves [SMTP:[EMAIL PROTECTED]] Sent:Thursday, February 13, 2003 6:12 PM To: Jose María Fedriani Laffitte Cc: [EMAIL PROTECTED] Subject: Re: [S] Exact p-values Try ( 1-pchisq(29.8, df=1)): With S-Plus 6.1, I got 4.78992e-008. By the way, the distribtion functions in R have more arguments. For example, pchisq(29.8, df=1, lower.tail=F) produces the same answer, and pchisq(29.8, df=1, lower.tail=F, log=T) produces its natural logarithm. Also, pchisq, dchisq, qchisq, and rchisq in R all have an ncp noncentrality parameter argument; only pchisq has such in S-Plus 6.1. Similarly, none of the Student's t functions in S-Plus have a non-centralitity parameter; in R, pt has an argument ncp, and from this one can easily program ncp for dt, qt and rt. Also, the distribution functions in the current release of S-Plus are known to have problems. For example, pt(-1, Inf) = 0.5 in S-Plus 6.1, but 0.159 in R; clearly, S-Plus gives a wrong answer without warning. Best Wishes, Spencer Graves Jose María Fedriani Laffitte wrote: Dear all, I want to get the exact p-values, on 1 degree of freedom, for an array of chi-square values. When my chi-square values are equal or lower than 29.7, I get the exact associated p-values. 
Thus, for instance:

    pchisq(29.7, df=1)
    [1] 0.999

However, when my chi-square values are greater than or equal to 29.8, what I get is:

    pchisq(29.8, df=1)
    [1] 1

Could anyone tell me how to fix this trivial issue?

Very grateful,
Jose M. Fedriani

Jose Mª Fedriani Laffitte
Estacion Biologica de Donana (CSIC)
Avda. Mª Luisa s/n
41013-Sevilla
Spain
Tel. +34-954232340
Fax +34-954621125
http://ebd.csic.es
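The advice given in this thread (use the upper tail directly rather than subtracting from 1) can be illustrated with a short sketch in R. The chi-square values 29.8 and 100 are chosen for illustration; the variable names are assumptions.

```r
# Upper-tail p-value computed two ways for the value from the question.
p_naive <- 1 - pchisq(29.8, df = 1)
p_upper <- pchisq(29.8, df = 1, lower.tail = FALSE)

# Natural logarithm of the same upper-tail probability.
log_p <- pchisq(29.8, df = 1, lower.tail = FALSE, log.p = TRUE)

# For a much larger statistic the subtraction loses everything, because
# the lower-tail probability rounds to exactly 1 in double precision.
p_naive_big <- 1 - pchisq(100, df = 1)
p_upper_big <- pchisq(100, df = 1, lower.tail = FALSE)

print(c(p_naive = p_naive, p_upper = p_upper))
print(c(p_naive_big = p_naive_big, p_upper_big = p_upper_big))
```

The printed value of pchisq(29.8, df=1) shows as 1 only because of the default number of printed digits; the tail probability itself is still available through `lower.tail = FALSE`, and `log.p = TRUE` helps when even that underflows.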
Re: [R] Translating lm.object to SQL, C, etc function
Dear Jeremy,

I've written replacements for the standard R contrast functions that produce the kind of more easily parsed (and more readable) contrast names that I think you have in mind. I intend to include these in the next release of the car package for R but haven't done so yet. Since the code isn't very long, I've appended it (and the .Rd documentation file) to this note.

Note that R does separate terms in an interaction with a colon.

I hope that this does what you need.

John

--- Contrasts.R ---

# last modified 2 Dec 2002 by J. Fox
# all of these functions are adapted from functions in the R base package

contr.Treatment <- function (n, base = 1, contrasts = TRUE) {
    if (is.numeric(n) && length(n) == 1)
        levs <- 1:n
    else {
        levs <- n
        n <- length(n)
    }
    lev.opt <- getOption("decorate.contrasts")
    pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
    suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
    dec <- getOption("decorate.contr.Treatment")
    dec <- if (!contrasts) ""
           else if (is.null(dec)) "T."
           else dec
    contr.names <- paste(pre, dec, levs, suf, sep="")
    contr <- array(0, c(n, n), list(levs, contr.names))
    diag(contr) <- 1
    if (contrasts) {
        if (n < 2)
            stop(paste("Contrasts not defined for", n - 1, "degrees of freedom"))
        if (base < 1 | base > n)
            stop("Baseline group number out of range")
        contr <- contr[, -base, drop = FALSE]
    }
    contr
}

contr.Sum <- function (n, contrasts = TRUE) {
    if (length(n) <= 1) {
        if (is.numeric(n) && length(n) == 1 && n > 1)
            levels <- 1:n
        else stop("Not enough degrees of freedom to define contrasts")
    }
    else levels <- n
    lenglev <- length(levels)
    lev.opt <- getOption("decorate.contrasts")
    pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
    suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
    dec <- getOption("decorate.contr.Sum")
    dec <- if (!contrasts) ""
           else if (is.null(dec)) "S."
           else dec
    show.lev <- getOption("contr.Sum.show.levels")
    contr.names <- if ((is.null(show.lev)) || show.lev)
        paste(pre, dec, levels, suf, sep="")
    if (contrasts) {
        cont <- array(0, c(lenglev, lenglev - 1),
                      list(levels, contr.names[-lenglev]))
        cont[col(cont) == row(cont)] <- 1
        cont[lenglev, ] <- -1
    }
    else {
        cont <- array(0, c(lenglev, lenglev), list(levels, contr.names))
        cont[col(cont) == row(cont)] <- 1
    }
    cont
}

contr.Helmert <- function (n, contrasts = TRUE) {
    if (length(n) <= 1) {
        if (is.numeric(n) && length(n) == 1 && n > 1)
            levels <- 1:n
        else stop("contrasts are not defined for 0 degrees of freedom")
    }
    else levels <- n
    lenglev <- length(levels)
    lev.opt <- getOption("decorate.contrasts")
    pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
    suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
    dec <- getOption("decorate.contr.Helmert")
    dec <- if (!contrasts) ""
           else if (is.null(dec)) "H."
           else dec
    nms <- if (contrasts) 1:lenglev else levels
    contr.names <- paste(pre, dec, nms, suf, sep="")
    if (contrasts) {
        cont <- array(-1, c(lenglev, lenglev - 1),
                      list(levels, contr.names[-lenglev]))
        cont[col(cont) <= row(cont) - 2] <- 0
        cont[col(cont) == row(cont) - 1] <- 1:(lenglev - 1)
    }
    else {
        cont <- array(0, c(lenglev, lenglev), list(levels, contr.names))
        cont[col(cont) == row(cont)] <- 1
    }
    cont
}

--- Contrasts.Rd ---

\name{Contrasts}
\alias{Contrasts}
\alias{contr.Treatment}
\alias{contr.Sum}
\alias{contr.Helmert}
\title{Functions to Construct Contrasts}
\description{
  These are substitutes for similarly named functions in the base package (note the uppercase letter starting the second word in each function name). The only difference is that the contrast functions from the car package produce easier-to-read names for the contrasts when they are used in statistical models. The functions and this documentation are adapted from the base package.
}
\usage{
contr.Treatment(n, base = 1, contrasts = TRUE)
contr.Sum(n, contrasts = TRUE)
contr.Helmert(n, contrasts = TRUE)
}
\arguments{
  \item{n}{a vector of levels for a factor, or the number of levels.}
  \item{base}{an integer specifying which level is considered the baseline level. Ignored if \code{contrasts} is \code{FALSE}.}
  \item{contrasts}{a logical indicating whether contrasts should be computed.}
}
\details{
  These functions are used for creating contrast matrices for use in fitting analysis of variance and regression models. The columns of the resulting matrices contain contrasts which can be used for coding a factor with \code{n} levels. The returned value contains the computed contrasts. If the argument \code{contrasts} is \code{FALSE}
Re: [R] Change array size
Liaw, Andy [EMAIL PROTECTED] writes:

> Is the following what you want?
>
>     x <- rnorm(800)
>     xt <- x[1:2^trunc(log(length(x),base=2))]
>     length(xt)
>     [1] 512

I don't think so (notice "upper"). More likely

    x <- rnorm(800)
    l <- length(x)
    xt <- c(x,numeric(2^ceiling(log(l,base=2))-l))
    length(xt) # 1024

but "fits" might also imply interpolation?

> > Hi,
> > I would like to know if there is a way to change a vector of arbitrary size to make it fits the nearest upper size multiple of a power of 2.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                    FAX: (+45) 35327907
Re: [R] data manipulation function descriptions
On Fri, 14 Feb 2003 [EMAIL PROTECTED] wrote:

> On Thu, 13 Feb 2003, kjetil brinchmann halvorsen wrote:
> > On 13 Feb 2003 at 17:09, Jason Bond wrote:
> > > case switch
> >
> > [R-core : switch should be better announced. It is for instance not mentioned in An introduction to R]
>
> Well, that is an *introduction*, not a programmer's guide. You will find switch() is rarely used in R: it is a bit peculiar in its semantics, and something definitely not to be considered introductory.
>
> On the original question, I think it would be a mistake to translate what you know. R is a vector language, not a pairlist language, and I see quite a bit of evidence of convoluted solutions in its internals dating from when R was the second. Chapter 2 of Venables & Ripley (2002) (as in the R FAQ) is devoted to using S/R for data manipulation.

As someone reasonably familiar with both languages I have to disagree with several points here.

First and foremost, despite differences in surface syntax, as languages xlispstat and R are much more alike than they are different. xlispstat is much closer to R than S-plus because both xlispstat and R use lexical scope, a feature of R that is still not used as much as it could be. The main language differences are the limited form of lazy evaluation used in R, which you can usually ignore, and the fact that R does not provide mutable data structures, which is also rarely an issue. There are other differences, but these are the main ones that affect coding practices I think.

The basic xlispstat data handling functions mentioned in the original post are quite similar to corresponding basic functions in R. This is not by accident: the choice of functions included in xlispstat was heavily influenced by what was then called the New S language. As a result, if you want to create an R version of an xlispstat function you can often do far worse than start with a fairly direct transliteration.
In my view at least, good coding practices in xlispstat are good coding practices for any high level mostly functional language and carry over quite well to R.

I am sorry if the following seems a bit harsh, but I, and many others who have worked with lisp, find it extremely frustrating to read statements about lisp like the one above that suggest that lisp is a pairlist language only, especially when these statements come from people I thought knew better.

Lisp dates back to the 1950's. The only other language of any consequence still in use from that era is FORTRAN. No one would now claim that a major flaw in FORTRAN is the lack of an if-then-else construct. That was true in the early days but has not been for several decades. But for some reason many people seem very happy to very authoritatively make statements about lisp that, if they were ever true at all, have not been so for a very long time indeed.

Pairlists are a very useful data structure for expressing many algorithms in a functional style. That is why they were one of the first data structures in Lisp, and that is why they are available in virtually all other high level functional languages (ML, Haskell, Miranda, Clean, ...).

Pairlists are NOT the only data structure in Lisp. For many years Lisp has also supported vectors and arrays, both generic and typed (and other data structures). Vectors and pairlists are collectively referred to as sequences, and, if I remember correctly, all the functions listed in the original post except mapcar are designed to work on all kinds of sequences (the sequence version of mapcar is map). Code written in xlispstat in terms of sequence functions can often be translated quite easily to R, and the resulting code will be quite consistent with good R coding practices.

R does not provide a pairlist data structure.
This creates a dilemma when translating some list-based xlispstat code, or, more importantly, when implementing an algorithm for which pairlists are the natural data structure to use. There are two choices: use a vector based algorithm that may be a bit less natural but fits better with the basic R data structures, or build your own pairlist abstraction for this particular problem and write the algorithm the more natural way. I have used both approaches on different occasions.

I usually prefer to write an algorithm in the most natural way for the algorithm, since that usually maximizes the probability that my code is actually correct. If this approach requires some additional abstract data types, be they pairlists or anything else, then I develop and test them separately and write the main code in terms of these abstractions. Occasionally, but not all that often, this results in code that is slower than I like; then I
[R] Numeric Coerceing
Does anyone know how to coerce a numeric to a string?

Thanks,
Wayne

KSS Ltd
A division of Knowledge Support Systems Group plc
Seventh Floor St James's Buildings 79 Oxford Street Manchester M1 6SS England
Company Registration Number 2800886 (Limited) 3449594 (plc)
Tel: +44 (0) 161 228 0040 Fax: +44 (0) 161 236 6305
mailto:[EMAIL PROTECTED] http://www.kssg.com
RE: [R] Numeric Coerceing
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Wayne Jones
Sent: Friday, February 14, 2003 8:20 AM
To: [EMAIL PROTECTED]
Subject: [R] Numeric Coerceing

> Does anyone know how to coerce a numeric to a string?
> Thanks
> Wayne

See ?as.character

For example:

    y <- 123
    y
    [1] 123
    as.character(y)
    [1] "123"

Regards,
Marc Schwartz
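A short sketch of a few related conversions, for cases where the default as.character() output is not the format wanted. The values and variable names are made up for illustration; sprintf() and paste() are base R functions.

```r
y <- 123

# Plain coercion, as suggested above.
s_plain <- as.character(y)

# Fixed number of decimals via a C-style format string.
s_fixed <- sprintf("%.2f", 3.14159)

# Zero-padded integer; the value must be an integer for the %d format.
s_padded <- sprintf("%05d", 42L)

# paste() coerces numbers to strings while joining them with text.
s_label <- paste("value:", y)

print(c(s_plain, s_fixed, s_padded, s_label))
```

as.numeric() goes the other way, turning a string such as "123" back into a number.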
Re: [R] Change array size
then change trunc to ceiling???

Peter Dalgaard BSA wrote:

> Liaw, Andy [EMAIL PROTECTED] writes:
>
> > Is the following what you want?
> >
> >     x <- rnorm(800)
> >     xt <- x[1:2^trunc(log(length(x),base=2))]
> >     length(xt)
> >     [1] 512
>
> I don't think so (notice "upper"). More likely
>
>     x <- rnorm(800)
>     l <- length(x)
>     xt <- c(x,numeric(2^ceiling(log(l,base=2))-l))
>     length(xt) # 1024
>
> but "fits" might also imply interpolation?
>
> > > Hi,
> > > I would like to know if there is a way to change a vector of arbitrary size to make it fits the nearest upper size multiple of a power of 2.
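The two variants discussed in this thread differ in what fills the extra positions: indexing a vector beyond its length pads with NA, while concatenating numeric(k) pads with zeros. A small sketch, assuming zero padding is what is wanted; the helper name `pad_to_pow2` is hypothetical, and nextn() from the stats package is used to find the next power of 2.

```r
# Hypothetical helper: zero-pad a numeric vector up to the next power of 2.
pad_to_pow2 <- function(x) {
  n <- length(x)
  target <- nextn(n, factors = 2)  # smallest power of 2 that is >= n
  c(x, numeric(target - n))
}

x <- rnorm(800)
xt <- pad_to_pow2(x)
length(xt)

# For comparison, indexing past the end fills with NA rather than 0.
x_na <- x[1:1024]
sum(is.na(x_na))
```

A vector whose length is already a power of 2 is returned unchanged by this helper.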
[R] pairlists (was: data manipulation function descriptions)
-----Original Message----- From: Luke Tierney [mailto:[EMAIL PROTECTED]]
R does not provide a pairlist data structure. This creates a dilemma when translating some list-based xlispstat code, or, more importantly, when implementing an algorithm for which pairlists are the natural data structure to use. ...
Pairlists were and still are used internally for many things. ...
Wouldn't it, therefore, make sense to provide a 'pairlist' package which exposes the internal pairlist structure and provides appropriate functions (car, cdr, ...), instead of expecting people to keep re-implementing these features?
-Greg
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] programs for genetics - haplo.score for R
It would appear that Gregory Warnes has ported it to R and the package `haplo.score' can be downloaded from CRAN (http://cran.r-project.org). -roger ___ UCLA Department of Statistics [EMAIL PROTECTED] http://www.stat.ucla.edu/~rpeng
On Fri, 14 Feb 2003, Shona Livingstone wrote:
Dear All, I wish to use a suite of programs called haplo.score first written in S plus by Rowland et al of the Mayo clinic (details given below). Unfortunately, I do not have S plus available to me at the moment. Has anyone written the equivalent for R? Any pointers will be appreciated, bearing in mind that I am new to R. Thank you for your help. Shona Livingstone, Epidemiology and Public Health, UCL
haplo.score: Score Tests for Association of Traits with Haplotypes when Linkage Phase is Ambiguous
Charles M. Rowland, David E. Tines, and Daniel J. Schaid, Mayo Clinic, Rochester, MN. E-mail contact: [EMAIL PROTECTED]
A suite of S-PLUS routines, referred to as haplo.score, can be used to compute score statistics to test associations between haplotypes and a wide variety of traits, including binary, ordinal, quantitative, and Poisson. These methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers). The methods provide several different global and haplotype-specific tests for association, as well as provide adjustment for non-genetic covariates and computation of simulation p-values (which may be needed for sparse data). Details on the background and theory of the score statistics can be found in the following reference: Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests for association of traits with haplotypes when linkage phase is ambiguous. American J Human Genetics, February, 2002.
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] How to solve A'A=S for A
It is not clear to me that one can. If the singular value decomposition of A is the triple product P d Q', then the singular value decomposition of A'A=S is Q d^2 Q'. The information about the orthonormal matrix P is lost, is it not? ** Cliff Lunneborg, Professor Emeritus, Statistics Psychology, University of Washington, Seattle Visiting: Melbourne, Feb-May 1999, Brisbane, Jun-Aug 1999, Sydney, Sep-Nov 1999, Perth, Dec 1999-Feb 2000 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
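The non-uniqueness described above can be checked numerically. The sketch below assumes S is positive definite, so that chol() applies; it builds one solution and then a second one by multiplying with an arbitrary orthogonal matrix. The matrices are random illustrations, not data from the thread.

```r
set.seed(1)
A0 <- matrix(rnorm(12), nrow = 4)   # some 4 x 3 matrix
S  <- crossprod(A0)                 # S = A0'A0, symmetric positive definite here

A1 <- chol(S)                       # upper-triangular factor with A1'A1 = S

# Multiplying by any orthogonal P gives another solution,
# because (P A1)'(P A1) = A1' P'P A1 = A1'A1.
P  <- qr.Q(qr(matrix(rnorm(9), nrow = 3)))
A2 <- P %*% A1

all.equal(crossprod(A1), S)
all.equal(crossprod(A2), S)
```

So a solution can be computed, but only up to an orthogonal factor, which is the information about P that is lost.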
Re: [R] pairlists (was: data manipulation function descriptions)
Warnes, Gregory R [EMAIL PROTECTED] writes:
-----Original Message----- From: Luke Tierney [mailto:[EMAIL PROTECTED]]
R does not provide a pairlist data structure. This creates a dilemma when translating some list-based xlispstat code, or, more importantly, when implementing an algorithm for which pairlists are the natural data structure to use. ...
Pairlists were and still are used internally for many things. ...
Wouldn't it, therefore, make sense to provide a 'pairlist' package which exposes the internal pairlist structure and provides appropriate functions (car, cdr, ...), instead of expecting people to keep re-implementing these features?
Some ancient consideration pops up here. We do actually expose pairlists in a few places (try mode(.Options)). Some people consider that this is a remnant and should be stamped out, but we might also consider doing what you suggest.
The big problem with old R was not so much the pairlists but that they were used for representing objects of mode "list", so to get to X[[n]] you had to count through the list from the beginning, which killed performance in some important cases. Then again, adding elements to a generic vector requires copying the whole thing. Of course all the legacy S code tended to do the former and not the latter, so generic vectors ended up winning.
One or two reservations: With full lisp style access, could we end up with (circular) data structures that confuse the garbage collector? And might we -- supposing we allowed destructive list modifications -- end up with strange semantics a la the .Alias mess we had for a while? Of course Luke would be the first to know about this.
Then of course there is the question of reverse compatibility. I don't consider it much of a loss if R code doesn't run in Splus, but others might.
-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Cph. N, Denmark ([EMAIL PROTECTED]) Ph: (+45) 35327918 FAX: (+45) 35327907
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
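For illustration, pairlists can already be created and inspected from R code, and Lisp-style access can be imitated with ordinary lists. The cons, car and cdr helpers below are hypothetical names defined only for this sketch; they are not part of base R or of any proposed package.

```r
# Pairlists are visible from R code in a few places.
pl <- pairlist(a = 1, b = "two")
typeof(pl)
is.pairlist(pl)

# Hypothetical Lisp-style helpers built on ordinary (generic vector) lists.
cons <- function(first, rest = NULL) list(car = first, cdr = rest)
car  <- function(cell) cell$car
cdr  <- function(cell) cell$cdr

lst <- cons(1, cons(2, cons(3)))
car(cdr(lst))         # second element of the chain
```

Because R lists are copied on modification, cells built this way cannot be altered destructively, so this emulation does not run into the circular-structure concern raised above.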
Re: [R] Function update problem
Dear Professor Ripley,
Thank you very much for your help. I tested the idea with form <- substitute(.~.+x[,i], list(i=i)), however the problem is still there. In order to automate model selection, I prefer to use update. My regression problem is to test significant predictors from several hundred candidates using forward regression by BIC criteria. For each loop, first to find one with maximum BIC and then to update lm to get a new model. So, it is convenient to use the update function. I would be grateful if you could still help.
In addition, I tried the way update+add1 to my question. But the new problem is that update.default(model, . ~ . + x) reports "need an object with call component". Help? The code is:
model <- add1(model, .~.+form)
model.new <- update(model, .~.+x)
With best regards, Lun
You can use substitute: something like (untested)
for(i in 1:100){
  form <- substitute(.~.+x[,i], list(i=i))
  model <- update(model, form)
  ## do something useful in here
}
and you do not need to update unchanged arguments! However, why are you rewriting add1.default, when there is add1.lm?
On Thu, 13 Feb 2003, lun li wrote:
Dear all, I am trying an automatic model selection for a multiple linear regression using the functions lm and update. But I meet a problem when using update: the function update cannot update when the variables are given as a vector (for example, x is a matrix with 100 regression variables). The code is as below:
model <- lm(y~x1, singular.ok=T, na.action=na.omit)
for(i in 1:100){
  model <- update(model, .~.+x[,i], singular.ok=T, na.action=na.omit)}
If the above code is represented as below, I can get the correct result. However, I must use the loops.
model <- lm(y~x1, singular.ok=T, na.action=na.omit)
model <- update(model, .~.+x[,1], singular.ok=T, na.action=na.omit)
model <- update(model, .~.+x[,2], singular.ok=T, na.action=na.omit)
...
model <- update(model, .~.+x[,100], singular.ok=T, na.action=na.omit)
-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
----- End forwarded message -----
Lun Li, Department of Geology and Geophysics, The University of Edinburgh, Grant Institute, King's Buildings, West Mains Road, Edinburgh EH9 3JW, UK. Tel. +44(0)131 650 7339 Fax. +44(0)131 668 3184 E-mail: [EMAIL PROTECTED]
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
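One way to run such a loop with update() is to keep the candidates as named columns of a data frame and build each formula from the column name. The sketch below is not the method from the thread: the function forward_bic is a made-up name, it starts from an intercept-only model, and it assumes the usual convention that a lower BIC is better.

```r
# Forward selection by BIC using update() (hypothetical helper).
# y: response vector; x: matrix of candidate predictors.
forward_bic <- function(y, x) {
  dat <- data.frame(y = y, as.data.frame(x))   # columns V1, V2, ... if x is unnamed
  candidates <- setdiff(names(dat), "y")
  model <- lm(y ~ 1, data = dat)
  selected <- character(0)
  while (length(candidates) > 0) {
    # BIC of the current model extended by each remaining candidate
    bics <- sapply(candidates, function(v) {
      trial <- update(model, as.formula(paste(". ~ . +", v)))
      BIC(trial)
    })
    best <- names(bics)[which.min(bics)]
    if (bics[[best]] >= BIC(model)) break      # no candidate improves the BIC
    model <- update(model, as.formula(paste(". ~ . +", best)))
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  list(model = model, selected = selected)
}

# Simulated example: only columns 3 and 7 carry signal.
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 10), n, 10)
y <- 2 * x[, 3] - 3 * x[, 7] + rnorm(n)
res <- forward_bic(y, x)
res$selected
```

Building the formula from a string avoids the problem that the loop index i inside x[,i] is stored unevaluated in the formula. For lm fits, add1() or step() with k = log(n) may be a more direct route to the same kind of search.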
[R] error in unique(pkgs) (fwd)
I read your reply in the R mail archive regarding the installation of contributed R packages on Mac OS X. I have the same problem: I use OroborOSX and emacs-ESS, and when I tried to install the xtable and ape packages with the install.packages() command I got the same error message (naturally under sudo emacs, which avoids the "argument lib missing" error message):
Error in unique(pkgs) : Object "ape" not found
Ape is not a base package so it should be available separately. Any idea? Thanks for your help in advance!
You really need to use the r-help list for this, I'm not a Mac expert. (I think you might need to fetch the Stuffit archive from CRAN and unstuff it manually.) -p
-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Cph. N, Denmark ([EMAIL PROTECTED]) Ph: (+45) 35327918 FAX: (+45) 35327907
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
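An error of the form "Error in unique(pkgs) : Object ... not found" is what R reports when a package name is passed as a bare symbol instead of a character string. Whether that is what happened here is an assumption, since the actual call is not shown. The sketch below uses a stand-in function (fake_install, a made-up name) so that nothing is installed:

```r
# Hypothetical stand-in that evaluates its argument the way the error message suggests.
fake_install <- function(pkgs) unique(pkgs)

# Bare name: R looks for a variable called `ape` and fails.
res_bare <- tryCatch(fake_install(ape), error = function(e) conditionMessage(e))

# Quoted names: a character vector, which is what install.packages() expects.
res_quoted <- fake_install(c("xtable", "ape"))

# The real call would then be (not run here):
# install.packages(c("xtable", "ape"))
```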
Re: [R] matrix from sequences
Hi, Miha:
1. How do I get the GRASS library? library(GRASS) produced "Error in library(GRASS) : There is no package called `GRASS'" for me from R 1.6.2 for Windows.
I do not know whether it exists for Windows or not, but look under: http://cran.r-project.org/src/contrib/Devel/ Or visit: http://grass.itc.it/index.html
2. I assume there is a typographical error in the last line of your email: If G$xseq and G$yseq are coordinates of points, then length(G$xseq) == length(G$yseq)??? In that case, 'as.matrix(G[,c("xseq", "yseq")])' should give you what you want. Or am I missing something?
OK I really am lousy at explaining things. The length(G$xseq) * length(G$yseq) stands because you have to get all the combinations of the elements of those two sequences. Get it? If you have:
xseq <- 1:10
yseq <- 1:10
I would like to get:
x y
[,1] [,2] [,3] [,4] [,5] ...
[1,]
[2,]
[3,]
[4,]
...
or str(xy)
$x 1,1,1,1,1,1,1,1,1,1,2,2,2,...
$y 1,2,3,4,5,6,7,8,9,10,1,2,3,...
Spencer Graves
Miha STAUT wrote: Miha STAUT wrote:
Hi all, I have a data frame with sequences of x and y from a map. I would like to know it both ways: 1. How to make a matrix from that; 2. how to make a data frame of all points in a map. Probably it is a silly question, but please tell me where to read about it or tell me how to do it. Miha Staut
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Hi Miha,
1) What is the structure of your data.frame? Assuming all co-ordinates are in the same column (one column for x and one column for y), the simplest way to extract them and turn them into a matrix would be: as.matrix(mydata[ , c("x", "y")]) e.g.:
R> mydata <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
R> mydata
             x           y          z
1  -0.73735224 -0.51218243 -0.9602624
2  -1.46079091 -0.63634091  1.4967066
3  -0.28574919 -1.30719383 -0.2887403
4   0.04137159  0.61711350 -0.7057102
5   0.03179303  0.05734869 -0.4637660
6  -0.06638058 -0.74565157  0.9239402
7  -0.67611541 -1.01760810 -0.2854017
8   0.34215052  0.30564550  0.6931193
9   0.83597837  0.75443762 -2.3394679
10 -0.14967073 -0.02027512 -0.1143414
R> as.matrix(mydata[ , c("x", "y")])
             x           y
1  -0.73735224 -0.51218243
2  -1.46079091 -0.63634091
3  -0.28574919 -1.30719383
4   0.04137159  0.61711350
5   0.03179303  0.05734869
6  -0.06638058 -0.74565157
7  -0.67611541 -1.01760810
8   0.34215052  0.30564550
9   0.83597837  0.75443762
10 -0.14967073 -0.02027512
2) How are the points stored? If in a matrix, say mat, with 2 columns for x and y, simply: as.data.frame(mat)
Best, Renaud
Thanks to both of you (Dr Renaud Lancelot and James Holtman). I see I formulated the question in a wrong way. I got from GRASS the coordinates of a map. There is a package in R named GRASS to connect R with GRASS.
library(GRASS)
G <- gmeta() # copy the environment from GRASS
Now G is a data frame containing also $xseq and $yseq, which would be the coordinates of all the points in the x and y direction. The final matrix should have length(G$xseq) * length(G$yseq) points.
Miha Staut
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
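The layout asked for above (every x paired with every y, with y changing fastest) is what expand.grid() produces. A minimal sketch using the 1:10 sequences from the example; with the GRASS data, G$xseq and G$yseq would presumably take their place:

```r
xseq <- 1:10
yseq <- 1:10

# expand.grid() varies its first argument fastest, so y is listed first
# and the columns are reordered afterwards.
xy <- expand.grid(y = yseq, x = xseq)[, c("x", "y")]

nrow(xy)                 # length(xseq) * length(yseq)
head(xy, 12)

xy_mat <- as.matrix(xy)  # the same points as a two-column matrix
```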
Re: [R] function editing?
See ?fix. Joshua Gramlich wrote: Is there a way to edit user defined functions once they've been created? For instance, I've a simple function that plots a table, but I'd like to go back and add more parameters to the barplot call. Is there a way to change this function without completely starting from scratch? Other than storing the code in a file and re-running it? Joshua Gramlich Piocon Technologies Chicago, Illinois USA __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] function editing?
See ?edit. If you use ESS, C-c C-d.
On Friday 14 February 2003 04:07 pm, Joshua Gramlich wrote:
Is there a way to edit user defined functions once they've been created? For instance, I've a simple function that plots a table, but I'd like to go back and add more parameters to the barplot call. Is there a way to change this function without completely starting from scratch? Other than storing the code in a file and re-running it?
Joshua Gramlich, Piocon Technologies, Chicago, Illinois USA
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
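A short sketch of the options mentioned in the replies, plus a non-interactive alternative. The function plot_table is an invented stand-in for the poster's function; fix() and edit() open an editor, so they are shown as comments only.

```r
# A simple user-defined function like the one described (illustrative only).
plot_table <- function(x) barplot(table(x))

# Interactive routes (each opens an editor, so not run here):
# fix(plot_table)                  # edits the function in place
# plot_table <- edit(plot_table)   # edit() returns the modified copy

# Non-interactive route: change the argument list and the body directly.
formals(plot_table) <- alist(x = , col = "grey", ... = )
body(plot_table)    <- quote(barplot(table(x), col = col, ...))

plot_table
```

After this, a call such as plot_table(x, col = "blue", main = "Counts") passes the extra parameters on to barplot().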
Re: [R] Translating lm.object to SQL, C, etc function
On Fri, 14 Feb 2003 08:06:45 + (GMT), [EMAIL PROTECTED] said:
The issue here is that coef() tells you the coefficients in R's internal parametrization of the model, and that is of no use to you unless you have a means of creating a model matrix in C, SQL or (heaven forbid) Perl. The information needed to re-create a model matrix is stored in the lm fit, but in ways that are going to be hard to use anywhere else (since they include R functions). This is not perverse: what R does is very general, *far* more so than SPSS. Formulae in lm can include poly() and ns() terms, for example.
I understand that. And indeed a perfectly general function export is a very big job. However, once we can export the model into a reasonably generic textual form, simply including the text name of any R functions in the export, then users can create special-case translators for the parts that they need. We try to make this as easy for ourselves as possible, for instance by doing all required transformations in SQL (where possible) before importing to R, which means that all the terms in the linear model are often untransformed variables. The only thing we don't do in SQL normally is creating the contrasts, since this is something that SQL is not well suited for.
The only practical solution it seems to us is to ask R to create the model matrix for new data. Then the things you are talking about are just the colnames of that matrix, and don't need to be interpreted.
Yes, that makes things pretty easy then, but it's not an option in all cases. We need to embed our models into C code. Previously we had a routine to take the SPSS output, convert it into C code, and then recompile the C code into our simulation. The linear model is utilised in the inner loop of the simulation so needs to be very fast; CORBA or SOAP calls to uncompiled code in the inner loop slow things down a great deal.
In addition, the simulation is accessed by many people - requiring all of them to install R would make the roll-out procedure much more complex.
You may want to read the sources to find out how R does it: that area is one of the most complex parts of the internals, and one in which bugs continue to emerge.
I'm glad to hear it is considered complex! ;-) I've actually been reading that bit of the code quite a bit over the last two days and haven't been getting that far. My lack of familiarity with the language, combined with the lack of comments in that section of code, and the very concise/non-descriptive variable names often used in the code, make this even harder. Still, it's a useful exercise for learning more about the language.
The difficulty I am having is that the output of coef() is not really parsable, since there is no marker in the name of a coefficient to separate out the components. For instance, in SPSS the name of a coefficient might be:
var1=[a]*var2=[b]*var3
...which is easy to write a little script to pull that apart and turn it into a line of SQL, C, or whatever. In S however the name looks like:
var1avar2bvar3
...which provides no way to pull the bits apart.
I find that impossible to understand anyway, but doubt that it corresponds to SPSS. For a variable V, label Va does not mean V=[a] except in unusual special cases.
I should firstly mention that I got this slightly wrong - I showed above the SPLUS output, not the R output. R actually looks like this:
var1a:var2b:var3
The ':'s certainly help a lot, but still there's the problem of handling factor levels, which are concatenated with the variable name without a delimiter (at least, in all the linear models I've run so far, this is the case).
I think with all the great feedback and ideas I've got so far on the list and in private mail (thanks everyone!) I have enough information to make a start. If I create anything that might be more generally useful I'll post back of course.
Many thanks, Jeremy -- Jeremy Howard [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
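One possible way around the missing delimiter is to use the xlevels component of the fit, mentioned earlier in the thread, to recognise which part of a name is the variable and which is the level. The sketch below assumes R's default treatment-contrast naming (variable name immediately followed by the level); split_coef_name and the toy data are made up for the example, and ambiguous names are resolved by the first match.

```r
# Split one coefficient name into (variable, level) pairs using fit$xlevels.
split_coef_name <- function(name, xlevels) {
  parts <- strsplit(name, ":", fixed = TRUE)[[1]]
  lapply(parts, function(p) {
    for (v in names(xlevels)) {
      hit <- xlevels[[v]][paste0(v, xlevels[[v]]) == p]
      if (length(hit) == 1) return(list(variable = v, level = hit))
    }
    list(variable = p, level = NA_character_)   # not a factor dummy
  })
}

d <- data.frame(
  y = c(1, 3, 2, 5, 4, 6, 8, 7, 9),
  x = c(1, 2, 3, 2, 3, 4, 1, 3, 5),
  f = factor(rep(c("a", "b", "c"), each = 3))
)
fit <- lm(y ~ f * x, data = d)

names(coef(fit))
split_coef_name("fb:x", fit$xlevels)
```

This does not remove the ambiguity noted above (a factor "ab" with level "c" and a factor "a" with level "bc" give the same name); it only makes the common case tractable.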
[R] using locator with xyplot() result
Dear R-Users, Is there a way to interactively get the location of a point on a graph produced by xyplot() of the lattice package (similar to what locator() does with a regular plot)? Thanks, Vadim
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] using locator with xyplot() result
No. On Friday 14 February 2003 04:37 pm, Vadim Ogranovich wrote: Dear R-Users, Is there a way to interactively get location of a point on a graph produced by xyplot() of lattice package (similar to what locator() does with a regular plot)? Thanks, Vadim __ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Translating lm.object to SQL, C, etc function
On Fri, 14 Feb 2003 07:53:46 -0500, John Fox [EMAIL PROTECTED] said:
I've written replacements for the standard R contrast functions that produce the kind of more easily parsed (and more readable) contrast names that I think you have in mind. I intend to include these in the next release of the car package for R but haven't done so yet. Since the code isn't very long, I've appended it (and the .Rd documentation file) to this note. Note that R does separate terms in an interaction with a colon. I hope that this does what you need. ...
## Coefficients:
##            (Intercept)                  income               education
##               2.275753                0.003522                1.713275
##           type[T.prof]              type[T.wc]     income:type[T.prof]
##              15.351896              -33.536652               -0.002903
##      income:type[T.wc]  education:type[T.prof]    education:type[T.wc]
##              -0.002072                1.387809                4.290875
Yes, it's perfect. Thanks so much (and also thanks for your really readable and useful book, including web appendices)!
Regards, Jeremy
-- Jeremy Howard [EMAIL PROTECTED]
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
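With names of the form variable[T.level] and ":" between interaction terms, a translator to SQL becomes a short exercise. The sketch below is only one possible approach, not code from the thread: term_to_sql and coef_to_sql are invented names, the SQL column names are assumed to equal the R variable names, and no quoting or escaping of level values is attempted.

```r
# Translate one coefficient name such as "income:type[T.prof]" into SQL.
term_to_sql <- function(term) {
  parts <- strsplit(term, ":", fixed = TRUE)[[1]]
  pieces <- vapply(parts, function(p) {
    m <- regmatches(p, regexec("^(.+)\\[T\\.(.+)\\]$", p))[[1]]
    if (length(m) == 3) {
      # factor dummy: 1 when the column equals the level, else 0
      sprintf("(CASE WHEN %s = '%s' THEN 1 ELSE 0 END)", m[2], m[3])
    } else {
      p   # numeric variable, used as-is
    }
  }, character(1), USE.NAMES = FALSE)
  paste(pieces, collapse = " * ")
}

# Combine a named coefficient vector into one SQL expression.
coef_to_sql <- function(coefs) {
  pieces <- mapply(function(name, value) {
    num <- sprintf("%.10g", value)
    if (name == "(Intercept)") num else paste(num, "*", term_to_sql(name))
  }, names(coefs), coefs, USE.NAMES = FALSE)
  paste(pieces, collapse = " + ")
}

# A few of the coefficients shown above.
coefs <- c("(Intercept)" = 2.275753, income = 0.003522,
           "type[T.prof]" = 15.351896, "income:type[T.prof]" = -0.002903)
sql <- coef_to_sql(coefs)
cat(sql, "\n")
```

The same splitting step could feed a C code generator instead. Terms that involve R functions such as poly() or ns() are outside what this sketch handles.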
Re: [R] time series missing 0 counts
On Fri, Feb 14, 2003 at 11:21:33AM +, [EMAIL PROTECTED] wrote:
Yes, there is an easy way. Create the regular time series you want by something like
x <- ts(0, start=c(2000,52), end=c(2003,9), frequency=52)
and fill in the time points you have data for by
xYear <- trunc(time(x)); xWeek <- cycle(x)
attach(mydata)
x[(xYear==Year) & (xWeek==Week)] <- Count
detach()
But won't this fail for weeks with a count number of 0 or 53 (as both of those are outside the ts() range specified above)? As 52*7=364 is different from the number of days in a year, each year is bound to have one of those unless the data is pre-scrubbed.
Dirk
Easy!
On Fri, 14 Feb 2003, Schnitzler, Johannes wrote:
I have several large data sets with counts per week. (Maximum week per year is 52. Counts from Week 53 are added to week 52.) A data set contains for example:
Year Week Count
2000   52     2
2001    1     5
2001    2     7
2001    5     4
2001    7     2
 ...  ...   ...
Weeks with 0 counts are not listed in the data set. I want to perform time series analysis (frequency 52). Is there an easy way to expand the data set to:
Year Week Count
2000   52     2
2001    1     5
2001    2     7
2001    3     0
2001    4     0
2001    5     4
2001    6     0
2001    7     2
 ...  ...   ...
or is there already a function in ts, which I have not found so far, to deal with this problem? Thank you very much.
Johannes Schnitzler, Germany, Berlin
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
-- Prediction is very difficult, especially about the future. -- Niels Bohr
__ [EMAIL PROTECTED] mailing list http://www.stat.math.ethz.ch/mailman/listinfo/r-help
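The fill-in step can also be written as a direct index calculation. The sketch below uses the five rows from the example and assumes, as stated in the question, that weeks run from 1 to 52 (week 53 already folded into week 52) and that the rows are sorted by time. The data frame name counts is illustrative.

```r
counts <- data.frame(
  Year  = c(2000, 2001, 2001, 2001, 2001),
  Week  = c(52, 1, 2, 5, 7),
  Count = c(2, 5, 7, 4, 2)
)

# Position of each observation on a regular 52-week grid that starts
# at the first observed week.
idx <- (counts$Year - counts$Year[1]) * 52 +
       (counts$Week - counts$Week[1]) + 1

values <- numeric(max(idx))      # zeros for every week in the range
values[idx] <- counts$Count      # overwrite the weeks that have data

x <- ts(values, start = c(counts$Year[1], counts$Week[1]), frequency = 52)
x
```

Data containing week numbers 0 or 53 would have to be recoded first, which is the concern raised in the reply above.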