[R] Fwd: When modeling with negbin from the aod package...
-- Forwarded message --
From: alexander russell ssv...@gmail.com
Date: Tue, Oct 20, 2009 at 4:34 PM
Subject: Re: [R] When modeling with negbin from the aod package...
To: Matthieu Lesnoff matthieu.lesn...@gmail.com

Hello again,

It seems that, though we have a simple estimate of the variance, phi, with negbin, some models seek to create a formula for the variance. For example, I think Bolker has modeled variance in the Lily_sum data set as

nlikfun = function(a, c, d) {
  k = c * exp(d * flowers)
  -sum(dnbinom(seedlings, mu = a, size = k, log = TRUE))
}

(book, p. 420), if I'm correct?

Is it fair to say that the 'random' argument for negbin will model variance this way only if the independent variable on which the variance depends in the right-hand formula is categorical, or is a factor with only a few levels?

regards, s

On Thu, Oct 15, 2009 at 5:53 PM, Matthieu Lesnoff matthieu.lesn...@gmail.com wrote:
> > that is they are maximum likelihood estimates. Would someone tell me if I'm correct?
> yes, all parameters estimated with negbin (aod) are ML estimates
> Regards
> Matthieu
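[For readers without the book to hand, a fit of that sort can be sketched with optim(); this is only an illustration, assuming the vectors `flowers` and `seedlings` from Bolker's Lily data exist in the workspace, with arbitrary starting values:

nll <- function(p) {
  a <- p[1]; c <- p[2]; d <- p[3]
  # the negative-binomial size (k) varies with the covariate 'flowers'
  k <- c * exp(d * flowers)
  -sum(dnbinom(seedlings, mu = a, size = k, log = TRUE))
}
fit <- optim(c(a = 10, c = 1, d = 0), nll)  # ML estimates of a, c, d
fit$par
]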
Re: [R] rbind with different columns
Karl Ove Hufthammer wrote:
> In article 4addc1d0.2040...@yahoo.de, niederlein-rs...@yahoo.de says...
> > In every list entry is a data.frame but the columns are only partially the same. If I had exactly the same columns, I could do the following command to combine my data: do.call(rbind, myList). But of course it does not work with different column names. Is there any easy way to retrieve a combined table like this?
> You're in luck. 'rbind.fill' in the 'plyr' package does exactly what you want.

Thanks a lot for your hint. I'll give it a look :-)

So far, I have solved it with basic tools like this (it's less work than I thought):

availableColumns <- unique(unlist(sapply(myList, names)))
myList <- lapply(myList, FUN = function(entry) {
  missingColumns <- availableColumns[which(!availableColumns %in% names(entry))]
  entry[, missingColumns] <- NA
  entry
})
do.call(rbind, myList)
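[For comparison, the plyr route mentioned above is essentially a one-liner; a sketch, assuming plyr is installed and myList is the list of data frames:

library(plyr)
combined <- do.call(rbind.fill, myList)  # missing columns are filled with NA
]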
[R] referring to data of previous rows
Dear Rlers,

in the following dataset I would like to insert a new column that refers to the data of the previous row. My question is whether the probability of a female (Id) changes if she had given birth to a pup in the previous year. So my dataframe consists of the columns Id, year (2003-2007 for each Id) and offspring (= of; 1-0), with 1 = female (Id) gave birth to offspring (same year) and 0 = female didn't pup:

Id year of
1 2003 1
1 2004 0
1 2005 1
1 2006 1
1 2007 0
2 2003 0
2 2004 1
2 2005 1
2 2006 0
2 2007 1
3 2003 1
3 2004 1
3 2005 1
3 2006 0
3 2007 0

Now I want to add the column "offspring in the previous year" - yes/no (1-0). The output should look like this:

Id year of of_inPreviousYear
1 2003 1 NA
1 2004 0 1
1 2005 1 0
1 2006 1 1
1 2007 0 1
2 2003 0 NA
2 2004 1 0
2 2005 1 1
2 2006 0 1
2 2007 1 0
3 2003 1 NA
3 2004 1 1
3 2005 1 1
3 2006 0 1
3 2007 0 0

any idea how I could program that?

thanks a lot, clion
Re: [R] editors for R
Hi Mark,

> After reviewing the IDE/Script Editors article at sciviews.org, I wanted to pose a quick question here to see if anyone can offer an opinion or commentary about GUI editors that can be installed in a Windoze environment that allow editing/saving of remote .R files and running R programs from within a shell that is housed in the editor (although R itself is installed in a Linux environment). Windoze would essentially be providing a GUI-based tool to edit, save, and execute without making the user copy files back and forth and switch between various programs to execute their routines. Thus far, BlueFish seems to come closest to this; but other recommendations will be most appreciated.

Eclipse + StatET; see http://lists.r-forge.r-project.org/pipermail/statet-user/2009-September/000208.html for information on the latest release. For remote connections, you need to install RJ-Server (on the Linux machine) and configure a Remote R Console (on the client). There is a dedicated mailing list at https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/statet-user

HTH,
Tobias
Re: [R] Problem using the source-function within R-functions
Hi Giovanni,

Thanks for your reply. I can make the function work after pasting the code directly into R. The problem arises after compiling the function into a package and then calling the function, because the files inside source() seem to be missing.

I tried to include the sourced files in the code_files argument of package.skeleton(). The files are brought correctly into the package, but when running the generated package the files produce an error due to variables that are not defined.

There may be no way around it other than copying the content of the sourced files into the file where the function is defined? That way the definition of the function is all written in one file, but the organization of the code sections is worse than when using source()...

Best regards,
Johan

2009/10/20 Giovanni Petris gpet...@uark.edu
> The problem probably lies in the source-ing part: look at getwd(), setwd()
> HTH, Giovanni
>
> > Date: Tue, 20 Oct 2009 13:00:02 +0200
> > From: Johan Lassen jle...@gmail.com
> >
> > Dear R community,
> >
> > You may have the solution to how to construct a function using the function source() to build the function; i.e.
> >
> > myfunction <- function(...){
> >   source('file1.r')
> >   source('file2.r')
> > }
> >
> > After compiling and installing the myfunction in R, then calling the myfunction gives an error because the content of 'file1.r' and 'file2.r' seems to be missing. Anyone has the trick to overcome this problem? Thanks in advance!
> >
> > best wishes, Johan
> >
> > PS: My function is:
> >
> > run_accumm_value <- function(ind_noder_0, ind_loc_val, ind_retention, downstream){
> >   ## Preprocessing of looping calculations:
> >   koersel_uden_ret <- length(unique(ind_noder_0$oplid))
> >   opsaml_b_0_2 <- numeric(koersel_uden_ret)
> >   opsaml_b_0_2_1 <- numeric(koersel_uden_ret)
> >   opsaml_b_0_2_2 <- seq(1:koersel_uden_ret)
> >   ## Preprocessing of topology and local values to be summed:
> >   source('preproces_topology.r', local = T)
> >   source('preproces_loc_val.r', local = T)
> >   # Loop for each grouping factor (column in ind_noder_0: oplid):
> >   for(j in 1:koersel_uden_ret){
> >     source('matrix_0.r', local = T)
> >     source('matrix.r', local = T)
> >     source('local_value.r', local = T)
> >     source('fordeling.r', local = T)
> >     source('fordeling_manuel.r', local = T)
> >     source('local_ret.r', local = T)
> >     source('Ax=b.r', local = T)
> >     source('opsamling_x_0_acc.r', local = T)
> >   }
> >   source('opsamling_b_1.r', local = T)
> >   opsaml_b_2
> > }
> >
> > --
> > Johan Lassen
> > Environment Center Nykøbing F
> > Denmark
--
Giovanni Petris gpet...@uark.edu
Associate Professor, Department of Mathematical Sciences
University of Arkansas - Fayetteville, AR 72701
http://definetti.uark.edu/~gpetris/

--
Johan Lassen
Environment Center Nykøbing F
Denmark
[R] slope calculation
Dear all,

I am a new R user and am trying to learn more. I am doing a linear regression analysis in R with my data, and I am trying to find a way to convert the slope value (the coefficient of x) into degrees of slope. Please give me an idea on this.

Thanking you in anticipation.

Warm regards, MS
[R] problems with randomSurvivalForest
summary(ma.dati2$death.status)   # censoring indicator
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.0     0.0     0.0 0.05332     0.0     1.0    39.0

summary(ma.dati2$time.death)     # time
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1     370     370     356     370     370      39

The formula:

timi.out <- rsf(Survrsf(time.death, death.status) ~ sex, ma.dati2.rid, ntree = 1000, na.action = 'na.omit')
Errore in rsf.default(Survrsf(time.death, death.status) ~ sex, ma.dati2.rid[, :
  Outcome is not a random survival object. Use 'Survrsf' for the formula.

But time.death is strictly positive and death.status takes only two values, 0 and 1. Where is the error???

TIA
Giovanni
[R] Solved problems with randomSurvivalForest
Only a confusion between two datasets with similar names... Sorry

Giovanni
Re: [R] How to create a legend that automatically reads the values from two vectors?
On 10/21/2009 01:30 AM, jcano wrote:
> betav <- c(0.78, 0.94, 0.88, 0.41, 0.59, 4.68)
> etav <- c(235.6, 59.5, 31.2, 8.7, 3.2, 1174)

Hi Javier,

Maybe not exactly what you want, but try:

addtable2plot(2, 8, rbind(betav, etav), bty = "o",
  display.colnames = FALSE, display.rownames = TRUE)

using your own x and y coordinates, of course. addtable2plot is in the plotrix package.

Jim
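[If an actual legend is preferred to a table, one alternative not from the original reply is to build the labels from the two vectors with sprintf(); a sketch, assuming a plot is already open and betav/etav are as above:

legend("topright",
       legend = sprintf("beta = %.2f, eta = %.1f", betav, etav),
       bty = "n")  # one label per pair of values
]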
[R] three related time series with different resolutions
I have three time series, x, y, and z, and I want to analyse the relations between them. However, they have vastly different resolutions. I am writing to ask for advice on how to handle this situation in R.

x is a stimulus, and y and z are responses. x is a rectangular pulse 4 sec long. Its onset and offset are known with sub-millisecond precision. The onset varies irregularly -- it doesn't fall on neat 1/2 sec or sec boundaries for example. y is a sampled continuous waveform. It is highly noisy, and it is actually well represented by samples 1/2 sec apart or so. z is a very short pulse -- perhaps 5 ms long -- occurring at irregular times.

I have problems with how to represent these waveforms at the input stage (during data collection) and at the analysis stage. If I want to represent x and z with the sort of precision I'd like, I'd have to sample every ms or so. But:
- that is massive overkill for y, because it is noisy. I will have maybe 500 times as many pts as I require! That makes for large data files, for no good reason.
- the representations for x and z are about 99% zeros. Again, wasteful for storage.
- the analysis will be awkward and slow because of the huge number of pts.

If I sample at a rate of every 1/2 sec, z may not be detected at all, and the edges of x are represented very poorly.

I could just save y as a separate file, with values sampled every 1/2 sec, save x as a file containing onset and offset times, and save z as the times of each impulse. All three files have different lengths. If I save this way it is awkward keeping track of the three files for each run, and I'm still left with the problem of analysing them. In my mind they are actually three continuous waveforms, and my simple-minded way of analysing would be to create digitised waveform versions of all three (on some fine grain). If I do that I still have the problem of zillions of useless points clogging RAM.

What do I want from the analysis? I want the average waveform for y that occurs for 30 sec after each x pulse, and the average waveform for z that occurs for 30 sec after each x pulse. The average waveform for y surrounding each z (maybe 30 sec before and 10 sec after) would also be useful.

Thanks very much for any suggestions about representation for data collection and analysis, and for data analysis methods (which will depend on the data representation).

Bill
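[One way to see that the "event times plus sampled waveform" representation can still give the desired averages: the sketch below computes a stimulus-triggered average of y. It is only an illustration; y_time/y_val (sample times in seconds and values of y) and x_onsets (pulse-onset times) are hypothetical names.

# Stimulus-triggered average of y on a 0.5-s grid for 30 s after each onset
win  <- seq(0, 30, by = 0.5)
segs <- sapply(x_onsets, function(t0)
  approx(y_time, y_val, xout = t0 + win)$y)   # resample y around each onset
avg_y <- rowMeans(segs, na.rm = TRUE)
plot(win, avg_y, type = "l", xlab = "time after x onset (s)")
]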
Re: [R] Putting names on a ggplot - fortune candidate?
On 10/21/2009 01:59 AM, hadley wickham wrote:
> It currently works (because I can't figure out how to make it an error) but you really should not do it.

Please pardon my nomination, Hadley, but that is just too good to pass up.

Jim
[R] linear regression: Is there a way to get the results of lm as variables
Hi R users,

I used R to get the results of a linear regression:

reg <- lm(y ~ x)

here are the results:

# Call:
# lm(formula = donnees$txi7098 ~ donnees$txs7098)
#
# Residuals:
#       Min        1Q    Median        3Q       Max
# -0.037971 -0.013373 -0.004947  0.010618  0.053235
#
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)
# (Intercept)      0.08547    0.03028   2.822 0.011738 *
# donnees$txs7098  0.62306    0.12847   4.850 0.000150 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.02199 on 17 degrees of freedom
# Multiple R-squared: 0.5805, Adjusted R-squared: 0.5558
# F-statistic: 23.52 on 1 and 17 DF, p-value: 0.0001502

I know how to get the coefficients as variables, for example:

reg$coefficients[1] = 0.08547 (Estimate)
reg$coefficients[2] = 0.62306 (Estimate)

My question is: is there a way to get the other results of lm as variables?
- Std. Errors?
- t value?
- Pr(>|t|)?
- Residual standard error?
- Multiple R-squared?
- F-statistic?

Best regards
Re: [R] Transparent Bands in R
On 10/21/2009 09:40 AM, mnstn wrote:
> Hello All, My question is regarding the attached plot. I would like to have multiple transparent green bands running the length (y-axis) of the plot, the width of which is determined by the green lines at y=0 in the plot. Can you suggest a way to do it?

Hi mnstn,

Transparency is only supported by some graphic devices. It may be simpler to plot the values, use rect to draw green rectangles:

xylim <- par("usr")
rect(start1, xylim[3], stop2, xylim[4], col = "green")
rect(start2, ...

then redraw the lines over the rectangles using lines.

Jim
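[On devices that do support alpha, a semi-transparent band can be drawn directly; a small self-contained sketch, not from the original reply, with made-up band limits:

plot(1:10, type = "l")
usr <- par("usr")
rect(3, usr[3], 5, usr[4], col = rgb(0, 1, 0, alpha = 0.3), border = NA)  # translucent green band
lines(1:10)  # redraw the curve over the band if needed
]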
Re: [R] three related time series with different resolutions
PS I think one way to get the average waveforms I want from the analysis is using cross-correlation, but again the multiple scale problem is present.
[R] Bootstrapping confidence intervals
Hello,

We are a group of PhD students working in the field of toxicology. Several of us have small data sets with N = 10-15. Our research is mainly about the association between an exposure and an effect, so preferably we would like to use linear regression models. However, most of the time our data do not fulfill the model assumptions for linear models (no normality of the y-variable achieved even after log transformation).

We have been told that we can use bootstrapping to derive a confidence interval for the original parameter estimate (beta 1) from the linear regression model, and that if the confidence interval does not include 0, we can trust the result from the original linear model (of course only if a scatter plot of the variables looks ok). What is your opinion about this method? Is that ok?

I have problems understanding how it is possible to resample several times from an already poor distribution (one that does not fulfill the model assumptions for linear models) to achieve a confidence interval that validates the use of these linear models. I would really appreciate a simple explanation about this!

Many thanks,
Charlotta Rylander
[R] ggplot2: Histogram with negative values on x-axis doesn't work
I have a dataset that contains numbers between -10 and 0, e.g.

x = c(-9.23, -9.56, -1.40, ...)

If I now do a qplot(x, geom="histogram") I get the error:

Error: position_stack requires non-overlapping x intervals

Strangely, the following both work:

qplot(x * -1, geom="histogram")
qplot(x + 100, geom="histogram")

Has anyone else encountered this? Is this a bug in ggplot2, or do I get something completely wrong here?

Versions: R: 2.9.1, ggplot2: 0.8.3
[R] problem with pdf in batch mode
Hi all R users,

I am using R in batch mode to do automatic reporting, saving all the pictures as WMF. When I launch the batch file (.bat) manually it works, but when I launch the batch file from the server I get the following error message in the output file:

Error in device.call(...) : unable to start device pdf
Calls: total ... trellis.par.get -> trellis.device -> device.call -> .External
Execution halted

I know that by default in batch mode R prints the graphs to a pdf file, and I think the error comes from that, but I don't know how to disable this option.

Thanks

Guillaume Le Ray
Delta, SenseLab
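[One possible workaround, offered only as a sketch and not from the thread (the file path and plot object below are placeholders): open the wanted device explicitly before any lattice call, so the default pdf device is never started.

win.metafile("C:/reports/plot%02d.wmf")  # explicit WMF device (Windows only; path is a placeholder)
print(myLatticePlot)                     # lattice plots must be print()ed in batch scripts
dev.off()
]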
[R] eliminate characters of one data.frame col, using another column
Dear mailing list,

I have a data.frame with three columns. I want to produce a fourth column which is the result of eliminating from the first column the characters present in the second and third columns. Example:

a b c
1 f f
2 j h
3 gg

I want the result: ff hhh

I can get what I want using the code below. But it's slow for big files (which I have) and most likely there's a better way to do this.
1) is there a function that would do the same as FUN.remove2ndCol?
2) is there a way to avoid the apply for every row?

Thanks, Tiago

###
dfIn <- data.frame(a=c('', '', ''), b=c('f', 'j', 'g'), c=c('f', 'h', 'g'))

FUN.remove2ndCol <- function(vec){
  vec.sub <- sub(vec['b'], '', vec['a'])
  vec.sub <- sub(vec['c'], '', vec.sub)
  return(vec.sub)
}

dfIn$Output <- apply(dfIn, 1, FUN.remove2ndCol)
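[Still row-wise, but one possible variation (a sketch only, not a tested speed-up) avoids apply()'s conversion of the data frame to a character matrix by using mapply() over the three columns, with fixed = TRUE so the removed characters are treated literally:

dfIn$Output <- mapply(function(a, b, c) {
  out <- sub(b, '', a, fixed = TRUE)   # remove the character from column b
  sub(c, '', out, fixed = TRUE)        # then the character from column c
}, as.character(dfIn$a), as.character(dfIn$b), as.character(dfIn$c))
]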
[R] Keeping package sources for recompilation with new R version?
Hi,

I am using Ubuntu Hardy, and I install many packages from source. I keep my R packages fairly up to date. My question is: is there a way of keeping the source packages, so that when I install a new version of R, an update.packages(checkBuilt=TRUE) will only fetch packages when newer ones are available, but otherwise will use the local copies?

To rephrase it: is it possible to have some kind of local repository for the installed source packages, so that packages are downloaded only when newer ones are available? I could use destdir to specify the directory in which to save the source files of the packages, but how can I use this directory as a first choice, and only download if a newer version exists?

Rainer

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology
Stellenbosch University, Main Campus, Merriman Avenue, Stellenbosch, South Africa
Cell: +27 - (0)83 9479 042
email: rai...@krugs.de
Skype: RMkrug
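[One way to get "local first, download only if newer" behaviour is to turn the destdir directory into a small local repository and list it ahead of CRAN. This is only a sketch with a made-up path, and assumes the downloaded *.tar.gz files are copied into src/contrib:

local <- "~/R/localrepo/src/contrib"
dir.create(local, recursive = TRUE, showWarnings = FALSE)
# ... copy the downloaded *.tar.gz source packages into 'local', then:
tools::write_PACKAGES(local, type = "source")      # build the PACKAGES index
options(repos = c(LOCAL = paste0("file://", path.expand("~/R/localrepo")),
                  CRAN  = "http://cran.r-project.org"))
update.packages(checkBuilt = TRUE)                  # now also sees the local copies
]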
[R] Help with time series
Hi there,

I am having trouble getting the plotting of multiple time series to work. I have used RBloomberg to download data, which I then convert to a data frame. After I have calculated my new index values, I would like to plot the new index. My problem is that I can't get the plot feature to show the dates properly. The resultant data frame DATA has row.names in the first column, which contains the dates from the RBloomberg download. Could one of the R wizards please point me in the right direction?

Thanks in advance

Summary of my code:

conn <- blpConnect(iface="COM", timeout=12000, show.days="week",
                   na.action="previous.days", periodicity="daily")
data <- blpGetData(conn, tickers, "PX_LAST", start = chron("01/01/00"), end = NULL,
                   barsize = NULL, barfields = NULL, retval = NULL)
blpDisconnect(conn)
DATA <- data.frame(data)
.
. A few calculations
.
Strategy <- as.ts(DATA[,5])
Index <- as.ts(DATA[,6])
ts.plot(Strategy, Index, col=c("tomato","black"))
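[Not from the thread, but one common approach is to keep the dates as a zoo index rather than coercing to ts; a sketch, assuming the row names of DATA are the download dates in a format as.Date() understands:

library(zoo)
z <- zoo(DATA[, 5:6], order.by = as.Date(rownames(DATA)))
plot(z, plot.type = "single", col = c("tomato", "black"))  # dates appear on the x-axis
]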
[R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
Dear all,

Let's say I have the following data frame:

set.seed(1)
col1 <- c(rep('happy',9), rep('sad', 9))
col2 <- rep(c(rep('alpha', 3), rep('beta', 3), rep('gamma', 3)),2)
dates <- as.Date(rep(c('2009-10-13', '2009-10-14', '2009-10-15'),6))
score <- rnorm(18, 10, 3)
df1 <- data.frame(col1=col1, col2=col2, Date=dates, score=score)

    col1  col2       Date     score
1  happy alpha 2009-10-13  8.120639
2  happy alpha 2009-10-14 10.550930
3  happy alpha 2009-10-15  7.493114
4  happy  beta 2009-10-13 14.785842
5  happy  beta 2009-10-14 10.988523
6  happy  beta 2009-10-15  7.538595
7  happy gamma 2009-10-13 11.462287
8  happy gamma 2009-10-14 12.214974
9  happy gamma 2009-10-15 11.727344
10   sad alpha 2009-10-13  9.083835
11   sad alpha 2009-10-14 14.535344
12   sad alpha 2009-10-15 11.169530
13   sad  beta 2009-10-13  8.136278
14   sad  beta 2009-10-14  3.355900
15   sad  beta 2009-10-15 13.374793
16   sad gamma 2009-10-13  9.865199
17   sad gamma 2009-10-14  9.951429
18   sad gamma 2009-10-15 12.831509

Is it possible to get the following, whereby I am averaging the values within each group of values in col2:

    col1  col2       Date     score   Average
1  happy alpha 13/10/2009  8.120639  8.721561
2  happy alpha 14/10/2009 10.550930  8.721561
3  happy alpha 15/10/2009  7.493114  8.721561
4  happy  beta 13/10/2009 14.785842 11.104320
5  happy  beta 14/10/2009 10.988523 11.104320
6  happy  beta 15/10/2009  7.538595 11.104320
7  happy gamma 13/10/2009 11.462287 11.801535
8  happy gamma 14/10/2009 12.214974 11.801535
9  happy gamma 15/10/2009 11.727344 11.801535
10   sad alpha 13/10/2009  9.083835 11.596236
11   sad alpha 14/10/2009 14.535344 11.596236
12   sad alpha 15/10/2009 11.169530 11.596236
13   sad  beta 13/10/2009  8.136278  8.288990
14   sad  beta 14/10/2009  3.355900  8.288990
15   sad  beta 15/10/2009 13.374793  8.288990
16   sad gamma 13/10/2009  9.865199 10.882712
17   sad gamma 14/10/2009  9.951429 10.882712
18   sad gamma 15/10/2009 12.831509 10.882712

My feeling is that I should be using ?aggregate in some fashion but can't see how to do it. Or possibly there's another function I should be using?

Thanks in advance,
Tony

O/S: Windows Vista Ultimate

> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
Re: [R] ggplot2: Histogram with negative values on x-axis doesn't work
In article OF1D427087.01614A71-ONC1257656.003BFAB8-C1257656.003C88F8@basf-c-s.be, sebastian.roh...@basf.com says...
> I have a dataset that contains numbers between -10 and 0. E.g. x = c(-9.23, -9.56, -1.40, ...). If I now do a qplot(x, geom="histogram") I get the error: Error: position_stack requires non-overlapping x intervals. Strangely, the following both work: qplot(x * -1, geom="histogram") and qplot(x+100, geom="histogram").

I can reproduce it with for example

x = c(-9.23, -9.56, -1.40)

But adding a single positive number, even .001, fixes it, while adding a similar negative number introduces a new error message, so it really looks like a bug in ggplot2 when all the values are negative. Report it to the ggplot2 author or the ggplot2 mailing list.

--
Karl Ove Hufthammer
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
results = merge(df1, aves)

b
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
On 10/21/2009 7:03 AM, Tony Breyal wrote:
> My feeling is that I should be using the ?aggregate is some fashion but can't see how to do it. Or possibly there's another function i should be using?

?ave

For example, try something like this:

transform(df1, Average = ave(score, col1, col2))

--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor, New York, NY 10010
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
In article 800acfc0-2c3c-41f1-af18-3b52f7e43...@jhsph.edu, bcarv...@jhsph.edu says...
> aves = aggregate(df1$score, by=list(col1=df1$col1, col2=df1$col2), mean)
> results = merge(df1, aves)

Or, with the 'plyr' package, which has a very nice syntax:

library(plyr)
ddply(df1, .(col1, col2), transform, Average=mean(score))

It may be a bit slow for very large datasets, though. Here's an alternative, which will be as fast as the aggregate solution:

within(df1, { Average=ave(score, col1, col2, FUN=mean) } )

Which one you use is a matter of taste. And of course, the 'within' function is not the important part here; 'ave' is. For example, if you have attached your data frame, you just have to type

Average=ave(score, col1, col2, FUN=mean)

--
Karl Ove Hufthammer
Re: [R] Bootstrapping confidence intervals
John Fox has a nice explanation here:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf

-Ista

--
Ista Zahn
Graduate student, University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
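[As a concrete illustration of the case-resampling bootstrap described in that appendix, here is a minimal sketch (not part of the original reply) using the boot package; 'mydata' with columns y and x is a hypothetical data set:

library(boot)
slope_fun <- function(data, i) coef(lm(y ~ x, data = data[i, ]))[2]  # statistic on a resample
b <- boot(mydata, slope_fun, R = 2000)
boot.ci(b, type = "perc")   # percentile confidence interval for the slope
]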
[R] SVM probability output variation
Dear R:ers,

I'm using the svm from the e1071 package to train a model with the option probabilities = TRUE. I then use predict with probabilities = TRUE and get the probabilities for the data point belonging to either class. So far all is well.

My question is why I get different results each time I train the model, although I use exactly the same data. The prediction seems to be reproducible, but if I re-train the model, the probabilities vary somewhat. Here, I have trained a model on exactly the same data five times. When predicting using the different models, this is how the probabilities vary:

probabilities
    Grp.0     Grp.1
0.7077155 0.2922845
0.7938782 0.2061218
0.8178833 0.1821167
0.7122203 0.2877797

How can the predictions using the same training and test data vary so much?

Thanks, Anders
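[The behaviour can be reproduced on a built-in data set; the sketch below is only an illustration of the phenomenon described above (if memory serves, the class-probability model is fitted by an internal cross-validation with randomly drawn folds, which would explain run-to-run differences, but that explanation is not from this thread):

library(e1071)
m1 <- svm(Species ~ ., data = iris, probability = TRUE)
m2 <- svm(Species ~ ., data = iris, probability = TRUE)
p1 <- attr(predict(m1, iris, probability = TRUE), "probabilities")
p2 <- attr(predict(m2, iris, probability = TRUE), "probabilities")
summary(as.vector(p1 - p2))   # small differences between two fits on identical data
]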
[R] combining multiple 3D graphs
Hi all!

I am a grad stat student and am fairly new to using R. I am doing a regression tree in one of my problem sets. I have already identified the cut-points (using Stata) for the regression. My problem is how to graph the fitted values of z against the independent variables x and y. Basically, I want to graph the following:

z = 87.67  if 0 < x <= 29 and 0 < y <= 48
z = 75     if 0 < x <= 29 and 48 < y <= 62
z = 67     if 29 < x <= 36 and 0 < y <= 62
z = 56.71  if 36 < x <= 45 and 0 < y <= 62
z = 40.8   if 36 < x <= 55 and 0 < y <= 62

I have some idea (still to be implemented) on how to graph them piece-wise using persp. My question is this: how does one combine multiple 3D graphs (using persp) in R? Better yet, is there a way to graph them all in just one go?

Any help would be welcome. Many thanks in advance.

Cheers,
Mykel

--
"I am most anxious for liberties for our country... but I place as a prior condition the education of the people so that our country may have an individuality of its own and make itself worthy of liberties..." Jose Rizal, 1896
Re: [R] re ferring to data of previous rows
If dd is your data frame then:

dd$prev <- ave(dd$of, dd$Id, FUN = function(x) c(NA, head(x, -1)))
Re: [R] slope calculation
Hi MS,

I think it is simple trigonometry: atan(beta), or in degrees instead of radians, atan(beta)*360/(2*pi).

hth.

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf
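[A small worked example of the conversion, using the built-in cars data purely for illustration:

fit  <- lm(dist ~ speed, data = cars)   # example regression
beta <- coef(fit)[2]                    # slope coefficient of x
atan(beta) * 180 / pi                   # slope in degrees (same as atan(beta)*360/(2*pi))
]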
Re: [R] three related time series with different resolutions
I wasn't clear: x and z are pulse *trains* with irregular gaps between pulses.
Re: [R] Problem using the source-function within R-functions
On 10/21/2009 3:53 AM, Johan Lassen wrote:
> There may be no way around other than copying the content of the sourced files into the file where the function is defined?

R is not a macro language, and doesn't work well as one (though Thomas Lumley has written some code to make it act that way; see R News http://www.r-project.org/doc/Rnews/Rnews_2001-3.pdf).

I don't know the details of what you are trying to do, but the best way to handle the generic sort of problem you are describing is to take those source'd files and rewrite their content as functions to be called from your other functions. Then their source can still be in separate files, with one line in each location where you need to call it.

The difficulty in doing this is that you need to think carefully about what arguments those functions should have, and that may need some reorganization. (Macro languages let you refer to every local variable; called functions can only see what you pass them.) However, in general this will make your code easier to read and maintain, so the extra work is a good investment.

Duncan Murdoch
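[To make that suggestion concrete, here is a hypothetical sketch (the names and return value are invented) of turning one of the source()'d scripts into a function that lives in the package's R/ directory:

# R/preproces_topology.R -- replaces the previously source()'d preproces_topology.r
preproces_topology <- function(ind_noder_0, downstream) {
  # ... the code that used to live in the script goes here,
  # using only its arguments instead of variables from the caller ...
  list(topology = NULL)   # placeholder return value
}

# and inside run_accumm_value():
#   topo <- preproces_topology(ind_noder_0, downstream)
]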
Re: [R] linear regression: Is there a way to get the results of lm as variables
Hi CE.KA,

Take a look at the following:

# Data
set.seed(123)
x <- rnorm(100)
y <- 2 + 1.5*x + rnorm(100)

# Regression model
reg <- lm(y ~ x)

# The summary
summary(reg)

# Objects present in the summary()
names(summary(reg))

# Extracting the coefficients
summary(reg)$coeff

HTH,
Jorge
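[The other quantities asked about are likewise components of the summary object (see ?summary.lm for the full list); continuing with `reg` from above:

s <- summary(reg)
s$coefficients[, "Std. Error"]   # standard errors
s$coefficients[, "t value"]      # t values
s$coefficients[, "Pr(>|t|)"]     # p-values
s$sigma                          # residual standard error
s$r.squared                      # multiple R-squared
s$adj.r.squared                  # adjusted R-squared
s$fstatistic                     # F statistic and its degrees of freedom
]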
Re: [R] How to average subgroups in a dataframe? (not sure how to apply aggregate(..))
Thank you all for your responses. I have now achieved the desired output for my own real data using your suggestions. I will also have to look into this 'plyr' package, as I have noticed that it gets mentioned a lot.
Re: [R] combining multiple 3D graphs
On 10/21/2009 9:03 AM, Michael Ralph M. Abrigo wrote:
> I have some idea (still to be implemented) on how to graph them piece-wise using persp. My question is this: how do one combine multiple 3D graphs (using persp) in R? better yet, is there a way to graph them all in just one go?

I would define a function of x and y which evaluates to your fitted values, evaluate it on a grid, and then use persp. You might already have code to do this, or you might have to write it something like this:

f <- function(x, y) {
  ifelse( 0 < x & x <= 29 & 0 < y & y <= 48, 87.67,
  ifelse( 0 < x & x <= 29 & y > 48 & y <= 62, 75,

}

etc. I leave the difficult problem of properly closing all the parentheses to you!

Duncan Murdoch
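[A hypothetical completed version of that idea, assuming the cut-points from the original post (so the exact values are only illustrative), evaluated on a grid and drawn with a single persp() call:

f <- function(x, y) {
  ifelse(x <= 29 & y <= 48, 87.67,
  ifelse(x <= 29,           75,
  ifelse(x <= 36,           67,
  ifelse(x <= 45,           56.71, 40.8))))
}
xg <- seq(0.5, 55, by = 0.5)
yg <- seq(0.5, 62, by = 0.5)
persp(xg, yg, outer(xg, yg, f), theta = 30, phi = 30,
      xlab = "x", ylab = "y", zlab = "fitted z")
]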
[R] Subsetting/modifying a symbolic formula
Hello All.. Please consider the following:

y <- rnorm(20, mean = 10)
f1 <- as.factor(rep(c("A", "B", "B", "A"), 5))
f2 <- as.factor(rep(c("C", "D"), 10))
testdata <- data.frame(y, f1, f2)

testFunc <- function(formula, data, ...) {
  # mf <- model.frame(formula, data)
  kw.res <- kruskal.test(formula, data)
}

res <- testFunc(y ~ f1 * f2, testdata)

Which works perfectly well. Now, I would like to do some other tests which allow only one factor, for instance wilcox.test. So, I would like to modify my formula so that it reads y ~ f1 or y ~ f2, then conduct more tests.

What is the smart way to subset or modify an existing symbolic formula? I have been staring at the attributes of res and mf (commented out currently) but these are nested in a way I find difficult to extract. ?terms, ?terms.formula, ?formula etc. all discuss these attributes but, as I said, it seems a bit impenetrable. RSiteSearch("symbolic formula") returns too many answers.

No doubt I am missing the obvious, as this is my first foray into using formulas in my own functions.

TIA, Bryan

*
Bryan Hanson
Acting Chair, Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA

> sessionInfo()
R version 2.10.0 RC (2009-10-19 r50172)
i386-apple-darwin8.11.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] datasets tools grid utils stats graphics grDevices methods
[9] base

other attached packages:
 [1] ggplot2_0.8.3      reshape_0.8.3      proto_0.3-8        mvbutils_2.2.0
 [5] ChemoSpec_1.2      lattice_0.17-25    mvoutlier_1.4      plyr_0.1.8
 [9] RColorBrewer_1.0-2 chemometrics_0.4   som_0.3-4          robustbase_0.4-5
[13] rpart_3.1-45       pls_2.1-0          pcaPP_1.7          mvtnorm_0.9-7
[17] nnet_7.2-48        mclust_3.2         MASS_7.2-48        lars_0.9-7
[21] e1071_1.5-19       class_7.2-48
[R] formula and model.frame
Suppose I have the following function myFun <- function(formula, data){ f <- formula(formula) dat <- model.frame(f, data) dat } Applying it with this sample data yields a new dataframe: qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = c(1:8)) dat <- myFun(score ~ grade, qqq) However, what I would like is for the resulting dataframe (dat) to include as a column idVar. Naively, I could do dat <- myFun(score ~ grade + idVar, qqq) This gives what I'm after in terms of the resulting data, but doesn't make sense within the context of the model I am working on since idVar is not one of the conditioning variables used, it has a different purpose altogether. So, I was thinking another way is to attach it somehow afterwards. Something like: myFun <- function(formula, data, id, na.action){ f <- formula(formula) idVar <- data[, id] dat <- model.frame(f, data, na.action = na.action) dat[, id] <- idVar dat } myFun(score ~ grade, qqq, id = 'idVar', na.action = NULL) Of course, I intentionally use na.action = NULL here because the following occurs, of course myFun(score ~ grade, qqq, id = 'idVar', na.action = na.omit) Error in `[<-.data.frame`(`*tmp*`, , id, value = 1:8) : replacement has 8 rows, data has 7 I see a few workarounds, but I am certain there is a cleaner way to accomplish this. Harold sessionInfo() R version 2.9.0 (2009-04-17) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] lme4_0.999375-28 Matrix_0.999375-25 lattice_0.17-22 xtable_1.5-5 adapt_1.0-4 MiscPsycho_1.4 [7] statmod_1.3.8 loaded via a namespace (and not attached): [1] grid_2.9.0 tools_2.9.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsetting/modifying a symbolic formula
Here is one way: fo - y ~ f1 * f2 one.x - lapply(all.vars(fo[[3]]), function(x) { fo[[3]] - as.name(x); fo }) one.x [[1]] y ~ f1 [[2]] y ~ f2 On Wed, Oct 21, 2009 at 11:29 AM, Bryan Hanson han...@depauw.edu wrote: Hello All.. Please consider the following: y - rnorm(20, mean = 10) f1 - as.factor(rep(c(A, B, B, A), 5)) f2 - as.factor(rep(c(C, D), 10)) testdata - data.frame(y, f1, f2) testFunc - function(formula, data, ...) { # mf - model.frame(formula, data) kw.res - kruskal.test(formula, data) } res - testFunc(y ~ f1 * f2, testdata) Which works perfectly well. Now, I would like to do some other tests which allow only one factor, for instance wilcox.test. So, I would like to modify my formula so that it reads y ~ f1 or y ~ f2, then conduct more tests. What is the smart way to subset or modify an existing symbolic formula? I have been staring at the attributes of res and mf (commented out currently) but these are nested in a way I find difficult to extract. ?terms, ?terms.formula, ?formula etc all discuss these attributes but as I said, it seems a bit impenetrable. RSiteSearch(symbolic formula) returns too many answers. No doubt I am missing the obvious, as this is my first foray into using formulas in my own functions. TIA, Bryan * Bryan Hanson Acting Chair Professor of Chemistry Biochemistry DePauw University, Greencastle IN USA sessionInfo() R version 2.10.0 RC (2009-10-19 r50172) i386-apple-darwin8.11.1 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] datasets tools grid utils stats graphics grDevices methods [9] base other attached packages: [1] ggplot2_0.8.3 reshape_0.8.3 proto_0.3-8 mvbutils_2.2.0 [5] ChemoSpec_1.2 lattice_0.17-25 mvoutlier_1.4 plyr_0.1.8 [9] RColorBrewer_1.0-2 chemometrics_0.4 som_0.3-4 robustbase_0.4-5 [13] rpart_3.1-45 pls_2.1-0 pcaPP_1.7 mvtnorm_0.9-7 [17] nnet_7.2-48 mclust_3.2 MASS_7.2-48 lars_0.9-7 [21] e1071_1.5-19 class_7.2-48 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
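To make that self-contained (and to show the one-factor formulas actually being used), a sketch that rebuilds the example data from the question and then runs wilcox.test on each reduced formula; update() is shown as an alternative way to get a single reduced formula.

  # Rebuild the example data from the original post
  y  <- rnorm(20, mean = 10)
  f1 <- factor(rep(c("A", "B", "B", "A"), 5))
  f2 <- factor(rep(c("C", "D"), 10))
  testdata <- data.frame(y, f1, f2)

  # Gabor's approach: swap the right-hand side for each single variable
  fo <- y ~ f1 * f2
  one.x <- lapply(all.vars(fo[[3]]), function(v) { fo[[3]] <- as.name(v); fo })

  # ... and use the reduced formulas with a one-factor test
  lapply(one.x, wilcox.test, data = testdata)

  # update() is another way to reduce an existing formula
  update(fo, . ~ f1)   # y ~ f1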
[R] Temperature Prediction Model
Greetings! As part of my research project I am using R to study temperature data collected by a network. Each node (observation point) records temperature of its surroundings throughout the day and generates a dataset. Using the recorded datasets for the past 7 days I need to build a prediction model for each node that would enable it to check the observed data against the predicted data. How can I derive an equation for temperature using the datasets? The following is a subset of one of the datasets:- Time Temperature 07:00:17.369668 17.509 07:03:17.465725 17.509 07:04:17.597071 17.509 07:05:17.330544 17.509 07:10:47.838123 17.5482 07:14:16.680696 17.5874 07:16:46.67457 17.5972 07:29:16.887654 17.7442 07:29:46.705759 17.754 07:32:17.131713 17.7932 07:35:47.113953 17.8324 07:36:17.194981 17.8324 07:37:17.227013 17.852 07:38:17.809174 17.8618 07:38:48.00011 17.852 07:39:17.124362 17.8618 07:41:17.130624 17.8912 07:41:46.966421 17.901 07:43:47.524823 17.95 07:44:47.430977 17.95 07:45:16.813396 17.95 So far I have tried to use linear model fit but have not found it to be useful. You may look at the attached graph for further reference. http://www.nabble.com/file/p25995874/Regression%2BModel%2Bfor%2BNode%2B1%2BDay%2B1.png Regression+Model+for+Node+1+Day+1.png I would really appreciate if you could suggest the correct approach to building such a prediction model. Many thanks, Aneeta -- View this message in context: http://www.nabble.com/Temperature-Prediction-Model-tp25995874p25995874.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SVM probability output variation
Hi Anders, On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote: Dear R:ers, I'm using the svm from the e1071 package to train a model with the option probabilities = TRUE. I then use predict with probabilities = TRUE and get the probabilities for the data point belonging to either class. So far all is well. My question is why I get different results each time I train the model, although I use exactly the same data. The prediction seems to be reproducible, but if I re-train the model, the probabilities vary some what. Here, I have trained a model on exactly the same data five times. When predicting using the different models, this is how the probabilities vary: I'm not sure I'm following the example your giving and the scenario you are describing. probabilities Grp.0Grp.1 0.70771550.2922845 0.79387820.2061218 0.81788330.1821167 0.71222030.2877797 This seems fine to me: it looks like the probabilities of class membership for 4 examples (Note that Grp.0 + Grp.1 = 1). How can the predictions using the same training and test data vary so much? I'm trying the code below several times (taken from the example), and the probabilities calculated from the call to prediction don't change much at all: R data(iris) R attach(iris) R model - svm(x, y, probability=TRUE) R predict(model, x, probability=TRUE) To be fair, the probabilities aren't exactly the same, but the difference between two runs is really small: R model - svm(x, y, probability=TRUE) R a - predict(model, x, probability=TRUE) R model - svm(x, y, probability=TRUE) R b - predict(model, x, probability=TRUE) R mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities'))) [1] 0.003215959 Is this what you were talking about, or ... ? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] date conversion not as i would have expected
Good day, i imported some data into R from Excel. By using the edit() function, this is what one of the dates looks like in R: x <- structure(1254351600, class = c("POSIXt", "POSIXct"), tzone = "") [1] "2009-10-01 BST" However, when i do the following, the date changes: as.Date(x, formate="%Y-%m-%d") [1] "2009-09-30" I don't understand why this is happening. I realise that i can get around this by doing as.Date(as.character(x)), but would be nice to understand why it doesn't work directly. C xx __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
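A likely explanation, with a hedged sketch ("Europe/London" is assumed here for BST): as.Date() applied to a POSIXct value converts it in UTC by default, and for POSIXct input the format (here misspelled formate) argument is simply ignored. Midnight BST is 23:00 the previous day in UTC, hence the apparent one-day shift; passing the time zone explicitly keeps the expected date.

  x <- structure(1254351600, class = c("POSIXt", "POSIXct"), tzone = "")
  as.Date(x)                        # "2009-09-30": the conversion happens in UTC
  as.Date(x, tz = "Europe/London")  # "2009-10-01": convert in the local zone instead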
[R] squared euclidean distance
Dear R-Help-Team, I would like to cluster my data using the ward-method. In several papers I read (e.g. Bahrenberg) that it is necessary to use the squared euclidean distance with the ward-method. Unfortunately I cannot find this term in r as a method for measuring the distance. Does anybody have an idea? Thanks in advance, Carolin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transparent Bands in R
Hello Megha and Jim, Thanks for your comments. The green bands actually correspond to data that looks like: 13 0 42 0 183 0 186 0 187 0 192 0 194 0 and so on. I plotted them using: plot(v[,1],3+v[,2],type=h,col=gray,lwd=5) and the rest of the data using points. The result is here: http://www.twitpic.com/md2li Thanks! Jim Lemon-2 wrote: On 10/21/2009 09:40 AM, mnstn wrote: Hello All, My question is regarding the attached plot. I would like to have multiple transparent green bands running the length (yaxis) of the plot the width of which is determined by the green lines at y=0 in the plot. Can you suggest a way to do it? Hi mnstn, Transparency is only supported by some graphic devices. It may be simpler to plot the values, use rect to draw green rectangles: xylim-par(usr) rect(start1,xylim[3],stop2,xylim[4],col=green) rect(start2,... then redraw the lines over the rectangles using lines. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Transparent-Bands-in-R-tp25983169p25995374.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
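If true transparency is wanted rather than redrawing the lines over opaque rectangles, a small sketch along the same lines: an alpha colour from rgb() works on devices that support semi-transparency (pdf(), quartz(), png(type = "cairo"), ...). The band positions below are made up.

  set.seed(1)
  plot(1:100, cumsum(rnorm(100)), type = "l")
  usr <- par("usr")                         # current plot region limits
  bands <- rbind(c(10, 20), c(55, 70))      # hypothetical start/stop x-values
  rect(bands[, 1], usr[3], bands[, 2], usr[4],
       col = rgb(0, 1, 0, alpha = 0.3), border = NA)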
Re: [R] Temperature Prediction Model
Hi, On Oct 21, 2009, at 12:31 PM, Aneeta wrote: Greetings! As part of my research project I am using R to study temperature data collected by a network. Each node (observation point) records temperature of its surroundings throughout the day and generates a dataset. Using the recorded datasets for the past 7 days I need to build a prediction model for each node that would enable it to check the observed data against the predicted data. How can I derive an equation for temperature using the datasets? The following is a subset of one of the datasets:- Time Temperature 07:00:17.369668 17.509 07:03:17.465725 17.509 07:04:17.597071 17.509 07:05:17.330544 17.509 07:10:47.838123 17.5482 07:14:16.680696 17.5874 07:16:46.67457 17.5972 07:29:16.887654 17.7442 07:29:46.705759 17.754 07:32:17.131713 17.7932 07:35:47.113953 17.8324 07:36:17.194981 17.8324 07:37:17.227013 17.852 07:38:17.809174 17.8618 07:38:48.00011 17.852 07:39:17.124362 17.8618 07:41:17.130624 17.8912 07:41:46.966421 17.901 07:43:47.524823 17.95 07:44:47.430977 17.95 07:45:16.813396 17.95 I think you/we need much more information. Are you really trying to build a model that predicts the temperature just given the time of day? Given that you're in NY, I'd say 12pm in August sure feels much different than 12pm in February, no? Or are you trying to predict what one sensor readout would be at a particular time given readings from other sensors at the same time? Or ... ? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SVM probability output variation
Hi again, and thank you Steve for your reply! Hi Anders, On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote: Dear R:ers, I'm using the svm from the e1071 package to train a model with the option probabilities = TRUE. I then use predict with probabilities = TRUE and get the probabilities for the data point belonging to either class. So far all is well. My question is why I get different results each time I train the model, although I use exactly the same data. The prediction seems to be reproducible, but if I re-train the model, the probabilities vary some what. Here, I have trained a model on exactly the same data five times. When predicting using the different models, this is how the probabilities vary: I'm not sure I'm following the example your giving and the scenario you are describing. I think you got it! probabilities Grp.0Grp.1 0.70771550.2922845 0.79387820.2061218 0.81788330.1821167 0.71222030.2877797 This seems fine to me: it looks like the probabilities of class membership for 4 examples (Note that Grp.0 + Grp.1 = 1). Yes, within each run all was OK, but I was surprised that it varied to such a high degree. How can the predictions using the same training and test data vary so much? I'm trying the code below several times (taken from the example), and the probabilities calculated from the call to prediction don't change much at all: R data(iris) R attach(iris) R model - svm(x, y, probability=TRUE) R predict(model, x, probability=TRUE) To be fair, the probabilities aren't exactly the same, but the difference between two runs is really small: R model - svm(x, y, probability=TRUE) R a - predict(model, x, probability=TRUE) R model - svm(x, y, probability=TRUE) R b - predict(model, x, probability=TRUE) R mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities'))) [1] 0.003215959 Is this what you were talking about, or ... ? Yes, exactly that. In your example, though, the variation seems to be a lot smaller. I'm guessing that has to with the data. If I instead output the decision values, the whole procedure is fully reproducible, i.e. the exact same values are returned when I retrain the model. I have no idea how the probabilities are calculated, but it seems to be in this step that the differences arise. In my case, I feel a bit hesitant to use them when they differ that much between runs (15% or so)... If important, I use a linear kernel and don't tune the model in any way. Thank's again! /Anders -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question on mixed effect models with LME
Good afternoon Using R 2.9.2 on a machine running Windows XP I have a longitudinal data set, with data on schools and their test scores over a four year period. I have centered year, and run the following m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum, na.action = na.omit) where math_1 is a percentage of students in a given school that are at the lowest math achievement level, year is year, TFC_ is a categorical variable for a treatment I wish to evaluate, and schoolnum is an identifier. When I run summary on this model, I get a strong negative correlation (-0.91) between the intercept and I(year-2007.5), despite the fact that the mean of year is 2007.5. I am puzzled, as I thought centering the time variable should eliminate, or at least strongly reduce, this correlation. Any insights appreciated thanks Peter Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to find the interception point of two linear fitted model in R?
Dear All, Let's have 10 pairs of observations, as shown below. ## x <- 1:10 y <- c(1,3,2,4,5,10,13,15,19,22) plot(x,y) ## Two fitted models, with ranges of [1,5] and [5,10], can be easily fitted separately by the lm function as shown below: ### mod1 <- lm(y[1:5] ~ x[1:5]) mod2 <- lm(y[5:10] ~ x[5:10]) ### I'm looking for the interception point between these two straight lines from these fitted models, mod1 and mod2. Could someone advise me on the way to do it? Thank you Fir __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Missing data and LME models and diagnostic plots
Hello Running R2.9.2 on Windows XP I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. Thus, m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum) causes an error in na.fail.default, but adding na.action = na.omit makes a model with no errors. However, if I create that model, i.e. m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum, na.action = na.omit) then the diagnostic plots suggested in Pinheiro Bates produce errors; e.g. plot(m1.mod1, schoolnum~resid(.), abline = 0) gives an error could not find function NaAct. Searching the archives showed a similar question from 2007, but did not show any responses. Thanks for any help Peter ) Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to find the interception point of two linear fitted model in R?
On Wed, Oct 21, 2009 at 12:09 PM, FMH kagba2...@yahoo.com wrote: Dear All, Let's have 10 pairs of observations, as shown below. ## x <- 1:10 y <- c(1,3,2,4,5,10,13,15,19,22) plot(x,y) ## Two fitted models, with ranges of [1,5] and [5,10], can be easily fitted separately by the lm function as shown below: ### mod1 <- lm(y[1:5] ~ x[1:5]) mod2 <- lm(y[5:10] ~ x[5:10]) ### I'm looking for the interception point between these two straight lines from these fitted models, mod1 and mod2. Could someone advise me on the way to do it? Thank you Fir extract the coefficients, rearrange in a linear system, use solve. m1 <- as.matrix(rbind(coef(mod1), coef(mod2))) a <- cbind(c(1,1), -m1[, 2]) a [,1] [,2] [1,] 1 -0.900000 [2,] 1 -3.257143 b <- m1[,1] solve(a=a,b=b) [1] 4.396364 4.551515 This seems to visually verify the result: plot(1:10,1:10, type="n") abline(mod1) abline(mod2) -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
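An equivalent closed form, continuing from mod1 and mod2 above, in case setting up a linear system feels like overkill for two straight lines:

  cf1 <- coef(mod1); cf2 <- coef(mod2)
  x0 <- (cf2[1] - cf1[1]) / (cf1[2] - cf2[2])   # solve a1 + b1*x = a2 + b2*x for x
  y0 <- cf1[1] + cf1[2] * x0
  c(x = unname(x0), y = unname(y0))             # x = 4.551515, y = 4.396364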
Re: [R] Subsetting/modifying a symbolic formula
Thanks Gabor, you taught me two useful things: all.vars, and the fact that a formula object always has length 3. Problem solved. Bryan On 10/21/09 11:57 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: Here is one way: fo - y ~ f1 * f2 one.x - lapply(all.vars(fo[[3]]), function(x) { fo[[3]] - as.name(x); fo }) one.x [[1]] y ~ f1 [[2]] y ~ f2 On Wed, Oct 21, 2009 at 11:29 AM, Bryan Hanson han...@depauw.edu wrote: Hello All.. Please consider the following: y - rnorm(20, mean = 10) f1 - as.factor(rep(c(A, B, B, A), 5)) f2 - as.factor(rep(c(C, D), 10)) testdata - data.frame(y, f1, f2) testFunc - function(formula, data, ...) { # mf - model.frame(formula, data) kw.res - kruskal.test(formula, data) } res - testFunc(y ~ f1 * f2, testdata) Which works perfectly well. Now, I would like to do some other tests which allow only one factor, for instance wilcox.test. So, I would like to modify my formula so that it reads y ~ f1 or y ~ f2, then conduct more tests. What is the smart way to subset or modify an existing symbolic formula? I have been staring at the attributes of res and mf (commented out currently) but these are nested in a way I find difficult to extract. ?terms, ?terms.formula, ?formula etc all discuss these attributes but as I said, it seems a bit impenetrable. RSiteSearch(symbolic formula) returns too many answers. No doubt I am missing the obvious, as this is my first foray into using formulas in my own functions. TIA, Bryan * Bryan Hanson Acting Chair Professor of Chemistry Biochemistry DePauw University, Greencastle IN USA sessionInfo() R version 2.10.0 RC (2009-10-19 r50172) i386-apple-darwin8.11.1 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] datasets tools grid utils stats graphics grDevices methods [9] base other attached packages: [1] ggplot2_0.8.3 reshape_0.8.3 proto_0.3-8 mvbutils_22.0 [5] ChemoSpec_1.2 lattice_0.17-25 mvoutlier_1.4 plyr_0.1.8 [9] RColorBrewer_1.0-2 chemometrics_0.4 som_0.3-4 robustbase_0.4-5 [13] rpart_3.1-45 pls_2.1-0 pcaPP_1.7 mvtnorm_0.9-7 [17] nnet_7.2-48 mclust_3.2 MASS_7.2-48 lars_0.9-7 [21] e1071_1.5-19 class_7.2-48 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] squared euclidean distance
You can calculate the Euclidean distance with dist() and then square it. Sarah On Wed, Oct 21, 2009 at 11:35 AM, Caro B. carolin.w...@googlemail.com wrote: Dear R-Help-Team, I would like to cluster my data using the ward-method. In several papers I read (e.g. Bahrenberg) that it is neccesary to use the squared euclidean distance with the ward-method. Unfortunatelly I cannot find this term in r as a method for measuring the distance. Does anybody have an idea? Thanks in advance, Carolin -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] formula and model.frame
Emulate lm. myFun <- function(formula, data, na.action, id, ...){ mf <- match.call(expand.dots = FALSE) m <- match(c("formula", "data", "id", "na.action"), names(mf), 0L) mf <- mf[c(1L, m)] mf$drop.unused.levels <- TRUE mf[[1L]] <- as.name("model.frame") mf <- eval(mf, parent.frame()) mf } myFun(score ~ grade, qqq, id = idVar) HTH, Chuck On Wed, 21 Oct 2009, Doran, Harold wrote: Suppose I have the following function myFun <- function(formula, data){ f <- formula(formula) dat <- model.frame(f, data) dat } Applying it with this sample data yields a new dataframe: qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = c(1:8)) dat <- myFun(score ~ grade, qqq) However, what I would like is for the resulting dataframe (dat) to include as a column idVar. Naively, I could do dat <- myFun(score ~ grade + idVar, qqq) This gives what I'm after in terms of the resulting data, but doesn't make sense within the context of the model I am working on since idVar is not one of the conditioning variables used, it has a different purpose altogether. So, I was thinking another way is to attach it somehow afterwards. Something like: myFun <- function(formula, data, id, na.action){ f <- formula(formula) idVar <- data[, id] dat <- model.frame(f, data, na.action = na.action) dat[, id] <- idVar dat } myFun(score ~ grade, qqq, id = 'idVar', na.action = NULL) Of course, I intentionally use na.action = NULL here because the following occurs, of course myFun(score ~ grade, qqq, id = 'idVar', na.action = na.omit) Error in `[<-.data.frame`(`*tmp*`, , id, value = 1:8) : replacement has 8 rows, data has 7 I see a few workarounds, but I am certain there is a cleaner way to accomplish this. Harold sessionInfo() R version 2.9.0 (2009-04-17) i386-pc-mingw32 locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] lme4_0.999375-28 Matrix_0.999375-25 lattice_0.17-22 xtable_1.5-5 adapt_1.0-4 MiscPsycho_1.4 [7] statmod_1.3.8 loaded via a namespace (and not attached): [1] grid_2.9.0 tools_2.9.0 Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
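A shorter alternative, sketched here and relying on a documented feature of model.frame(): additional named arguments become extra columns with parenthesised names, and the same na.action is applied to them, so the id column stays in step with the rows that are kept.

  myFun2 <- function(formula, data, id, na.action = na.omit) {
    model.frame(formula, data, id = data[[id]], na.action = na.action)
  }
  qqq <- data.frame(grade = c(3, NA, 3, 4, 5, 5, 4, 3),
                    score = rnorm(8), idVar = 1:8)
  myFun2(score ~ grade, qqq, id = "idVar")   # keeps an "(id)" column, 7 rows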
Re: [R] squared euclidean distance
On Wed, 2009-10-21 at 17:35 +0200, Caro B. wrote: Dear R-Help-Team, I would like to cluster my data using the ward-method. In several papers I read (e.g. Bahrenberg) that it is neccesary to use the squared euclidean distance with the ward-method. Unfortunatelly I cannot find this term in r as a method for measuring the distance. Does anybody have an idea? Compute the Euclidean distance then square the numbers to undo the sqrt applied within dist. set.seed(1) dat - data.frame(X1 = runif(10), X2 = runif(10)) d.euc - dist(dat) d.sqeuc - d.euc^2 class(d.sqeuc) ## still a dist object I have no idea what/who Bahrenberg is or whether/why it is advisable to use this sq Euclidean with Ward's method; in my field I've seen Ward's method used with all sorts of distance coefficients... HTH G Thanks in advance, Carolin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
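Restating that example with the clustering step added (the agglomeration method was called "ward" in the R versions current at the time of this thread; newer versions call it "ward.D"):

  set.seed(1)
  dat <- data.frame(X1 = runif(10), X2 = runif(10))
  d.sqeuc <- dist(dat)^2                  # squared Euclidean distances, still a "dist"
  hc <- hclust(d.sqeuc, method = "ward")  # use "ward.D" on recent R versions
  plot(hc)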
Re: [R] Temperature Prediction Model
Hi, In order to have more eyes on this, I'm CCing this back to lease (please try to keep further correspondence here, since most mail to *this* address of mine probably gets lost if it's not coming in from a list to begin with) ... I'm not really sure I have much to say about your problem ... it seems you're working in an open research area that I have 0 knowledge about. I'd recommend you talk with your advisor a bit to see what models they have in mind ... I'd imagine questions directed to this list will be better responded to when they're more specific, eg. some problem about using a particular library/model/package to perform a specific task, and not more open ended ones like which equation you can use to fit your data (or even if that's a reasonable thing to do, at all). Anyway, sorry .. don't have much valuable input, but good luck all the same. -steve On Oct 21, 2009, at 1:07 PM, Aneeta Bhattacharyya wrote: Hi Steve, Thanks for your email. The data that I have has been collected by a sensor network deployed by Intel. You may take a look at the network at the following website http://db.csail.mit.edu/labdata/labdata.html The main goal of my project is to simulate a physical layer attack on a sensor network and to detect such an attack. In order to detect an attack I need to have a model that would define the normal behaviour. So the actual variation of temperature throughout the year is not very important out here. I have a set of data for a period of 7 days which is assumed to be the correct behaviour and I need to build a model upon that data. I may refine the model later on to take into account temperature variations throughout the year. Yes I am trying to build a model that will predict the temperature just on the given time of the day so that I am able to compare it with the observed temperature and determine if there is any abnormality. Each node should have its own expectation model (i.e. there will be no correlation between the readings of the different nodes). Please let me know if you have any further questions. Many Thanks, Aneeta On Wed, Oct 21, 2009 at 12:46 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote: Hi, On Oct 21, 2009, at 12:31 PM, Aneeta wrote: Greetings! As part of my research project I am using R to study temperature data collected by a network. Each node (observation point) records temperature of its surroundings throughout the day and generates a dataset. Using the recorded datasets for the past 7 days I need to build a prediction model for each node that would enable it to check the observed data against the predicted data. How can I derive an equation for temperature using the datasets? The following is a subset of one of the datasets:- Time Temperature 07:00:17.369668 17.509 07:03:17.465725 17.509 07:04:17.597071 17.509 07:05:17.330544 17.509 07:10:47.838123 17.5482 07:14:16.680696 17.5874 07:16:46.67457 17.5972 07:29:16.887654 17.7442 07:29:46.705759 17.754 07:32:17.131713 17.7932 07:35:47.113953 17.8324 07:36:17.194981 17.8324 07:37:17.227013 17.852 07:38:17.809174 17.8618 07:38:48.00011 17.852 07:39:17.124362 17.8618 07:41:17.130624 17.8912 07:41:46.966421 17.901 07:43:47.524823 17.95 07:44:47.430977 17.95 07:45:16.813396 17.95 I think you/we need much more information. Are you really trying to build a model that predicts the temperature just given the time of day? Given that you're in NY, I'd say 12pm in August sure feels much different than 12pm in February, no? 
Or are you trying to predict what one sensor readout would be at a particular time given readings from other sensors at the same time? Or ... ? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SVM probability output variation
Howdy, On Oct 21, 2009, at 1:05 PM, Anders Carlsson wrote: snip Yes, exactly that. In your example, though, the variation seems to be a lot smaller. I'm guessing that has to with the data. If I instead output the decision values, the whole procedure is fully reproducible, i.e. the exact same values are returned when I retrain the model. By the decision values, you mean the predict labels, right? I have no idea how the probabilities are calculated, but it seems to be in this step that the differences arise. In my case, I feel a bit hesitant to use them when they differ that much between runs (15% or so)... I'd find that a bit disconcerting, too. Can you give a sample of your data + code your using that can reproduce this example? Warning: Brainstorming Below If I were to calculate probabilities for my class labels, I'd make the probability some function of the example's distance from the decision boundary. Now, if your decision boundary isn't changing from run to run (and I guess it really shouldn't be, since the SVM returns the maximum margin classifier (which is, by definition, unique, right?)), it's hard to imagine why these probabilities would change, either ... ... unless you're holding out different subsets of your data during training, or perhaps have a different value for your penalty (cost) parameter when building the model. I believe you said that you're actually training the same exact model each time, though, right? Anyway, I see the help page for ?svm says this, if it helps: The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] reshaping data
Hi all, I have a matrix of correlation values between all pairwise comparison in an experiment. For example, I have 2 time points (1,2) each in triplicate. Thus, I have the following matrix 1-1 1-2 1-3 2-1 2-2 2-3 1-1 NA ... ... ... ... ... 1-2 ... NA ... ... ... ... 1-3 ... ... NA ... ... ... 2-1 ... ... ... NA ... ... 2-2 ... ... ... ... NA ... 2-3 ... ... ... ... ... NA What I'd like to do is to reshape the data so that it would be easy for me to summarize all the correlation values across the triplicates (i.e. mean) for time point 1 and 2. It sounds as though the reshape package should be able to do this, but the solution is not obvious to me. Can anybody help? Best, Ken __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO:
David: Do you mean inappropriate or embarrassing? How would we R-ians know what has happened at REVolution were it not for Ajay's note? Were you planning a press release? Something like, 47% of Revolution summarily fired. Nobody left with more than a year of experience...? Charles Annis, P.E. charles.an...@statisticalengineering.com phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David M Smith Sent: Wednesday, October 21, 2009 2:20 PM To: Ajay ohri Cc: r-help@r-project.org Subject: Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO: It is clearly inappropriate for me or anyone else from REvolution to comment on matters pertaining to any employee or ex-employee. I would further add that this is a highly inappropriate use of this list. Apologies to all concerned. # David Smith On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote: Start the REvolution without me... -- David M Smith da...@revolution-computing.com VP of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (Palo Alto, CA, USA) Download REvolution R free: http://revolution-computing.com/downloads/revolution-r.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO:
On 22/10/2009, at 7:20 AM, David M Smith wrote: It is clearly inappropriate for me or anyone else from REvolution to comment on matters pertaining to any employee or ex-employee. I would further add that this is a highly inappropriate use of this list. Oh, I dunno. I have the impression that the REvolution initiative has, or could have, a substantial impact upon R, and thus what goes on at REvolution is of concern to the R community. Apologies to all concerned. For what? For the fact that a particular cat has been let out of a particular bag? cheers, Rolf Turner ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Missing data and LME models and diagnostic plots
Peter Flom wrote: I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. You are confusing missing data with an unbalanced design. A strengths of LME is that it copes with the latter, which aov() does not. Regards, Mark. Peter Flom-2 wrote: Hello Running R2.9.2 on Windows XP I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. Thus, m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum) causes an error in na.fail.default, but adding na.action = na.omit makes a model with no errors. However, if I create that model, i.e. m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum, na.action = na.omit) then the diagnostic plots suggested in Pinheiro Bates produce errors; e.g. plot(m1.mod1, schoolnum~resid(.), abline = 0) gives an error could not find function NaAct. Searching the archives showed a similar question from 2007, but did not show any responses. Thanks for any help Peter ) Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Missing-data-and-LME-models-and-diagnostic-plots-tp25996584p25998546.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
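On the plot() error in the original post, one workaround is to remove the incomplete rows yourself before fitting, so the fitted object carries no na.action at all. The sketch below uses simulated data in place of the poster's long data frame; the variable names match the question, but the effect sizes are invented.

  library(nlme)
  set.seed(1)
  long <- expand.grid(schoolnum = factor(1:30), year = 2006:2009)
  long$TFC_ <- factor(ifelse(as.integer(long$schoolnum) <= 15, "treated", "control"))
  int <- rnorm(30, sd = 3)                 # school-level intercept deviations
  slp <- rnorm(30, sd = 0.5)               # school-level slope deviations
  i   <- as.integer(long$schoolnum)
  long$math_1 <- 30 + int[i] - (2 + slp[i]) * (long$year - 2007.5) + rnorm(nrow(long))
  long$math_1[sample(nrow(long), 10)] <- NA     # introduce some missing responses

  long_cc <- long[complete.cases(long), ]       # drop incomplete rows before fitting
  m1 <- lme(math_1 ~ I(year - 2007.5) * TFC_, data = long_cc,
            random = ~ I(year - 2007.5) | schoolnum)
  plot(m1, schoolnum ~ resid(.), abline = 0)    # the diagnostic plot works here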
[R] random numbers between 0 and 1
Hi, To generate random numbers between 0 and 1, do you use rnorm followed by dnorm? for example, for 10 variables a = rnorm(10) a [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393 0.80010865 [7] -0.01026882 -0.23887516 2.29912600 -1.38352143 dnorm(a) [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637 [7] 0.39892125 0.38772103 0.02838403 0.15320103 Regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO:
On 10/21/2009 3:09 PM, Charles Annis, P.E. wrote: David: Do you mean inappropriate or embarrassing? How would we R-ians know what has happened at REVolution were it not for Ajay's note? Were you planning a press release? I think he already did post a note on his blog mentioning that Danese was no longer employed there. There was also a press release, I don't know if it mentioned her. The blog entry also said that David continues his employment, which contradicts the second part of the claim below. So I suspect we're all still pretty much ignorant of what happened, and I wouldn't mind keeping it that way, but if I wanted more details, I'd look on the REvolution site for their side of the story, and on Danese Cooper's site for hers. Not on R-Help. Duncan Murdoch Something like, 47% of Revolution summarily fired. Nobody left with more than a year of experience...? Charles Annis, P.E. charles.an...@statisticalengineering.com phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David M Smith Sent: Wednesday, October 21, 2009 2:20 PM To: Ajay ohri Cc: r-help@r-project.org Subject: Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO: It is clearly inappropriate for me or anyone else from REvolution to comment on matters pertaining to any employee or ex-employee. I would further add that this is a highly inappropriate use of this list. Apologies to all concerned. # David Smith On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote: Start the REvolution without me... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO:
Well I do not think that this is highly inappropriate use of this list. I am sure you new designation of VP makes you feel all more powerful but your authority does not extend to this list. It should continue to rest with Core R team. It seems had this list been under your control (i.e revolution computing) you would have turned it off-typically Corporate Strategy. Just an example how money and designation influences those who claim otherwise. Date: Wed, 21 Oct 2009 14:20:08 -0400 From: da...@revolution-computing.com To: ohri2...@gmail.com CC: r-help@r-project.org Subject: Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO: It is clearly inappropriate for me or anyone else from REvolution to comment on matters pertaining to any employee or ex-employee. I would further add that this is a highly inappropriate use of this list. Apologies to all concerned. # David Smith On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote: Start the REvolution without me... -- David M Smith da...@revolution-computing.com VP of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (Palo Alto, CA, USA) Download REvolution R free: http://revolution-computing.com/downloads/revolution-r.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. _ Hotmail: Trusted email with powerful SPAM protection. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] random numbers between 0 and 1
Uniformly distributed random numbers between 0 and 1? Try ?runif -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of carol white Sent: Wednesday, October 21, 2009 2:26 PM To: r-h...@stat.math.ethz.ch Subject: [R] random numbers between 0 and 1 Hi, To generate random numbers between 0 and 1, do you use rnorm followed by dnrom? for ex, for 10 variables a = rnorm(10) a [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393 0.80010865 [7] -0.01026882 -0.23887516 2.29912600 -1.38352143 dnorm(a) [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637 [7] 0.39892125 0.38772103 0.02838403 0.15320103 Regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
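For completeness, the direct call:

  set.seed(1)
  runif(10)                      # ten independent Uniform(0, 1) draws
  runif(10, min = 0, max = 1)    # the same call with the default bounds spelled out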
Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO:
On 22/10/2009, at 8:27 AM, kulwinder banipal wrote: Well I do not think that this is highly inappropriate use of this list. I am sure you new designation of VP makes you feel all more powerful but your authority does not extend to this list. It should continue to rest with Core R team. It seems had this list been under your control (i.e revolution computing) you would have turned it off-typically Corporate Strategy. Just an example how money and designation influences those who claim otherwise. Right on, Red Freak!!! cheers, Rolf Turner Date: Wed, 21 Oct 2009 14:20:08 -0400 From: da...@revolution-computing.com To: ohri2...@gmail.com CC: r-help@r-project.org Subject: Re: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO: It is clearly inappropriate for me or anyone else from REvolution to comment on matters pertaining to any employee or ex-employee. I would further add that this is a highly inappropriate use of this list. Apologies to all concerned. # David Smith On Wed, Oct 21, 2009 at 1:51 PM, Ajay ohri ohri2...@gmail.com wrote: Start the REvolution without me... -- David M Smith da...@revolution-computing.com VP of Community, REvolution Computing www.revolution-computing.com Tel: +1 (206) 577-4778 x3203 (Palo Alto, CA, USA) Download REvolution R free: http://revolution-computing.com/downloads/revolution-r.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. _ Hotmail: Trusted email with powerful SPAM protection. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] random numbers between 0 and 1
I would suggest to use the generator at http://submoon.freeshell.org/pix/valium/dilbert_rng.jpg and subtract 8.5. Best, Gabor On Wed, Oct 21, 2009 at 9:25 PM, carol white wht_...@yahoo.com wrote: Hi, To generate random numbers between 0 and 1, do you use rnorm followed by dnrom? for ex, for 10 variables a = rnorm(10) a [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393 0.80010865 [7] -0.01026882 -0.23887516 2.29912600 -1.38352143 dnorm(a) [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637 [7] 0.39892125 0.38772103 0.02838403 0.15320103 Regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gabor Csardi gabor.csa...@unil.ch UNIL DGM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO:
On Wed, Oct 21, 2009 at 8:09 PM, Charles Annis, P.E. charles.an...@statisticalengineering.com wrote: David: Do you mean inappropriate or embarrassing? How would we R-ians know what has happened at REVolution were it not for Ajay's note? Were you planning a press release? Something like, 47% of Revolution summarily fired. Nobody left with more than a year of experience...? Ajay's note was a verbatim cut n paste from Danese Cooper's blog, with no comment, editorial, opinion, or anything. Although Danese may well be an open-source evangelist, that doesn't mean her blog is public domain. This was essentially MLP[1] which I don't think we want on the R-help list. Anyone who cares about REvolution will already be following the relevant blogs, reading the New York Times, and the words of the prophet that were written on the subway walls. Oops I went all Simon and Garfunkel there. If people want to add value to relevant blog posts onto R-news by commenting and inviting discussion then that's fine by me. But no more cut n paste jobs. Barry [1] Mindless Link Propogation - when someone just emails Hey guyz look at this : http://example.com/lulz; __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: reshaping data
I apologize for the previous post using HTML. Haven't posted for a while and e-mail client default. Best, Ken Hi all, I have a matrix of correlation values between all pairwise comparison in an experiment. For example, I have 2 time points (1,2) each in triplicate. Thus, I have the following matrix 1-1 1-2 1-3 2-1 2-2 2-3 1-1NA ... ... ... ... ... 1-2... NA ... ... ... ... 1-3... ... NA ... ... ... 2-1... ... ... NA ... ... 2-2... ... ... ... NA ... 2-3... ... ... ... ... NA What I'd like to do is to reshape the data so that it would be easy for me to summarize all the correlation values across the triplicates (i.e. mean) for time point 1 and 2. It sounds as though the reshape package should be able to do this, but the solution is not obvious to me. Can anybody help? Best, Ken __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
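One base-R way to get the within-time-point averages, sketched with a made-up symmetric matrix standing in for the real correlations (the labels follow the "timepoint-replicate" scheme above):

  set.seed(1)
  labs <- c("1-1", "1-2", "1-3", "2-1", "2-2", "2-3")
  m <- matrix(runif(36, 0.5, 1), 6, 6, dimnames = list(labs, labs))
  m[lower.tri(m)] <- t(m)[lower.tri(m)]   # make it symmetric, like a correlation matrix
  diag(m) <- NA
  tp <- sub("-.*", "", labs)              # time point of each row/column: "1" or "2"
  # mean correlation among the replicates of each time point (diagonal excluded)
  sapply(split(labs, tp), function(ix) mean(m[ix, ix], na.rm = TRUE))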
Re: [R] random numbers between 0 and 1
On Wed, Oct 21, 2009 at 8:37 PM, Gábor Csárdi csa...@rmki.kfki.hu wrote: I would suggest to use the generator at http://submoon.freeshell.org/pix/valium/dilbert_rng.jpg and subtract 8.5. You may laugh (indeed I did) but some medical trials have used (and poss still do) telephone-a-human random numbers. When deciding to give medicine or placebo the doctor calls the phone number and asks for a random number, which is read out, and this decides what the patient gets. I suppose this could be implemented in R with an interface to a speech recognition engine and a telephone... but runif(100) is easier. Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] random numbers between 0 and 1
On 22/10/2009, at 8:25 AM, carol white wrote: Hi, To generate random numbers between 0 and 1, do you use rnorm followed by dnrom? for ex, for 10 variables a = rnorm(10) a [1] -0.87640764 -0.95842391 -1.33434559 -0.63844932 -1.69829393 0.80010865 [7] -0.01026882 -0.23887516 2.29912600 -1.38352143 dnorm(a) [1] 0.27171985 0.25202507 0.16378878 0.32538464 0.09432211 0.28966637 [7] 0.39892125 0.38772103 0.02838403 0.15320103 Well, this will give you random (in some sense) numbers between 0 and 1. (Actually they will be between 0 and dnorm(0) = approx. 0.3989.) Just what the *distribution* of these numbers would be is obscure to me. You should also be aware that the values of dnorm() could be *larger* than 1, if the standard deviation were specified to be something smaller than the default value of 1. (Note that dnorm() is a ***density***, not a probability function.) Anyway this is a very convoluted way of going at the problem. Why not just generate random numbers between 0 an 1 ``directly'', using runif()? cheers, Rolf Turner ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R s largest corporate partner REVolution Computingand SPSS CEO:
On 22/10/2009, at 8:37 AM, Barry Rowlingson wrote: On Wed, Oct 21, 2009 at 8:09 PM, Charles Annis, P.E. charles.an...@statisticalengineering.com wrote: David: Do you mean inappropriate or embarrassing? How would we R-ians know what has happened at REVolution were it not for Ajay's note? Were you planning a press release? Something like, 47% of Revolution summarily fired. Nobody left with more than a year of experience...? Ajay's note was a verbatim cut n paste from Danese Cooper's blog, with no comment, editorial, opinion, or anything. Although Danese may well be an open-source evangelist, that doesn't mean her blog is public domain. This was essentially MLP[1] which I don't think we want on the R-help list. Anyone who cares about REvolution will already be following the relevant blogs, reading the New York Times, and the words of the prophet that were written on the subway walls. Well I don't know how much I ***care*** about REvolution, but I'm at least vaguely interested. And I was toadally unaware of what was going on until I read Ajay's post. (I don't read the New York Times, or blogs, or subway walls.) So from my pov Ajay did the community a service. Oops I went all Simon and Garfunkel there. If people want to add value to relevant blog posts onto R-news by commenting and inviting discussion then that's fine by me. But no more cut n paste jobs. Barry [1] Mindless Link Propogation - when someone just emails Hey guyz look at this : http://example.com/lulz; Hey Baz; sometimes things just speak for themselves. No commentary is necessary. cheers, Rolf ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SVM probability output variation
Hi, snip If I instead output the decision values, the whole procedure is fully reproducible, i.e. the exact same values are returned when I retrain the model. By the decision values, you mean the predict labels, right? The output of decision values can be turned on in the predict.svm, and is, as I have understood, the distance from the data point to the hyperplane. (I should say that my knowledge here is limited to concepts, I know nothing about the details in which this works...). I use these to create ROC curves etc. I have no idea how the probabilities are calculated, but it seems to be in this step that the differences arise. In my case, I feel a bit hesitant to use them when they differ that much between runs (15% or so)... I'd find that a bit disconcerting, too. Can you give a sample of your data + code your using that can reproduce this example? I have the data at the office, so I can't do that now (at home). Warning: Brainstorming Below If I were to calculate probabilities for my class labels, I'd make the probability some function of the example's distance from the decision boundary. Now, if your decision boundary isn't changing from run to run (and I guess it really shouldn't be, since the SVM returns the maximum margin classifier (which is, by definition, unique, right?)), it's hard to imagine why these probabilities would change, either ... ... unless you're holding out different subsets of your data during training, or perhaps have a different value for your penalty (cost) parameter when building the model. I believe you said that you're actually training the same exact model each time, though, right? Yes, I'm using the exact same data to train each time. I thought this would generate identical models, but that doesn't appear to be the case. Anyway, I see the help page for ?svm says this, if it helps: The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization This is where I realise I'm in a bit over my head on the theroy side - this means nothing to me... -steve Thanks again, Anders __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
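For readers puzzled by the quoted help-page sentence, here is a rough conceptual sketch of Platt-style scaling: a logistic model is fitted to the decision values to turn them into class probabilities. This is only an illustration with simulated data, not the exact procedure libsvm runs internally (libsvm also cross-validates the decision values, which is one plausible source of the run-to-run variation discussed above):

library(e1071)
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(100) > 0, "a", "b"))
fit <- svm(x, y)

# signed distances of each point from the separating hyperplane
dv <- attr(predict(fit, x, decision.values = TRUE), "decision.values")[, 1]

# Platt-style scaling: logistic regression of the labels on the decision values
platt <- glm(I(y == "a") ~ dv, family = binomial)
head(fitted(platt))   # rough class probabilities derived from the decision values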
Re: [R] Temperature Prediction Model
The data that I use has been collected by a sensor network deployed by Intel. You may take a look at the network at the following website http://db.csail.mit.edu/labdata/labdata.html The main goal of my project is to simulate a physical layer attack on a sensor network and to detect such an attack. In order to detect an attack I need to have a model that would define the normal behaviour. So the actual variation of temperature throughout the year is not very important out here. I have a set of data for a period of 7 days which is assumed to be the correct behaviour and I need to build a model upon that data. I may refine the model later on to take into account temperature variations throughout the year. Yes I am trying to build a model that will predict the temperature just on the given time of the day so that I am able to compare it with the observed temperature and determine if there is any abnormality. Each node should have its own expectation model (i.e. there will be no correlation between the readings of the different nodes). Steve Lianoglou-6 wrote: Hi, On Oct 21, 2009, at 12:31 PM, Aneeta wrote: Greetings! As part of my research project I am using R to study temperature data collected by a network. Each node (observation point) records temperature of its surroundings throughout the day and generates a dataset. Using the recorded datasets for the past 7 days I need to build a prediction model for each node that would enable it to check the observed data against the predicted data. How can I derive an equation for temperature using the datasets? The following is a subset of one of the datasets:- Time Temperature 07:00:17.369668 17.509 07:03:17.465725 17.509 07:04:17.597071 17.509 07:05:17.330544 17.509 07:10:47.838123 17.5482 07:14:16.680696 17.5874 07:16:46.67457 17.5972 07:29:16.887654 17.7442 07:29:46.705759 17.754 07:32:17.131713 17.7932 07:35:47.113953 17.8324 07:36:17.194981 17.8324 07:37:17.227013 17.852 07:38:17.809174 17.8618 07:38:48.00011 17.852 07:39:17.124362 17.8618 07:41:17.130624 17.8912 07:41:46.966421 17.901 07:43:47.524823 17.95 07:44:47.430977 17.95 07:45:16.813396 17.95 I think you/we need much more information. Are you really trying to build a model that predicts the temperature just given the time of day? Given that you're in NY, I'd say 12pm in August sure feels much different than 12pm in February, no? Or are you trying to predict what one sensor readout would be at a particular time given readings from other sensors at the same time? Or ... ? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Temperature-Prediction-Model-tp25995874p25997650.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
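If the goal really is to predict temperature from time of day alone, node by node, one simple option is a harmonic regression (or any scatterplot smoother) on time of day, flagging readings that fall far from the fitted curve. The sketch below uses made-up column names secs (seconds since midnight) and temp, and simulated data in place of the Intel readings:

# toy data standing in for one node's week of readings
set.seed(1)
secs <- sort(runif(500, 0, 86400))                       # time of day in seconds
temp <- 17 + 2 * sin(2 * pi * secs / 86400) + rnorm(500, sd = 0.2)

# harmonic regression: temperature as a smooth daily cycle
fit <- lm(temp ~ sin(2 * pi * secs / 86400) + cos(2 * pi * secs / 86400))

# flag observations far from the node's expected value, e.g. more than 3 residual SDs away
suspect <- abs(residuals(fit)) > 3 * sd(residuals(fit))
sum(suspect)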
Re: [R] Missing data and LME models and diagnostic plots
I wrote I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. Mark Difford replied You are confusing missing data with an unbalanced design. A strengths of LME is that it copes with the latter, which aov() does not. Thanks for your reply, but I don't believe I am confused with respect to missing data in mixed models. See e.g. Hedeker and Gibbons, Longitudinal Data Analysis, which repeatedly stresses that mixed models provide good estimates if the data are missing at random. Regards Peter Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] News on R's largest corporate partner REvolution Computing and SPSS CEO:
I am sorry. I will refrain from this in the future (using the list which is a technical resource and not a forum inappropriately). My blog has my views on it -http://decisionstats.com so I wont cut and paste on that. I would like to applaud David's team at REvolution for finally releasing an Ubuntu version of REvolution R, though the Windows version 64 bit was developed much earlier. I would like to say that Dave's credentials in open source or his personal technical authority has not been questioned by me ( and I don't think by anyone on this list). The new management is led by an ex Founder of SPSS, so I guess that things are looking up already. A new CEO can only mean more hunger to get R up and running in corporate circles so that smart people who write excellent code start making more money than ...? . Commerce demands that REvolution gets a bigger share than it has been getting and thats what most of the people on this list want ( except there is no forums section for discussion for R- help while there are forums section. Ajay Knoxville Student Go Vols! On Wed, Oct 21, 2009 at 2:24 PM, Erik Iverson eiver...@nmdp.org wrote: Nothing to do with what R-help is about. Please refrain from posting things like this in the future. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ajay ohri Sent: Wednesday, October 21, 2009 12:52 PM To: r-help@r-project.org; sas-l; spssx-l Subject: [R] News on R s largest corporate partner REVolution Computing and SPSS CEO: Start the REvolution without me... http://danesecooper.blogs.com/divablog/2009/10/start-the-revolution- without-me.html *From Danese Cooper's Blog* Some of you may have become aware of REvolution Computinghttp://revolution-computing.com/, a commercial open source company organized around the R Languagehttp://www.r-project.org/, when I joined in March http://news.cnet.com/8301-1001_3-10202355-92.htmlof this year. For the past few months we have been working on a B-Round of funding. It was an interesting process and I was happy to be working in my first startup company after so many years in very large corporations. We built a small team to work on Community Engineering, by which we meant developing assets both to benefit the R Language community as well as to entice and inform the Alpha-Geek community to learn and use R. We set up an Advisory Board designed to advise REvolution management about decisions relating to REvo and Open Source, and we helped put REvolution R into the Karmic Koala release of Ubuntu. It was really fun to work in a small, agile team and I felt like I was getting a great education in startups and we were rapidly moving the company forward...Why didn't I join a startup years ago? The funding deal closed on Wednesday last week... Late the next afternoon I received a call from the new COO notifying me that my services would no longer be required at REvolution., effective immediately and with no severance. Apparently, the company is moving in a different direction. http://blog.revolution-computing.com/2009/10/revolution-computing-gets- major-funding-new-ceo.html I was surprised that the new CEO, wasn't personally handling this unpleasant task...but I guess that might have been distasteful after the many assurances he gave me and my team last July at OSCON that we were absolutely critical to the company's success and that he would be making no changes for at least three months after he assumed control. Personal courage in difficult situations is rare. 
What I find most interesting about today's REvolution announcements is the space they spent thanking the previous management team, given nearly all of us, including the Founders and the Board, were just fired. 47% of the company wiped out and nobody left with more than a year of experience...Shit happens... And so we begin to pick up the pieces and move on. I've spent much of the past few days consoling coworkers, personally breaking the news to the many kind friends who had agreed to help us increase interest in R and Revolution, and working out what I might be doing next. I have some interesting possibilities already, although I'm still open to suggestions...so stay tuned. Meanwhile I can honestly say that the new REvolution Computing will little resemble the company I was proud to join and represent. I still think the R Language is really interesting, but I'm no longer sure REvo is the one to watch in this space anymore...For the sake of my friends among the remaining employees and shareholders I hope I'm wrong. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] date conversion not as I would have expected
Hi This is on WinXP with regional settings as EST (we are now on DST but I run EST) R2.9.2 x - structure(1254351600, class = c(POSIXt, POSIXct), tzone = ) x [1] 2009-10-01 09:00:00 EST as.POSIXlt(x) [1] 2009-10-01 09:00:00 EST as.Date(x, formate=%Y-%m-%d ) [1] 2009-09-30 I had a similar problem last week. I am not sure how Bill Gates does his times but for R see Rnews 4(1). There are slight differences between POSIX and Date classes NB If you run EST in DST periods Bill Gates still gives the file stamp as DST in Vista and XP. I would presume that it is the same for other zones where there is summer time zones Regards Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email: home: ma...@northnet.com.au At 00:51 22/10/2009, you wrote: Good day, i imported some data into R from Excel. By using the edit() function, this is what one of the dates looks like in R: x - structure(1254351600, class = c(POSIXt, POSIXct), tzone = ) [1] 2009-10-01 BST However, when i do the following, the date changes: as.Date(x, formate=%Y-%m-%d ) [1] 2009-09-30 I don't understand why this is happening. I realise that i can get around this by doing as.Date(as.character(x)), but would be nice to understand why it doesn't work directly. C xx __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
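The behaviour both posters describe is consistent with as.Date() on a POSIXct value converting in UTC unless a time zone is given, so a time late in the local day can land on the previous UTC date. A small sketch (the tz value is only an example):

x <- structure(1254351600, class = c("POSIXt", "POSIXct"), tzone = "")
as.Date(x)                           # converted in UTC: can be the 'previous' day
as.Date(x, tz = "Australia/Sydney")  # convert in an explicit local zone instead
format(x, "%Y-%m-%d")                # formats in the current local zone
as.Date(format(x, "%Y-%m-%d"))       # same idea as as.Date(as.character(x))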
[R] How do I vectorize this loop....
Basically I need to use the following data to calculate a squared error for each Sample based on the expected Survival for the zone. Basically, this code has Live/Dead for each sample, and I need to calculate the square error based on the Expected Mean (ie, Survival). The code looks up the expectation for each zone and applies for each sample in the zone using a loop: Data1 - data.frame(Sample=1:6, Live =c(24,25,30,31,22,23), Baseline = c(35,35,35,32,34,33),Zone = c(rep(Cottonwood,3),rep(OregonAsh,3))) Data2 - data.frame(Zone = c(Cottonwood,OregonAsh), Survival = c(0.83,0.76)) for (i in 1:nrow(Data1)) #(Yi -Ybar*Yo)^2 Data1$SquaredError[i] - (Data1$Live[i] - Data2$Survival[which(Data1$Zone[i]==Data2$Zone)]*Data1$Baseline[i])^2 My question is, can I vectorize this code to avoid the loop? Obviously, I could merge the 2 datasets first, but that would still require 2 steps and Data1 would have a bunch of redundant data. So, is there a better alternative? Is there some way I improve indexing syntax efficiency by using rownames instead of a column vector? -- View this message in context: http://www.nabble.com/How-do-I-vectorize-this-loop-tp26000933p26000933.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] increase size of filled box in a legend
Hello R user community, can anyone tell me how to increase the size of a filled box in a legend? thanx, Janet __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
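One possible workaround, offered here only as a sketch and not taken from the thread: legend() has no argument for the size of its fill= boxes, but filled square plotting symbols (pch = 22) can stand in for them and are scaled with pt.cex. The colours and sizes below are arbitrary:

plot(1:10, 1:10, type = "n")
legend("topright",
       legend = c("group A", "group B"),
       pch = 22,                    # filled square symbol in place of a fill box
       pt.bg = c("red", "blue"),    # fill colour of the squares
       pt.cex = 2.5,                # enlarge the filled boxes
       col = "black")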
[R] drawing a line indicating extent of each factored data series in multipanel lattice xyplot
Hi, Am am plotting aggregated frequency data (extracted from an RDBMS) relating to DNA sequence features for each of the human chromosomes as described in the table chromosomes below (the frequency data is in a table 'hits' that has a value (or not) for each of a set of bins across each chromosome). I would like to mark the extent of the chromosome (according to the length value in chromosome) with a line under the xyplot for each panel (at the moment - not included in the xyplot statement below - I have just added the length in bins to the panel strip, but this is not the best solution for representing the data). I have tried a couple of approaches to this (example below), but have not got anything to work (the 10 value refers to bin size). Can anyone suggest an approach to accomplish this? An additional though less significant question is how I can get the panels to be in the order of the chromosomes as listed in the chromosomes table rather than alphabetical order of their names. thanks Dan chromosomes chromosome refSeq namelength 1 0 NC_01.9 Homo sapiens chromosome 1 247249719 2 1 NC_02.10 Homo sapiens chromosome 2 242951149 3 2 NC_03.10 Homo sapiens chromosome 3 199501827 4 3 NC_04.10 Homo sapiens chromosome 4 191273063 5 4 NC_05.8 Homo sapiens chromosome 5 180857866 6 5 NC_06.10 Homo sapiens chromosome 6 17082 7 6 NC_07.12 Homo sapiens chromosome 7 158821424 8 7 NC_08.9 Homo sapiens chromosome 8 146274826 9 8 NC_09.10 Homo sapiens chromosome 9 140273252 10 9 NC_10.9 Homo sapiens chromosome 10 135374737 11 10 NC_11.8 Homo sapiens chromosome 11 134452384 12 11 NC_12.10 Homo sapiens chromosome 12 132349534 13 12 NC_13.9 Homo sapiens chromosome 13 114142980 14 13 NC_14.7 Homo sapiens chromosome 14 106368585 15 14 NC_15.8 Homo sapiens chromosome 15 100338915 16 15 NC_16.8 Homo sapiens chromosome 16 88827254 17 16 NC_17.9 Homo sapiens chromosome 17 78774742 18 17 NC_18.8 Homo sapiens chromosome 18 76117153 19 18 NC_19.8 Homo sapiens chromosome 19 63811651 20 19 NC_20.9 Homo sapiens chromosome 20 62435964 21 20 NC_21.7 Homo sapiens chromosome 21 46944323 22 21 NC_22.9 Homo sapiens chromosome 22 49691432 23 22 NC_001807.4 Homo sapiens mitochondrion 16571 24 23 NC_23.9 Homo sapiens chromosome X 154913754 25 24 NC_24.8 Homo sapiens chromosome Y 57772954 xyplot(hits~bin|chromosomes$name[chromosome+1], data=hits, horizontal=FALSE, origin=0, lab=c(3,10),pch=20,cex=0.01, layout=c(1,5,5), strip=strip.custom(style=3, bg=grey90,par.strip.text=list(cex=0.5)), as.table=TRUE, panel=function(x,y,...,length=chromosomes$length[chromosome+1]) {panel.xyplot(x,y,...); panel.lines(0,-100,length/10,-100)} ) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
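On the secondary question, lattice orders panels by the levels of the conditioning factor, so setting those levels to the order in the chromosomes table (rather than relying on the default alphabetical order) should give the desired panel order. A toy sketch with made-up names standing in for chromosomes$name:

chrom_names <- c("chr 1", "chr 2", "chr 10")        # hypothetical, in table order
hits <- data.frame(chromosome = rep(0:2, each = 5),
                   bin = rep(1:5, 3),
                   hits = rpois(15, 10))

library(lattice)
# conditioning factor whose levels follow the table order, not alphabetical order
hits$chromName <- factor(chrom_names[hits$chromosome + 1], levels = chrom_names)
xyplot(hits ~ bin | chromName, data = hits, as.table = TRUE)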
[R] sorting table output
Dear R People: Suppose I have the following output: table(xx) xx A C G T 13 12 10 15 I would like to have the output sorted in descending order by height or frequency. But when I do the following: rev(table(xx)) xx T G C A 15 10 12 13 the output is sorted by the names rather than the frequency. Any suggestions, please? Thanks in advance, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sorting table output
I would like to have the output sorted in descending order by height or frequency. But when I do the following: rev(table(xx)) xx T G C A 15 10 12 13 Err, I guess you meant to write sort(table(xx)) here? Cheers, Stefan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sorting table output
On 22/10/2009, at 11:16 AM, Erin Hodgess wrote: Dear R People: Suppose I have the following output: table(xx) xx A C G T 13 12 10 15 I would like to have the output sorted in descending order by height or frequency. But when I do the following: rev(table(xx)) xx T G C A 15 10 12 13 the output is sorted by the names rather than the frequency. Any suggestions, please? (a) Why on earth are you using ***rev()***??? (b) Does this do what you want? tx - table(xx) tx[order(tx)] xx G C A T 10 12 13 15 cheers, Rolf ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
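For completeness, sort() also takes decreasing = TRUE, which gives the descending order asked for in one step. A small sketch with data matching the counts above:

xx <- c(rep("A", 13), rep("C", 12), rep("G", 10), rep("T", 15))
sort(table(xx), decreasing = TRUE)
# xx
#  T  A  C  G
# 15 13 12 10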
Re: [R] How do I vectorize this loop....
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of chipmaney Sent: Wednesday, October 21, 2009 2:58 PM To: r-help@r-project.org Subject: [R] How do I vectorize this loop Basically I need to use the following data to calculate a squared error for each Sample based on the expected Survival for the zone. Basically, this code has Live/Dead for each sample, and I need to calculate the square error based on the Expected Mean (ie, Survival). The code looks up the expectation for each zone and applies for each sample in the zone using a loop: Data1 - data.frame(Sample=1:6, Live =c(24,25,30,31,22,23), Baseline = c(35,35,35,32,34,33),Zone = c(rep(Cottonwood,3),rep(OregonAsh,3))) Data2 - data.frame(Zone = c(Cottonwood,OregonAsh), Survival = c(0.83,0.76)) for (i in 1:nrow(Data1)) #(Yi -Ybar*Yo)^2 Data1$SquaredError[i] - (Data1$Live[i] - Data2$Survival[which(Data1$Zone[i]==Data2$Zone)]*Data1$Baseline[i])^2 My question is, can I vectorize this code to avoid the loop? Does the following do what you want? (Data1$Live - Data2$Survival[match(Data1$Zone,Data2$Zone)]*Data1$Baseline)^2 [1] 25.5025 16.4025 0.9025 44.6224 14.7456 4.3264 Obviously, I could merge the 2 datasets first, but that would still require 2 steps and Data1 would have a bunch of redundant data. Why worry about a bunch of 'redundant' data? Its space is freed after it is no longer referenced, so if you do the processing in a function and don't put all the temporarily needed data in a permanent dataset it won't take up space for long. If you are running into the memory limits of your hardware then you have to work harder to avoid using more memory that you need. (Note that vec[which(logical)] generally gives the same result as vec[logical] but uses more space doing so.) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com So, is there a better alternative? Is there some way I improve indexing syntax efficiency by using rownames instead of a column vector? -- View this message in context: http://www.nabble.com/How-do-I-vectorize-this-loop-tp26000 933p26000933.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
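For what it is worth, the merge() route the original poster mentioned is also short; the sketch below redefines the toy data with quoted strings and notes that merge() may reorder rows, so the result is sorted back by Sample:

Data1 <- data.frame(Sample = 1:6,
                    Live = c(24, 25, 30, 31, 22, 23),
                    Baseline = c(35, 35, 35, 32, 34, 33),
                    Zone = c(rep("Cottonwood", 3), rep("OregonAsh", 3)))
Data2 <- data.frame(Zone = c("Cottonwood", "OregonAsh"),
                    Survival = c(0.83, 0.76))

m <- merge(Data1, Data2, by = "Zone")        # merge() may reorder the rows
m <- m[order(m$Sample), ]                    # restore the original row order
m$SquaredError <- (m$Live - m$Survival * m$Baseline)^2
m$SquaredError
# [1] 25.5025 16.4025  0.9025 44.6224 14.7456  4.3264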
[R] Use of the command 'replicate'
Dear R users, I'd like to ask u whether you know how to sort out the following: I'm trying to reproduce a dataset of clusters, and for that I need to build up a cluster index (inside a function) using the command replicate as follows: dataset- function( clusters=100, cluster.size=50, outcome.mean=list(mean=3, sd=1), outcome.var=list(mean=0, sd=1), treat=1, unbal = FALSE){ if(clusters%%2!=0) stop(number of clusters, clusters, must be an even number \n) clust.size-cluster.size clusters-clusters clust.index - rep(1:clusters, times=clust.size) # cluster index treat.ind - c( # treatment index indicator rep(0, sum(clust.size[1:(clusters/2)])), rep(1, sum(clust.size[(clusters/2+1):clusters])) ) (...) } Doing this it gives me an error saying that the 'times' argument is invalid and that the cluster and treatment indicator indices are both invalid, and therefore, I'm probably defining clust.size wrongly?! Could you elucidate me about this?? Thanks a lot. Manny Gomez -- View this message in context: http://www.nabble.com/Use-of-the-command-%27replicate%27-tp26001367p26001367.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
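One likely explanation, offered as a guess from reading the function: cluster.size is a single number, so clust.size[2:(clusters/2)] is all NA, sum() of it is NA, and rep(0, NA) is exactly what produces the invalid 'times' argument error. Expanding the scalar into one size per cluster (or using each=) avoids this. A standalone sketch:

clusters <- 100
cluster.size <- 50

# recycle the scalar cluster size into one size per cluster
clust.size <- rep(cluster.size, clusters)

# cluster index: repeat each cluster id as many times as its cluster size
clust.index <- rep(1:clusters, times = clust.size)   # or rep(1:clusters, each = cluster.size)

# treatment indicator: 0 for the first half of the clusters, 1 for the second
treat.ind <- c(rep(0, sum(clust.size[1:(clusters/2)])),
               rep(1, sum(clust.size[(clusters/2 + 1):clusters])))

length(clust.index)   # 5000
table(treat.ind)      # 2500 zeros, 2500 ones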
[R] multiple imputation with mix package
I am running into a problem using 'mix' for multiple imputation (over continuous and categorical variables). For the way I will be using this I would like to create an imputation model on some training data set and then use this model to impute missing values for a different set of individuals (i.e. I need to have a model in place before I receive their information). I expected that all that model information would be stored in a single object. In other words, when I run: imp.mix(s, theta, x) I expected that theta completely specifies the model and s (created by prelim.mix) and x only contain info on incomplete the data set I want to impute. As best as I can tell this is not the case. For instance, I create a 'model object' (a general location parameter list) from the data set trainSet using em.mix as follows: sTrain - prelim.mix(trainSet,nCategorical) thetaTrain - em.mix(sTrain,maxits=1000,showits=TRUE,eps=0.0001) I then attempt to use this model to impute a missing field (TC) in the data set workSet as follows: workSet$TC - NA sWork - prelim.mix(workSet,nCategorical) imputedWork - imp.mix(sWork,thetaTrain,workSet) This does not give realistic values for TC (they are around 0). My guess is that the part of the imputation model information is stored in sWork If I do this it looks like it works better (values of TC in the correct range): sWork$xbar = sTrain$xbar sWork$sdv = sTrain$sdv imputedWork - imp.mix(sWork,thetaTrain,workSet) Can someone say whether what I am doing is correct? Is changing xbar and sdv by hand a proper solution? I'm not sure whether that could mess other things up. Thanks, Kurt __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] melting columns
Hello, I'm using R to run a acoustic analysis software called Seewave. I ask the code to extract a list of variables from my recording, and the program give ONE table for each of these. The tables consist of a two column data.frame with the time in column 1 and the frequency in column 2. However, for my purpose I need only one column with the time first and the frequency second. I tried different things but my knowledge in R is definitely too low to answer that. Thanks for your help! Thibault Grava PhD candidate Natural Resources and Environmental Studies University of Northern British Columbia Prince George, BC V2N 4Z9 Tel: (250) 960-6050 email:gr...@unbc.ca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
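It is not entirely clear whether "time first and frequency second" means all times followed by all frequencies, or the two values interleaved row by row, so here is a short sketch of both, with made-up column names time and freq:

res <- data.frame(time = c(0.1, 0.2, 0.3), freq = c(440, 445, 450))

# all times first, then all frequencies, in one vector
stacked <- c(res$time, res$freq)

# or interleaved row by row: time1, freq1, time2, freq2, ...
interleaved <- as.vector(t(as.matrix(res)))

stacked
interleaved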
[R] loop vs. apply(): strange behavior with data frame?
Hi everybody, I noticed a strange behavior when using loops versus apply() on a data frame. The example below explicitly computes a distance matrix given a dataset. When the dataset is a matrix, everything works fine. But when the dataset is a data.frame, the dist.for function written using nested loops will take a lot longer than the dist.apply USING FOR ### dist.for - function(data) { d - matrix(0,nrow=nrow(data),ncol=nrow(data)) n - ncol(data) r - nrow(data) for(i in 1:r) { for(j in 1:r) { d[i,j] - sum(abs(data[i,]-data[j,]))/n } } return(as.dist(d)) } USING APPLY ### f - function(data.row,data.rest) { r2 - as.double(apply(data.rest,1,g,data.row)) } g - function(row2,row1) { return(sum(abs(row1-row2))/length(row1)) } dist.apply - function(data) { d - apply(data,1,f,data) return(as.dist(d)) } TESTING ### library(mvtnorm) data - rmvnorm(100,mean=seq(1,10),sigma=diag(1,nrow=10,ncol=10)) tf - system.time(df - dist.for(data)) ta - system.time(da - dist.apply(data)) print(paste('diff = ',sum(as.matrix(df) - as.matrix(da print(tf = ) print(tf) print(ta = ) print(ta) print('--') print('Same experiment on data.frame...') data2 - as.data.frame(data) tf - system.time(df - dist.for(data2)) ta - system.time(da - dist.apply(data2)) print(paste('diff = ',sum(as.matrix(df) - as.matrix(da print(tf = ) print(tf) print(ta = ) print(ta) Here is the output I get on my system (R version 2.7.1 on a Debian lenny) [1] diff = 0 [1] tf = user system elapsed 0.088 0.000 0.087 [1] ta = user system elapsed 0.128 0.000 0.128 [1] -- [1] Same experiment on data.frame... [1] diff = 0 [1] tf = user system elapsed 35.031 0.000 35.029 [1] ta = user system elapsed 0.184 0.000 0.185 Could you explain why that happens? thank you, regards Roberto __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Calculating Random Effects Coefficients from lmer
Hello all, I am new to the list serve but hope to contribute what I can. For now, however, I hope to tap into all your knowledge about mixed effects models. Currently, I am running a mixed effects model on a time series panel data. For this model I want to find out the fixed and random effect for two discrete variables. My Model: m1 - lmer(mghegdp_who ~ govdisgdp_up_net + pridisgdp_up_net + mgdppc_usd06_imf + drdisgdp + mggegdpwb + hiv_prevalence + (0 + govdisgdp_up_net|country) + (0 + pridisgdp_up_net|country), data) To find the overall effect *with confidence intervals* of govdisgdp_up_net pridisgdp_up_net I need to account for the fixed and random effects. Has any calculated this? Or does anyone have suggestions? Also, does anyone know of the ability to calculate the postVar for a model with multiple random effects? Thank you and look forward to hearing your insights. Matt 'Lost in Seattle' [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
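Not a full answer to the confidence-interval question, but for combining the pieces: in lme4 the fixed effects, the per-group random effects, and their sum can be extracted with fixef(), ranef() and coef(). A toy sketch using the package's sleepstudy data rather than the poster's model:

library(lme4)
data(sleepstudy)
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

fixef(fm)                 # overall (fixed) intercept and slope
head(ranef(fm)$Subject)   # per-group deviations from the fixed effects
head(coef(fm)$Subject)    # fixed + random effects combined, per group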
[R] Error: NA/NaN/Inf in foreign function call (arg 1)
Hi, Below is the code that is giving me this error message: #=== #The code below results in error: Error: NA/NaN/Inf in foreign function #call (arg 1) #=== A=rep(c(-1,1),8);B=rep(c(-1,-1,1,1),4);C=rep(c(-1,-1,-1,-1,1,1,1,1),2) D=c(rep(-1,8),rep(1,8)) y=c(38.01697,42.04596,43.89677,47.78144,37.34869,41.90744,44.01468,47.96458,37.97263,41.83482,44.01936,48.12203,38.73267,42.92054,44.16163,48.88900) bcols=c(1,2) dcols=c(3,4) locmodel=glm(y~A+B) mlcoefs=locmodel$coef res=locmodel$residuals ressq=res^2 dispmodel=glm(ressq~C+D,family=Gamma(link = log)) #=== #The code below results in error message: # Warning message: In glm.fit(x = X, y = Y, weights = weights, start = # start, etastart = #etastart, : algorithm did not converge #=== A=rep(c(-1,1),8) B=rep(c(-1,-1,1,1),4) C=rep(c(-1,-1,-1,-1,1,1,1,1),2) D=c(rep(-1,8),rep(1,8)) y=c(37.93499,42.19333,44.02388,48.04979,37.52459,41.90613,43.44424,47.81314, 38.04942,42.06124,44.00249,47.89033,39.00372,40.47911,43.78126,48.55228) locmodel=glm(y~A+B) mlcoefs=locmodel$coef res=locmodel$residuals ressq=res^2 dispmodel=glm(ressq~C+D,family=Gamma(link = log)) #=== I researched on this error and it can be caused by missing values, which is not the case for this example. I would really appreciate any help that can be extended. Rainier Sabangan BGSU MSAS Student __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] simulating AR() using a
good day everyone! i have a time series (andong.ts) and fitted and AR() model using the following code andong.ts - ts(read.table(D:/.../andong.csv, header = TRUE), start = c(1966,1), frequency = 1) ar(andong.ts) Call: ar(x = andong) Coefficients: 1 2 3 0.3117 0.0607 0.0999 Order selected 3 sigma^2 estimated as 0.8443 I am aware that my model is now x(t) = 0.3117x(t-1) + 0.0607x(t-2) + 0.0999x(t-3) + e(t) eqn(1) but i am not sure of how to perform my simulation for the new series. i tried to look at arima.sim, arima.sim(ar = c(0.3117,0.0607, 0.0999), n = 36) but it doesn't seem to use the old series that should be simulated (or maybe it is but i am not aware) Could anyone please brief me on how to perform my simulation properly which would perform eqn (1) with e(t) being the residual series? thanks in advance!!! -- View this message in context: http://www.nabble.com/simulating-AR%28%29-using-a-tp26002850p26002850.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
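A sketch of one way to do this, reusing the fitted model's own residuals as innovations via arima.sim()'s innov argument. The series below is made up, since the original andong.csv is not available, and the AR(3) order is forced to mirror the post:

set.seed(123)
andong.ts <- ts(arima.sim(list(ar = c(0.3, 0.1, 0.1)), n = 40) + 10,
                start = 1966, frequency = 1)

fit <- ar(andong.ts, order.max = 3, aic = FALSE)   # force an AR(3) as in the post

# innovations: the model's own residuals (the first 'order' values are NA)
innovs <- na.omit(fit$resid)

# simulate from the fitted coefficients, driven by those residuals,
# then add back the series mean that ar() removed
sim <- arima.sim(list(ar = fit$ar), n = length(innovs), innov = innovs) + fit$x.mean
sim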
Re: [R] Question on mixed effect models with LME
On Wed, Oct 21, 2009 at 11:06 AM, Peter Flom peterflomconsult...@mindspring.com wrote: ... I have a longitudinal data set, with data on schools and their test scores over a four year period. I have centered year, and run the following m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum, na.action = na.omit) where math_1 is a percentage of students in a given school that are at the lowest math achievement level, year is year, TFC_ is a categorical variable for a treatment I wish to evaluate, and schoolnum is an identifier. When I run summary on this model, I get a strong negative correlation (-0.91) between the intercept and I(year-2007.5), despite the fact that the mean of year is 2007.5. Hi Peter, For the what's going on here? questions it's very helpful to have a reproducible example. I tried to create data fitting your description, but the correlation disappeared as expected: set.seed(777) library(nlme) school - factor(rep(1:20, each=4)) year - rep(2006:2009, 20) year.c - year - mean(year) tmt - sample(0:1, 20, replace = TRUE)[school] math - rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school] tmt - factor(tmt) dfr - data.frame(math, school, tmt, year, year.c) rm(math, school, year, tmt) f1 - lme(math ~ year*tmt, data = dfr, random=~1|school) f2 - update(f1, . ~ year.c*tmt) summary(f1)$corFixed['year', '(Intercept)'] # [1] -0.997 summary(f2)$corFixed['year.c', '(Intercept)'] # [1] 0 A possibility is that the data are not of the expected classes. What does str(long) report? hth, Kingsford Jones I am puzzled, as I thought centering the time variable should eliminate, or at least strongly reduce, this correlation. Any insights appreciated thanks Peter Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plot log scale
Hi - a simple question. I know that on plot one can set log=xy to set scale on both axes to log scale. I use this to plot autocorrelation of a slow decaying autocorrelated process. It has some points that are zero or slightly negative and cause this plot to return error. I'm wondering if there is anyway to force omitting these points? Thank you. rc __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
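The usual workaround is to drop the non-positive points before plotting. A short sketch with made-up object names lag and acf_val:

lag <- 1:100
acf_val <- 0.9 * lag^(-0.4) + rnorm(100, sd = 0.05)   # toy slowly decaying autocorrelation with noise

ok <- is.finite(acf_val) & acf_val > 0                # keep only strictly positive, finite points
plot(lag[ok], acf_val[ok], log = "xy",
     xlab = "lag", ylab = "autocorrelation")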
Re: [R] Missing data and LME models and diagnostic plots
Mixed models based on likelihood methods can often handle missing observations within subjects, but they not do well with missing individual elements in the design matrices (think unit nonresponse vs item nonresponse in the survey world). Continuing with the example I recently sent to you set.seed(0112358) library(nlme) school - factor(rep(1:20, each=4)) year - rep(2006:2009, 20) year.c - year - mean(year) tmt - sample(0:1, 20, replace = TRUE)[school] math - rnorm(80, 2 + tmt + .001*year + .0001*tmt*year, 1.5) + rnorm(20)[school] tmt - factor(tmt) dfr - data.frame(math, school, tmt, year, year.c) rm(math, school, year, year.c, tmt) # Now create missing observations, # -- note just a single obs for the first school: dfr2 - dfr[-c(2:6, sample(75, 5)), ] table(dfr2$school) # # school: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # numobs: 1 2 4 3 4 4 4 4 4 3 3 4 4 4 3 4 4 4 4 4 f3 - lme(math ~ year.c*tmt, data = dfr2, random=~1|school) intervals(f3) #Approximate 95% confidence intervals # # Fixed effects: # lower est. upper #(Intercept) 3.0705975 3.81227729 4.5539571 #year.c -0.3913985 0.09107008 0.5735386 #tmt1 0.3146311 1.34895504 2.3832790 #year.c:tmt1 -0.4385680 0.20748194 0.8535319 #attr(,label) #[1] Fixed effects: # # Random Effects: # Level: school # lower est.upper #sd((Intercept)) 0.35068 0.7334782 1.534134 # # Within-group standard error: # lower est.upper #1.226240 1.492044 1.815464 Note that even with the missing obs, in this instance we got pretty good estimates of the variance components (parameters: within-group sd = 1.5; between-school sd = 1). However, I had to change the seed from the one I used in the last email because, as is not uncommon with unbalanced data and relatively small sample sizes, there were convergence problems. If you were using classical MoM estimators fitting a mixed model with unbalanced data would lead to very precarious inferences (not that the likelihood inference isn't without it share of pitfalls, but that's another email...) hth, Kingsford On Wed, Oct 21, 2009 at 11:13 AM, Peter Flom peterflomconsult...@mindspring.com wrote: Hello Running R2.9.2 on Windows XP I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. Thus, m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum) causes an error in na.fail.default, but adding na.action = na.omit makes a model with no errors. However, if I create that model, i.e. m1.mod1 - lme(fixed = math_1 ~ I(year-2007.5)*TFC_, data = long, random = ~I(year-2007.5)|schoolnum, na.action = na.omit) then the diagnostic plots suggested in Pinheiro Bates produce errors; e.g. plot(m1.mod1, schoolnum~resid(.), abline = 0) gives an error could not find function NaAct. Searching the archives showed a similar question from 2007, but did not show any responses. Thanks for any help Peter ) Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] loop vs. apply(): strange behavior with data frame?
try running Rprof on the two examples to see what the difference is. what you will probably see is a lot of the time on the dataframe is spent in accessing it like a matrix ('['). Rprof is very helpful to see where time is spent in your scripts. Sent from my iPhone On Oct 21, 2009, at 17:17, Roberto Perdisci roberto.perdi...@gmail.com wrote: Hi everybody, I noticed a strange behavior when using loops versus apply() on a data frame. The example below explicitly computes a distance matrix given a dataset. When the dataset is a matrix, everything works fine. But when the dataset is a data.frame, the dist.for function written using nested loops will take a lot longer than the dist.apply USING FOR ### dist.for - function(data) { d - matrix(0,nrow=nrow(data),ncol=nrow(data)) n - ncol(data) r - nrow(data) for(i in 1:r) { for(j in 1:r) { d[i,j] - sum(abs(data[i,]-data[j,]))/n } } return(as.dist(d)) } USING APPLY ### f - function(data.row,data.rest) { r2 - as.double(apply(data.rest,1,g,data.row)) } g - function(row2,row1) { return(sum(abs(row1-row2))/length(row1)) } dist.apply - function(data) { d - apply(data,1,f,data) return(as.dist(d)) } TESTING ### library(mvtnorm) data - rmvnorm(100,mean=seq(1,10),sigma=diag(1,nrow=10,ncol=10)) tf - system.time(df - dist.for(data)) ta - system.time(da - dist.apply(data)) print(paste('diff = ',sum(as.matrix(df) - as.matrix(da print(tf = ) print(tf) print(ta = ) print(ta) print('--') print('Same experiment on data.frame...') data2 - as.data.frame(data) tf - system.time(df - dist.for(data2)) ta - system.time(da - dist.apply(data2)) print(paste('diff = ',sum(as.matrix(df) - as.matrix(da print(tf = ) print(tf) print(ta = ) print(ta) Here is the output I get on my system (R version 2.7.1 on a Debian lenny) [1] diff = 0 [1] tf = user system elapsed 0.088 0.000 0.087 [1] ta = user system elapsed 0.128 0.000 0.128 [1] -- [1] Same experiment on data.frame... [1] diff = 0 [1] tf = user system elapsed 35.031 0.000 35.029 [1] ta = user system elapsed 0.184 0.000 0.185 Could you explain why that happens? thank you, regards Roberto __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
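Two concrete follow-ups to the suggestion above, reusing dist.for() and data2 from the original post: profile the slow call to confirm where the time goes, and, since the computation is purely numeric, convert the data frame to a matrix once before looping:

# profile the slow call
Rprof("dist_for.out")
invisible(dist.for(data2))
Rprof(NULL)
head(summaryRprof("dist_for.out")$by.self)   # typically dominated by '[.data.frame'

# convert once, then loop over a numeric matrix instead of a data frame
system.time(dist.for(as.matrix(data2)))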
Re: [R] Missing data and LME models and diagnostic plots
Hi Peter, See e.g. Hedeker and Gibbons, Longitudinal Data Analysis, which repeatedly stresses that mixed models provide good estimates if the data are missing at random. This may be true. However, one of the real strengths of LME is that it handles unbalanced designs, which is a different thing. The fact that it also gets good estimates when data are missing is a bonus. Regards, Mark. Peter Flom-2 wrote: I wrote I am puzzled by the performance of LME in situations where there are missing data. As I understand it, one of the strengths of this sort of model is how well it deals with missing data, yet lme requires nonmissing data. Mark Difford replied You are confusing missing data with an unbalanced design. A strengths of LME is that it copes with the latter, which aov() does not. Thanks for your reply, but I don't believe I am confused with respect to missing data in mixed models. See e.g. Hedeker and Gibbons, Longitudinal Data Analysis, which repeatedly stresses that mixed models provide good estimates if the data are missing at random. Regards Peter Peter L. Flom, PhD Statistical Consultant Website: www DOT peterflomconsulting DOT com Writing; http://www.associatedcontent.com/user/582880/peter_flom.html Twitter: @peterflom __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/Missing-data-and-LME-models-and-diagnostic-plots-tp25996584p26004465.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.