Re: [R] Version control (git, mercurial) for R packages
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 CCd to r-devel as suggested by Peter. On 09/02/12 19:28, Hadley Wickham wrote: I'm exploring using a version control system to keep better track of changes to the packages I maintain. I'm leaning towards git (although mercurial also looks good) but am not sure what is the best way to set up the repository. It seems I can't set the repository directly within the R package main directory, since it will be incompatible with the standard package structure. I can set the repository in the directory that contains my R packages, but then I have a single repository for all of my packages, which is not ideal. Git is nice - but if you ar looking for simplicity in usage for R packages, I guess r-forge and rforge are the easiest to use. I think git + github is substantially easier to use, especially if you want to incorporate patches from the community. But I would be interested in the workflow when using github as the main VCS. The thing that has made this really successful for me is the install_github function in devtools. Then it's easy for people to try out development versions: install.packages(devtools) library(devtools) install_github(myrepo, myusername) Thanks Hadley - I should definitely look closer into the devtools package - there really seem to be some gems in there. One example is this function - sounds perfect for small scale usages and for developers. But what I was thinking about (in my other post as well) is to include github (or any other git repo provider) into r-forge for the automatic creation of packages. So is there an easy way to kind of push a certain revision up to r-forge to have the automatic builds? One could do it manualy, but automatic would be so much nicer. This requires a working R development environment but thanks to the hard work of Simon Urbanek, Duncan Murdoch, Brian Ripley and others, this is a one-install process on all major platforms. True - no problem for developers. Cheers, Rainer Hadley - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D):+49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk800TcACgkQoYgNqgF2ego0QgCfcDIpUlxmVHfYMHi/IlCNKSJ3 1sIAnjQEmBYoTXIE5SlvRUqs/eAioibC =38uA -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Importing a CSV file
On Feb 10, 2012, at 04:20 , sezgin ozcan wrote: I use mac. I tried this command also a-read.table(clipboard,sep=”\t”,row.names=1,header=T) Something is wrong with those quotes around \t... Also, last I needed this, using file=clipboard to read from the clipboard didn't work on Mac, but file=pipe(pbpaste) did. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Custom caret metric based on prob-predictions/rankings
Actually, is there any way to get at additional information beyond the classProbs? In particular, is there any way to find out the associated weights, or otherwise the row indices into the original model matrix corresponding to the tested instances? On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote: Oops, found trainControl's classProbs right after I sent! On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote: I'm dealing with classification problems, and I'm trying to specify a custom scoring metric (recall@p, ROC, etc.) that depends on not just the class output but the probability estimates, so that caret::train can choose the optimal tuning parameters based on this metric. However, when I supply a trainControl summaryFunction, the data given to it contains only class predictions, so the only metrics possible are things like accuracy, kappa, etc. Is there any way to do this that I'm looking? If not, could I put this in as a feature request? Thanks! -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding all the coefficients for a logit model
On Feb 10, 2012, at 06:08 , R. Michael Weylandt wrote: Add -1 to your glm to remove the intercept term -- that will force Friday to have its own coefficient. That's one answer. Another one is that the coefficient (in the model with an intercept term) for Friday is zero with an s.e. of zero. Namely, by definition, the log-odds for Friday minus the log-odds for Friday. -pd E.g., dg - data.frame(value = rnorm(28), day = letters[1:7]) lm(value ~ day, data = dg) lm(value ~ day - 1, data = dg) Michael On Thu, Feb 9, 2012 at 6:58 PM, Abraham Mathew abmathe...@gmail.com wrote: Let's say I have a variable, day, which is saved as a factor with 7 levels, and I use it in a logistic regression model. I ran the model using the car package in R and printed out the results. mod1 = glm(factor(status1) ~ factor(day), data=mydat, family=binomial(link=logit)) print(summary(mod1)) The result I get is: Coefficients: Estimate Std. Error z value Pr(|z|) (Intercept) -0.4350 0.0379 -11.48 2e-16 *** factor(day)Monday -0.6072 0.0479 -12.69 2e-16 *** factor(day)Saturday0.5964 0.0559 10.67 2e-16 *** factor(day)Sunday 1.1140 0.0627 17.78 2e-16 *** factor(day)Thursday -0.4492 0.0516 -8.71 2e-16 *** factor(day)Tuesday-0.9331 0.0496 -18.82 2e-16 *** factor(day)Wednesday -0.8575 0.0486 -17.63 2e-16 *** It seems that Friday is being used as the baseline, but I want to know how I can acquire the coefficient for the baseline (friday)? I ran mod1$coefficients, but that didn't do the trick. Can anyone help. Thank you, Abraham M. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Recall@p plot using ROCR?
Is it possible to use ROCR to plot a simple recall@p plot? I.e., a plot where the x-axis is the position into the ranked test set, and the y-axis is the recall, so you can see what's the recall in the top 10% of the ranked results. I searched through the performance() manual but found nothing. (Not a cutoff-vs-recall graph, since the cutoff is the probability estimate returned by the model.) Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot using scale_x_date gives Error in seq.int(r1$year, to$year, by)
Hi Aidan, str is your friend: str(g) 'data.frame': 9 obs. of 3 variables: $ Date: chr 2011-12-23 2011-12-30 2012-01-06 2011-12-23 ... $ variable: Factor w/ 3 levels Price,Yield,..: 1 1 1 2 2 2 3 3 3 $ value : num 86.78 86.04 86.44 9.74 9.54 ... You haven't turned the Date variable into an actual date yet. Try: g$Date - as.Date(g$Date) Hadley On Fri, Jan 6, 2012 at 9:12 AM, Aidan Corcoran aidan.corcora...@gmail.com wrote: Dear all, ggplot gives me an error when trying to plot time series data using a date variable as the x axis. g-structure(list(Date = c(2011-12-23, 2011-12-30, 2012-01-06, 2011-12-23, 2011-12-30, 2012-01-06, 2011-12-23, 2011-12-30, 2012-01-06), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c(Price, Yield, CDS Spread), class = factor), value = c(86.777, 86.037, 86.437, 9.737, 9.542, 9.683, 580.132, 576.866, 573.564)), .Names = c(Date, variable, value ), row.names = c(100L, 101L, 102L, 202L, 203L, 204L, 304L, 305L, 306L), class = data.frame) g Date variable value 100 2011-12-23 Price 86.8 101 2011-12-30 Price 86.0 102 2012-01-06 Price 86.4 202 2011-12-23 Yield 9.7 203 2011-12-30 Yield 9.5 204 2012-01-06 Yield 9.7 304 2011-12-23 CDS Spread 580.1 305 2011-12-30 CDS Spread 576.9 306 2012-01-06 CDS Spread 573.6 gp-ggplot(g,aes(x=Date,y=value,group=variable,lty=variable)) +geom_line() gp #THIS WORKS FINE (BUT AXIS LABELLING NOT VISIBLE WITH MORE DATA, HENCE I WOULD LIKE TO USE SCALE_X_DATE) gp-gp+ scale_x_date() gp Error in seq.int(r1$year, to$year, by) : 'from' must be finite In addition: Warning messages: 1: In get(x, envir = this, inherits = inh)(this, ...) : NAs introduced by coercion 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf 4: In min(x) : no non-missing arguments to min; returning Inf 5: In max(x) : no non-missing arguments to max; returning -Inf In previous help requests, the workaround of specifying the unit was suggested: gp-gp+ scale_x_date(major=years) but this doesn't work for me (same error). Any help is very much appreciated! Thanks in advance. Aidan R version 2.14.1 (2011-12-22) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Ireland.1252 LC_CTYPE=English_Ireland.1252 LC_MONETARY=English_Ireland.1252 [4] LC_NUMERIC=C LC_TIME=English_Ireland.1252 attached base packages: [1] datasets tools grDevices grid splines graphics stats tcltk utils [10] methods base other attached packages: [1] micEcon_0.6-6 miscTools_0.6-12 np_0.40-11 cubature_1.0 boot_1.3-3 [6] RODBC_1.3-3 sqldf_0.4-6.1 chron_2.3-41 gsubfn_0.5-7 DBI_0.2-5 [11] Haver_1.0 xtable_1.5-6 plm_1.2-7 sandwich_2.2-7 MASS_7.3-16 [16] Formula_1.0-1 nlme_3.1-102 bdsmatrix_1.0 RBloomberg_0.4-150 rJava_0.9-1 [21] gtools_2.6.2 gdata_2.8.2 ggplot2_0.8.9 proto_0.3-9.2 zoo_1.7-4 [26] reshape_0.8.4 plyr_1.6 svSocket_0.9-52 TinnR_1.0.3 R2HTML_2.2 [31] Hmisc_3.8-3 survival_2.36-10 loaded via a namespace (and not attached): [1] cluster_1.14.1 digest_0.5.0 lattice_0.20-0 RSQLite_0.10.0 [5] RSQLite.extfuns_0.0.1 svMisc_0.9-63 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GGplot controlling point size across range
On Thu, Jan 12, 2012 at 5:27 PM, Darran King darran.k...@csiro.au wrote: Hi all New to R and GGplot2 but loving the potential. I am trying to plot four separate point plots by looping over the data and plotting a different subset each time. When I plot the data as a point plot, the size of the points is determined by the data values used as below qplot(accum_rain, accum_g_radn, data = clim_sub[[i]], size = avgyld, colour = avgyld) The problem is that i want all four plots to be comparable, so a point size representing avgyld = 2000 should be the same on all four plots. However as the data for some plots has a smaller range than others and the plots are automatically scalling to the range of data in each plot, and the largest point is always assigned to the largest value a plot with a top value of say 5000 with be represented with the same size point as a plot with a top value of 7000. Any tips on how to scale the point sizes to a defined range of classes and still plot the actual data to those classes? Specify limits to scale_area: + scale_area(limits = c(0, 1000)) If you're still having problems, you might want to try the ggplot2 mailing list, as Jeff suggested. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] pretty(range(data$z),10) Error
Hi Mario, If you're still having problems, I'd suggest sending a small reproducible example (https://github.com/hadley/devtools/wiki/Reproducibility) to the ggplot2 mailing list. Hadley On Tue, Jan 17, 2012 at 12:32 PM, Mario Giesel rr.gie...@yahoo.de wrote: Hello, R-List, I'm getting error messages when adding geom_density2d() [package ggplot2]: Fehler in pretty(range(data$z), 10) : NA/NaN/Inf in externem Funktionsaufruf (arg 1) Zusätzlich: Warnmeldungen: 1: Removed 1 rows containing missing values (stat_contour). 2: In min(x) : kein nicht-fehlendes Argument für min; gebe Inf zurück 3: In max(x) : kein nicht-fehlendes Argument für max; gebe -Inf zurück 4: In min(x) : kein nicht-fehlendes Argument für min; gebe Inf zurück 5: In max(x) : kein nicht-fehlendes Argument für max; gebe -Inf zurück I installed 2.11.1 Patched in the hope to overcome this obstacle but in vain. I tried reading spss and csv format - no effect. While this one works: ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ segment,ncol=3) This one fails: ggplot(sp2, aes(x=Qq04, y=Qq01)) + geom_point() + facet_wrap(~ segment,ncol=3) + geom_density2d() But it only fails for 2 out of 10 x-variables, although they do not differ in structure. Data structure looks like this: segment Qq01 Qq02 Qq02a Qq04 Qq05 Qq07 Qq08 Qq10 Qq11 Qq15 Qq17a A 7 5 5 5 5 4 5 5 3 7 7 A 5 4 4 3 4 3 4 4 4 5 6 B 3 3 3 3 2 3 4 2 4 3 2 B 6 4 5 3 3 3 4 4 3 6 6 C 3 3 3 3 3 4 2 3 3 4 2 C 2 1 4 3 1 4 1 1 3 4 2 D 5 3 4 3 3 3 4 3 3 6 3 D 5 3 4 2 2 4 4 3 3 4 4 E 3 3 3 2 3 4 4 3 2 3 4 E 7 5 5 5 5 4 4 5 1 7 7 F 6 5 4 3 3 3 4 3 3 6 6 F 5 3 4 3 3 4 4 3 3 4 4 G 4 3 3 3 2 1 3 3 3 4 4 G 6 5 5 5 5 4 5 5 3 7 7 H 6 5 5 4 4 4 5 4 3 5 6 H 7 5 5 5 4 4 5 5 3 7 7 I 6 5 5 4 4 4 5 4 3 6 6 I 6 3 4 2 3 4 4 3 4 6 4 J 5 3 4 1 3 4 4 3 3 5 5 J 4 2 4 3 2 2 3 2 3 4 4 K 4 4 4 2 3 4 4 4 3 5 5 K 6 4 3 3 2 3 4 3 3 6 5 L 6 4 5 3 3 4 5 4 3 6 6 L 3 1 2 2 1 3 3 1 3 4 4 ... About 1.800 cases (more than 100 per segment). Did anybody experience similar problems? Thanks for all comments, Mario [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plotting bar graph over a geographical map
Hi Simon, You might want to try sending a small reproducible example (https://github.com/hadley/devtools/wiki/Reproducibility) to the ggplot2 mailing list. Hadley On Tue, Jan 31, 2012 at 10:53 PM, sjlabrie sjlab...@mit.edu wrote: Hi, I am looking for a way to plot bar on a map instead of the standard points. I have been using ggplot2 and maps libraries. The points are added with the function geom_point. I know that there is a function geom_bar but I can't figure out how to use it. Thank you for your help, Simon ### R-code library(ggplot2) library(maps) measurements - read.csv(all_podo.count.csv, header=T) allworld - map_data(world) pdf(map.pdf) ggplot(measurements, aes(long, lat)) + geom_polygon(data = allworld, aes(x = long, y = lat, group = group), colour = grey70, fill = grey70) + geom_point(aes(size = ref)) + opts(axis.title.x = theme_blank(), axis.title.y = theme_blank()) + geom_bar(aes(y = normcount)) dev.off() ### -- View this message in context: http://r.789695.n4.nabble.com/Plotting-bar-graph-over-a-geographical-map-tp4346925p4346925.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 geom_polygon fill
Hi Raimund, To increase your chances of getting help, I'd recommend using the ggplot2 mailing list, and reducing your example down to the essence of the problem. For example, the theme components don't affect the problem, but make the code longer, and so harder to understand. Hadley On Mon, Feb 6, 2012 at 10:50 AM, raimund rai...@gmail.com wrote: Hi everyone, i've been trying to make a special plot with ggplot2, but I can't get it to fill the polygon I'd like to see filled so very very much. I want to display the difference or change in the distribution of the modified Rankin Scale between two groups. mRS is a scale for disability or daily activities competence. It looks like this. http://r.789695.n4.nabble.com/file/n4361919/rankinplot.png I just wish the polygons in between the bars would fill in the same colors as the bar segments do. Interestingly, in the example provided by the geom_polygon help page, there is a fill, which leads me to suspect that something with my data frame might be wrong. If I supply a colour argument, I get borders, but not always in the color I defined. The attached image has such borders, to make clear what polygons I am talking about. In the final plot I don't desire such borders, only the as of yet unattainable fill. Here's my code: library(ggplot2) library(plyr) # define good looks no_margins - opts( axis.line = theme_blank(), axis.text.x = theme_blank(), axis.ticks = theme_blank(), axis.title.x = theme_text(size = 12, vjust = 0.15), axis.title.y = theme_text(angle = 90, size = 12, vjust = 0.2), axis.ticks.length = unit(0, cm), axis.ticks.margin = unit(0, cm), panel.background = theme_blank(), panel.grid.major = theme_blank(), panel.grid.minor = theme_blank(), plot.background = theme_blank(), plot.title = theme_blank(), plot.margin = unit(c(1, 1, 1, 1.5), lines) ) sfm = scale_fill_manual(mRS, c(0=darkgreen, 1=forestgreen, 2=mediumseagreen, 3=coral, 4=red, 5=darkred, 6=black)) barwidth = 0.6 # good looks defined smalldummy = data.frame( mRS = as.factor(rep(0:6,2)), rsfreq = sample(0:6,14,replace=T), treatment = factor(rep(c(one,two),each=7)) ) smalldummy = ddply(smalldummy, .(treatment), transform, textpos = cumsum(rsfreq/sum(rsfreq)) - rsfreq/sum(rsfreq)/2, # center within segment lineposx = cumsum(rsfreq/sum(rsfreq))) # segment borders without 0 # make the xs of the polygon polylo = 1 + barwidth/2 polyhi = 2 - barwidth/2 xs = rep(c(polylo,polyhi,polyhi,polylo), 7) # make the ys of the polygon tmp1 = c(0, smalldummy$lineposx[1:7]) tmp2 = c(0, smalldummy$lineposx[8:14]) ys = c() for(i in 1:7) { nu = c(tmp1[i], tmp2[i], tmp2[i+1], tmp1[i+1]) ys = c(ys, nu) } m = as.factor(rep(0:6, each=4)) tmpdf = data.frame(xs, ys, mRS = m) bigdummy = merge(smalldummy, tmpdf, by = mRS) ggplot(data = bigdummy, aes(x = treatment, y = rsfreq, fill = mRS)) + geom_bar(aes(width = barwidth),stat=identity,position=fill) + geom_text(aes(label=as.character(mRS), y = ifelse(rsfreq != 0, textpos, NA)), size = 8, colour = white) + geom_polygon(aes(x = xs, y = ys, group = mRS, fill = mRS)) + ylab(Modified Rankin Scale) + xlab(Treatment) + coord_flip() + theme_bw() + opts(legend.position = none) + no_margins + sfm -- View this message in context: http://r.789695.n4.nabble.com/ggplot2-geom-polygon-fill-tp4361919p4361919.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multiple comparisons of lme model - interactions?
Dear list, Please excuse my ignorance, but I'm trying to model some data using the lme package. vot is a numeric response, and condition, location and obs are all categories. This works: anova(vot.lme - lme(vot ~ condition * location * obs,data=mergedCodesL,random= ~1 |patient)) numDF denDF F-value p-value (Intercept)1 1898 462.7519 .0001 condition 2 1898 8.4126 0.0002 location 112 0.0272 0.8718 obs2 1898 472.5526 .0001 condition:location 2 1898 1.0467 0.3513 condition:obs 4 1898 1.0683 0.3706 location:obs 2 1898 9.7067 0.0001 condition:location:obs 4 1898 4.6143 0.0010 If I then would like to do post-hoc testing, I found in the email archives that I could use the glht function of multcomp - something like summary(glht(vot.lme, linfct=mcp(obs = Tukey))) However, if I would like to investigate the condition:location:obs - interaction. What do I do then? Best! /Fredrik -- Life is like a trumpet - if you don't put anything into it, you don't get anything out of it. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Custom caret metric based on prob-predictions/rankings
I think you need to read the man pages and the four vignettes. A lot of your questions have answers there. If you don't specify the resampling indices, they ones generated for you are saved in the train object: data(iris) TrainData - iris[,1:4] TrainClasses - iris[,5] knnFit1 - train(TrainData, TrainClasses, + method = knn, + preProcess = c(center, scale), + tuneLength = 10, + trControl = trainControl(method = cv)) Loading required package: class Attaching package: ‘class’ The following object(s) are masked from ‘package:reshape’: condense Warning message: executing %dopar% sequentially: no parallel backend registered str(knnFit1$control$index) List of 10 $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ... $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ... $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ... $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ... $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ... $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ... $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ... $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ... $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ... $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ... There is also a savePredictions argument that gives you the hold-out results. I'm not sure which weights you are referring to. On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang yanghates...@gmail.com wrote: Actually, is there any way to get at additional information beyond the classProbs? In particular, is there any way to find out the associated weights, or otherwise the row indices into the original model matrix corresponding to the tested instances? On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote: Oops, found trainControl's classProbs right after I sent! On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote: I'm dealing with classification problems, and I'm trying to specify a custom scoring metric (recall@p, ROC, etc.) that depends on not just the class output but the probability estimates, so that caret::train can choose the optimal tuning parameters based on this metric. However, when I supply a trainControl summaryFunction, the data given to it contains only class predictions, so the only metrics possible are things like accuracy, kappa, etc. Is there any way to do this that I'm looking? If not, could I put this in as a feature request? Thanks! -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Max __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R-question about html writer: hwriter; how to create scroll for matrices?
Dear All This is the first time I use this help forum. My apologies in case I did not follow a predefined format. My question: I use the package hwriter. How can I produce a matrix (html) that allows scrolling (up and down, left and right) or, alternatively, prduces a fixed row for the headers and a fixed column for the row names (so that the names do not disappear if I go to the right or downward on that webpage)? I did try quite few things (also via CSS) but I seem to have problems referring to the entire matrix rather than a single item of it. For example: assume we have a correlation matrix of some dimension (assume 100x100). What would be an efficient way to do the above? Thank you to anyone who has a go at this. Kind Regards Hanspeter This message may contain confidential information and is...{{dropped:10}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] making multiple lines using qqplot
Melissa, par(new=T) works as many times as you use it. You don't provide data, but (assuming it is not NULL) more likely your n=500 qqplot was just obscuring the points of the n=50 plot. Reverse the order (i.e. qqplot 500 first, 50, 5 last) and see if all three are there (as there are more 500 you should still see green on the extremes). Second, you say you want lines but used pch=20. replace with type='l' to get lines (may also help with your main problem). If you want to stick with points and your device supports it, you can consider semi-transparent colors. Cheers On Thu, Feb 9, 2012 at 9:00 PM, Melrose2012 melissa.patric...@stonybrook.edu wrote: Hi Everyone, I want to make 3 lines on the same graph (not as subplots, all in the same window, one on top of each other) and I want them to be quantile-quantile plots (qqplot). Essentially, I am looking for the equivalent of Matlab's hold on command to use with qqplot. I know I can use 'points' or 'lines', but these do not give me a qqplot (only appear to work as scatter plots). I found the syntax 'par(new=TRUE)' but that only seems to work for two lines, not for three. My script currently looks like: qqplot(nq.n5,tq.n5,col=red,xlab=Normal Distribution Quantiles,ylab=t Distribution Quantiles,main=Quantile-Quantile Plot of Normal vs t-Distribution for Various Sample Sizes,pch=20) par(new=TRUE) qqplot(nq.n50,tq.n50,col=blue,xlab=,ylab=,pch=20). par(new=TRUE) qqplot(nq.n500,tq.n500,col=green,xlab=,ylab=,pch=20) legend(topleft,c(n=5,n=50,n=500),fill=c(red,blue,green)) I realize that this only plots the first and the third qqplot because by doing par(new=TRUE) again, it gets rid of the middle one. I don't know how to get around this and get all 3 lines on the same plot. Can anyone please help me with this syntax? Thank you very much for your time and advice! Cheers, Melissa -- View this message in context: http://r.789695.n4.nabble.com/making-multiple-lines-using-qqplot-tp4375273p4375273.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot using scale_x_date gives Error in seq.int(r1$year, to$year, by)
Thanks Hadley, of course I should have spotted that. On Fri, Feb 10, 2012 at 12:46 PM, Hadley Wickham had...@rice.edu wrote: Hi Aidan, str is your friend: str(g) 'data.frame': 9 obs. of 3 variables: $ Date : chr 2011-12-23 2011-12-30 2012-01-06 2011-12-23 ... $ variable: Factor w/ 3 levels Price,Yield,..: 1 1 1 2 2 2 3 3 3 $ value : num 86.78 86.04 86.44 9.74 9.54 ... You haven't turned the Date variable into an actual date yet. Try: g$Date - as.Date(g$Date) Hadley On Fri, Jan 6, 2012 at 9:12 AM, Aidan Corcoran aidan.corcora...@gmail.com wrote: Dear all, ggplot gives me an error when trying to plot time series data using a date variable as the x axis. g-structure(list(Date = c(2011-12-23, 2011-12-30, 2012-01-06, 2011-12-23, 2011-12-30, 2012-01-06, 2011-12-23, 2011-12-30, 2012-01-06), variable = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c(Price, Yield, CDS Spread), class = factor), value = c(86.777, 86.037, 86.437, 9.737, 9.542, 9.683, 580.132, 576.866, 573.564)), .Names = c(Date, variable, value ), row.names = c(100L, 101L, 102L, 202L, 203L, 204L, 304L, 305L, 306L), class = data.frame) g Date variable value 100 2011-12-23 Price 86.8 101 2011-12-30 Price 86.0 102 2012-01-06 Price 86.4 202 2011-12-23 Yield 9.7 203 2011-12-30 Yield 9.5 204 2012-01-06 Yield 9.7 304 2011-12-23 CDS Spread 580.1 305 2011-12-30 CDS Spread 576.9 306 2012-01-06 CDS Spread 573.6 gp-ggplot(g,aes(x=Date,y=value,group=variable,lty=variable)) +geom_line() gp #THIS WORKS FINE (BUT AXIS LABELLING NOT VISIBLE WITH MORE DATA, HENCE I WOULD LIKE TO USE SCALE_X_DATE) gp-gp+ scale_x_date() gp Error in seq.int(r1$year, to$year, by) : 'from' must be finite In addition: Warning messages: 1: In get(x, envir = this, inherits = inh)(this, ...) : NAs introduced by coercion 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf 4: In min(x) : no non-missing arguments to min; returning Inf 5: In max(x) : no non-missing arguments to max; returning -Inf In previous help requests, the workaround of specifying the unit was suggested: gp-gp+ scale_x_date(major=years) but this doesn't work for me (same error). Any help is very much appreciated! Thanks in advance. Aidan R version 2.14.1 (2011-12-22) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Ireland.1252 LC_CTYPE=English_Ireland.1252 LC_MONETARY=English_Ireland.1252 [4] LC_NUMERIC=C LC_TIME=English_Ireland.1252 attached base packages: [1] datasets tools grDevices grid splines graphics stats tcltk utils [10] methods base other attached packages: [1] micEcon_0.6-6 miscTools_0.6-12 np_0.40-11 cubature_1.0 boot_1.3-3 [6] RODBC_1.3-3 sqldf_0.4-6.1 chron_2.3-41 gsubfn_0.5-7 DBI_0.2-5 [11] Haver_1.0 xtable_1.5-6 plm_1.2-7 sandwich_2.2-7 MASS_7.3-16 [16] Formula_1.0-1 nlme_3.1-102 bdsmatrix_1.0 RBloomberg_0.4-150 rJava_0.9-1 [21] gtools_2.6.2 gdata_2.8.2 ggplot2_0.8.9 proto_0.3-9.2 zoo_1.7-4 [26] reshape_0.8.4 plyr_1.6 svSocket_0.9-52 TinnR_1.0.3 R2HTML_2.2 [31] Hmisc_3.8-3 survival_2.36-10 loaded via a namespace (and not attached): [1] cluster_1.14.1 digest_0.5.0 lattice_0.20-0 RSQLite_0.10.0 [5] RSQLite.extfuns_0.0.1 svMisc_0.9-63 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] coxme with frailty
A couple of clarifications for you. 1. I write mixed effects Cox models as exp(X beta + Z b), beta = fixed effects coefficients and b = random effects coefficients. I'm using notation that is common in linear mixed effects models (on purpose). About 2/3 of the papers use exp(X beta)* c, i.e., pull the random effects out of the exponent. Does it make a difference? Not much: b will be Gaussian and c will be log-normal. However, it's much easier to write down complicated (crossed, nested, ect) random effects in the notation I chose. The description you give from HL is using the c style. 2. If var(b) is known, solution of beta and b is straightforward. It's a small modification of the ususal coxph internals for a fixed effects model. That is, a bit of c code that has been optimized and exercised over a decade, it makes sense to re-use it. So all the routines I know of solve mixed effects in a nested fashion: an outer maximizer looks for the parameters of var(b), and for each trial value calls the known routine (which itself iterates) for the corresponding (beta, b) pair. The HL formula you quote only holds in a specific case. Terry T begin included message -- Thank you very much for taking the time to answer. This pointed me in the right direction. For those interested, I found a useful explanation of the derivation of the estimated variance of the random effect in Hosmer Lemeshow's Applied Survival Analysis (1999), pp.321-26 (I love your book, Dr. Therneay, but I needed something easier...). They proceed in 4 steps: 1. Obtain the cumulative hazard function for each subject. 2. Choose an arbitrary value for the variance parameter of the frailties (call it theta). 3. Compute for each subject an estimate of the value of their frailties, USING this variance parameter theta: frailty_i= \frac(1+\theta \times c_i}{1+\theta \times H_i} (formula on p. 321), where H is the cumulative hazard for the subject. So if theta is 0 (no variance), then frailty=1 (i.e., no excess frailty). As theta goes to infinity, the estimated frailty is simply the ratio 1/(cumulative risk so far) or 1/(cumulative risk so far), depending on whether the subject is still alive or not. 4. fit the proportional hazard model with the same covariates as before, but this time including the frailties in the hazard function. 5. calculate a log-likelihood for the model. Repeat this for all possible values of the frailties (in practice, sweep through them according to some algorithm), and pick the one with the highest log-likelihood. SO IF I understand correctly, the frailties are calculated GIVEN a variance parameter of the frailties, and not the reverse. I.e., theta is chosen first, and the random effects are estimated accordingly. Which is why the variance of the estimated random effects is NOT the same as theta. Did I get this right? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Find interval between numbers in list or vector
hi, I have a vector as follow: myVec = c(1,4,10,30) How can I get a vector that show the interval of the numbers. I need to get this result: distance 3,6,20 -- View this message in context: http://r.789695.n4.nabble.com/Find-interval-between-numbers-in-list-or-vector-tp4376115p4376115.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help needed please
I have coded a time series from simulated data: simtimeseries - arima.sim(n=1024,list(order=c(4,0,0),ar=c(2.7607, -3.8106, 2.6535, -0.9258),sd=sqrt(1))) #show roots are outside unit circle plot.ts(simtimeseries, xlab=, ylab=, main=Time Series of Simulated Data) # Yule q1 - cbind(simtimeseries[1:1024]) q2 - t(q1)%*%q1 s0 - q2/1204 r1 - cbind(simtimeseries[1:1023]) r2 - cbind(simtimeseries[2:1024]) r3 - t(r1)%*%r2 s1 - r3/1204 t1 - cbind(simtimeseries[1:1022]) t2 - cbind(simtimeseries[3:1024]) t3 - t(t1)%*%t2 s2 - t3/1204 u1 - cbind(simtimeseries[1:1021]) u2 - cbind(simtimeseries[4:1024]) u3 - t(u1)%*%u2 s3 - u3/1204 v1 - cbind(simtimeseries[1:1020]) v2 - cbind(simtimeseries[5:1024]) v3 - t(v1)%*%v2 s4 - v3/1204 i0 - c(s0,s1,s2,s3) i1 - c(s1,s0,s1,s2) i2 - c(s2,s1,s0,s1) i3 - c(s3,s2,s1,s0) gamma - cbind(i0,i1,i2,i3) eta -c(s1,s2,s3,s4) inversegamma - solve(gamma) phihat - inversegamma%*%eta phihat Phihat - cbind(phihat) s - c(s1,s2,s3,s4) S - cbind(s) sigmasquaredyule - s0 - (t(Phihat)%*%S) sigmasquaredyule I did a yule walker estimate on the simulated data and wanted to work out phi hat which is a vector of 4 values and sigmasquaredyule which is one value. However, I want to run the simulated data 100 times i.e. in a for loop and then take the averages of the phi hat values and sigmasquaredyule value. How would i repeat this simulated time series lots of times (e.g. a 100 times) and store the average value of phi hat and sigmasquaredyule. Thank you [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Find interval between numbers in list or vector
Have a look at function diff(), e.g., diff(c(1,4,10,30)) I hope it helps. Best, Dimitris On 2/10/2012 1:33 PM, alend wrote: hi, I have a vector as follow: myVec = c(1,4,10,30) How can I get a vector that show the interval of the numbers. I need to get this result: distance 3,6,20 -- View this message in context: http://r.789695.n4.nabble.com/Find-interval-between-numbers-in-list-or-vector-tp4376115p4376115.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 Web: http://www.erasmusmc.nl/biostatistiek/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Q - scatterplots
I was able to make a scatterplot but ... 1) what does the 86 mean? The 86 shows up on the graph as well. scatterplot (Shells/TotalEggs ~ Sector, data = data.to.analyze) [1] 86 2) Also how do you change the Y axis title? I don't want it to read Shells/TotalEggs, instead I would like it to read Average Hatching Rate (%). 3) What does this error mean? Rayos if composed of section 1, 2, 3, 4 and 5 scatterplot (Shells/TotalEggs ~ Rayos, data = data.to.analyze) Error in plot.window(...) : need finite 'xlim' values In addition: Warning messages: 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf J -- View this message in context: http://r.789695.n4.nabble.com/Q-scatterplots-tp4375632p4375632.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Split matrix into square submatices
Hi everybody, I'm looking for an optimal way to split a big matrix (e.g. ncol = 8, nrow=8) into small square submatrices (e.g. ncol=2, nrow=2) For example If I have h [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,]19 17 25 33 41 49 57 [2,]2 10 18 26 34 42 50 58 [3,]3 11 19 27 35 43 51 59 [4,]4 12 20 28 36 44 52 60 [5,]5 13 21 29 37 45 53 61 [6,]6 14 22 30 38 46 54 62 [7,]7 15 23 31 39 47 55 63 [8,]8 16 24 32 40 48 56 64 and I want to split matriz h into 16 small submatrices: g[1] [,1] [,2] [1,]19 [2,]2 10 g[2] [,1] [,2] [1,] 17 25 [2,] 18 26 ... g[4] [,1] [,2] [1,] 49 57 [2,] 50 58 ... g[16] [,1] [,2] [1,] 55 63 [2,] 56 64 Always the big matrix would be a square matrix and also would have a nrow and ncol in order of a power of two, so it always could be splitted in at least halves. Until now, I'm able to split a matrix but keeping the number of original matrix columns. I mean, if I want to split into 16 submatrices, using the following command what I get its 4 lists: But I haven't found a way to split resultant lists in order to get 4 matrices for each list. g - split(as.data.frame(h), (1:4)) g $`1` V1 V2 V3 V4 V5 V6 V7 V8 1 1 9 17 25 33 41 49 57 5 5 13 21 29 37 45 53 61 $`2` V1 V2 V3 V4 V5 V6 V7 V8 2 2 10 18 26 34 42 50 58 6 6 14 22 30 38 46 54 62 $`3` V1 V2 V3 V4 V5 V6 V7 V8 3 3 11 19 27 35 43 51 59 7 7 15 23 31 39 47 55 63 $`4` V1 V2 V3 V4 V5 V6 V7 V8 4 4 12 20 28 36 44 52 60 8 8 16 24 32 40 48 56 64 I would appreciate any hint. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/Split-matrix-into-square-submatices-tp4375563p4375563.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Rpart and splitting criteria
Dear All, I have questions about the function rpart to construct a regression tree in R code. My problem is how to change the splitting criteria. In the rpart we have : parms=list(split=..) , I ask you if in this command is it possible to use an another splitting criterion to substitute the default criteria( gini or information)? Does someone can help me ? Thank you, Myriam Tabasso [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to made .exe of any gui project in R
i made jDialog through JGR package(Java gui 4 R), i want to export it with imported external package e.g.bio3d,RCOR , into .exe or if not possible convert code to executable jar file how should i go, give me package name,example codes (if possible) -- View this message in context: http://r.789695.n4.nabble.com/how-to-made-exe-of-any-gui-project-in-R-tp4375938p4375938.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] access results of graph.maxflow and program max-flow algorithm
Hi all, I want to use graph.maxflow function for my work. Given a network, I could get a result, but I could not figure out how the function get the result. Did anyone have the same problem before? How did you resolve it? Does anyone know any other functions/programs for max-flow analysis. Thank you in advance. Wendy -- View this message in context: http://r.789695.n4.nabble.com/access-results-of-graph-maxflow-and-program-max-flow-algorithm-tp4375744p4375744.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] clustering and the region of integration
Dear R users, I'm having trouble with calculating pvalues for my 2d dataset. First I performed clustering and I would like to get some info about the strength of cluster membership for each point. I've calculated (thanks to nice people help) the multivariate normal densities (mnd) using dmvnorm function: pd11=mvtnorm::dmvnorm(dataset1,mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1]) I've obtained a vector of mnds for each cluster: NA12043 NA12249 NA12264 NA12707 NA12716 NA12717 NA12751 NA12762 NA12864 NA12873 NA07034 NA07048 NA07055 NA07345 NA07348 NA07357 NA10830 NA10835 8.627681e+00 8.465797e+00 1.522724e+01 2.047262e+01 1.780368e+01 2.443946e+01 8.687642e+00 5.024366e+00 2.163811e+01 6.093326e-01 1.503374e+00 2.263341e+00 2.177880e+01 2.851877e+01 1.240402e+01 7.498245e+00 1.186389e+01 1.229760e+01 NA12154 NA12234 NA12236 NA12763 NA12801 NA12812 NA12813 NA12878 NA10851 NA10854 NA10857 NA10859 NA10861 NA10863 NA11839 NA11840 NA11881 NA11882 8.293616e+00 4.019101e-19 2.733848e+01 2.623284e+01 2.320810e+01 5.112927e-01 1.432336e+01 1.000314e+01 1.675454e+01 8.239816e+00 2.449679e+01 2.655419e+01 2.294064e+01 2.218329e-17 8.844933e+00 2.911991e+00 2.170381e+01 3.089883e+00 NA11994 NA12044 NA12056 NA12057 NA12891 NA12892 1.668749e+01 1.588963e+01 5.913443e+00 2.924297e+01 1.765777e+01 7.935129e+00 Next, what I would like to do is to calculate the pvalue for each point, which was assigned to particular cluster. In order to do this i'm using pmvnorm function, but I found it difficult to set the region of integration. As I understand to get the probability of cluster membership I should define how 'far' from the cluster mean is my point. However, I've got 2d dataset and my mean is also 2d: [1,] [2,] 1.348992 1.269590 but I've got only one density value for each point. Using: pmvnorm(mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1], lower=2.218329e-17, upper=as.vector(dataset1MC$parameters$mean[,1])) gives strange results, since for 2.218329e-17 the output is: [1] 0.348126 attr(,error) [1] 1e-15 attr(,msg) [1] Normal Completion and pmvnorm(mean=dataset1MC$parameters$mean[,1],sigma=dataset1MC$parameters$variance$sigma[,,1], lower= as.vector(dataset1MC$parameters$mean[,1]) , upper=2.924297e+01) gives: [1] 0.348126 attr(,error) [1] 1e-15 attr(,msg) [1] Normal Completion If it is possible I would like to get some info about: Is my idea of calculating the probability of cluster membership is correct? How I can set properly the region of integration? I would be grateful for any help. Best Wishes, Bas. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Trust in a glm.nb model results with an itereation limit reached
Hello to everyone. I'm fitting a glm.nb model to a count data. I'm using about 8 predictive variables. Once a run the script I do get a result but it tells me that the iteration limit has been reached. So, can i trust the results given by the model? Could it be a multicollinearity problem? Thank you for taking the time to help me. Greetings Lucas. -- View this message in context: http://r.789695.n4.nabble.com/Trust-in-a-glm-nb-model-results-with-an-itereation-limit-reached-tp4376073p4376073.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Need to aggregate large dataset by week...
Hi all, I have a large dataset with ~8600 observations that I want to compress to weekly means. There are 9 variables (columns), and I have already added a week column with 51 weeks. I have been looking at the functions: aggregate, tapply, apply, etc. and I am just not savvy enough with R to figure this out on my own, though I'm sure it's fairly easy. I also have the Dates (month/day/year) for all of the observations, but I figured just having a week column may be easier. If someone wanted to show me how to organize this data using a date function and aggregating by month that would be useful too! Here's an example of the data set, with only 5 of the variables and 10 of 8600 obs.: weekrainfall windspeed winddir temp oakdepth 1 1 0.2000 0.89000 245.9200 1.15 4.40 2 1 0. 0.84000 292.8800 1.19 5.30 3 1 0.2000 0.74000 258.5400 1.36 6.00 4 1 0. 0.930003.7000 1.43 4.40 5 1 0.2000 0.69000 37.8200 1.56 5.20 6 1 0. 0.8 17.2900 1.69 4.40 7 1 0.2000 0.7 28.7300 1.88 5.00 8 1 0.2000 1.12000 294.3700 1.93 6.00 9 1 0. 1.21000 274.9700 1.80 4.40 10 1 0. 1.31000 279.2400 1.86 5.80 ...so after about 170 observations it changes to week 2, and so on. I've tried something like this, but its only one variable's mean, and I would rather have the rows=weeks and columns= the different variables. tapply(metdata$rainfall,metdata$week,FUN=mean) 1 2 3 4 5 6 0.080952381 0.101190476 0.379761905 0.179761905 0.0 0.295238095 7 8 9 10 11 12 0.146428571 0.015476190 0.16389 0.098809524 0.065476190 0.215476190 Hope this is enough information and that I'm not just re-asking an old question. Thanks so much in advance for any help. -- View this message in context: http://r.789695.n4.nabble.com/Need-to-aggregate-large-dataset-by-week-tp4376154p4376154.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] best option for big 3D arrays?
Hi all, I am trying to fill a 904x904x904 array, but at some point of the loop R states that the 5.5Gb sized vector is too big to allocate. I have looked at packages such as bigmemory, but I need help to decide which is the best way to store such an object. It would be perfect to store it in this cube form (for indexing and computation purpouses). If not possible, maybe the best is to store the 904 matrices separately and read them individually when needed? Never dealed with such a big dataset, so any help will be appreciated (R+ESS, Debian 64bit, 4Gb RAM, 4core) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] e1071/svm: never finishes
When I tried to run svm on the same data frame, memory usage as reported by top(1) doubled to 4GB almost right away and the function never returned (has been running for ~15 hours now). ^C does not stop it. This is most unusual, libsvm has always seemed very fast. This is R version 2.13.1 (2011-07-08) (as distributed with ubuntu). * Sam Steingold f...@tah.bet [2012-02-09 21:43:30 -0500]: I did this: nb - naiveBayes(users, platform) pl - predict(nb,users) nrow(users) == 314781 ncol(users) == 109 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow (tens of minutes). why? 2. the predict results were completely off the mark (quite the opposite of the expected overfitting). suffice it to show the tables: pl: android blackberry ipad iphone lg linuxmac 3 5 11 14 312723 5 11 mobile nokiasamsungsymbianunknownwindows 1864 17 16112 0 0 platform: android blackberry ipad iphone lg linuxmac 18013 1221 2647 1328 4 2936 34336 mobile nokiasamsungsymbianunknownwindows 18 88 39103 2660 251388 i.e., nb classified nearly everything as lg while in the actual data lg is virtually nonexistent. 3. when I print nb, I see A-priori probabilities (which are what I expected) and Conditional probabilities which are confusing because there are only two of them, e.g.: android0.048464998 0.43946764 blackberry 0.001638002 0.04045564 ipad 0.322251606 1.84940588 iphone 0.030873494 0.23250250 lg 0.0 0. linux 0.023501362 0.34698919 mac0.082653774 1.22535027 mobile 0.0 0. nokia 0.0 0. samsung0.0 0. symbian0.0 0. unknown0.003759398 0.08219078 windows0.021158528 0.32916970 the predictors are integers. is the first column for the 0 predictors and the second for all non-0? Is there a way to ask naiveBayes to differenciate between non-0 values? thanks! -- Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000 http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il http://jihadwatch.org http://camera.org http://www.memritv.org Don't ascribe to malice what can be adequately explained by stupidity. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to combine two matrix or vectors
such as a=[1,2,3 ],b =[2,4] I want to get a new one [1 2 3 4 5]. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/How-to-combine-two-matrix-or-vectors-tp4376467p4376467.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to combine two matrix or vectors
How could you combine a and b as given to get your new one? It doesn't have the same elements... ?union ?intersect ?rbind ?c Michael On Fri, Feb 10, 2012 at 9:55 AM, summer summerwind...@hotmail.com wrote: such as a=[1,2,3 ],b =[2,4] I want to get a new one [1 2 3 4 5]. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/How-to-combine-two-matrix-or-vectors-tp4376467p4376467.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split matrix into square submatices
On Fri, Feb 10, 2012 at 12:02:25AM -0800, xiddw wrote: Hi everybody, I'm looking for an optimal way to split a big matrix (e.g. ncol = 8, nrow=8) into small square submatrices (e.g. ncol=2, nrow=2) For example If I have h [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,]19 17 25 33 41 49 57 [2,]2 10 18 26 34 42 50 58 [3,]3 11 19 27 35 43 51 59 [4,]4 12 20 28 36 44 52 60 [5,]5 13 21 29 37 45 53 61 [6,]6 14 22 30 38 46 54 62 [7,]7 15 23 31 39 47 55 63 [8,]8 16 24 32 40 48 56 64 and I want to split matriz h into 16 small submatrices: g[1] [,1] [,2] [1,]19 [2,]2 10 g[2] [,1] [,2] [1,] 17 25 [2,] 18 26 ... g[4] [,1] [,2] [1,] 49 57 [2,] 50 58 ... g[16] [,1] [,2] [1,] 55 63 [2,] 56 64 Always the big matrix would be a square matrix and also would have a nrow and ncol in order of a power of two, so it always could be splitted in at least halves. Until now, I'm able to split a matrix but keeping the number of original matrix columns. I mean, if I want to split into 16 submatrices, using the following command what I get its 4 lists: But I haven't found a way to split resultant lists in order to get 4 matrices for each list. Hi. Try the following. k - 3 n - 2^k m - 2^(k - 2) a - matrix(1:n^2, nrow=n, ncol=n) g - vector(list, length=16) k - 1 for (j in 0:3) { for (i in 0:3) { g[[k]] - a[m*i + 1:m, m*j + 1:m] k - k + 1 } } g[1:5] [[1]] [,1] [,2] [1,]19 [2,]2 10 [[2]] [,1] [,2] [1,]3 11 [2,]4 12 [[3]] [,1] [,2] [1,]5 13 [2,]6 14 [[4]] [,1] [,2] [1,]7 15 [2,]8 16 [[5]] [,1] [,2] [1,] 17 25 [2,] 18 26 ... Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nice report generator?
Hi guys, AFAICT, this conversation now discusses some domain-specific issues. However, in reply to OP and OQ, I'll shamelessly advertise the package that Gergely Daróczi and I have released recently. It's named rapport, and you can grab it from CRAN or GitHub (https://github.com/aL3xa/rapport/). Check it out, as its purpose is to deliver reports in bosses-friendly format. It relies on pandoc, so you can easily convert it to external formats like ODT, DOCX, PDF, LaTeX, etc. Check out the project homepage for more details: http://rapport-package.info/ Cheers, aL3xa On Tue, Feb 7, 2012 at 16:45, Martin Studer martin.remo.stu...@gmail.comwrote: Hi Michael, Hi all, an alternative for reading/writing tables from/to Excel is the package XLConnect. It is platform-independent and therefore runs under Windows, UNIX/Linux Mac. It supports reading/writing worksheets as well as named ranges. In addition, for reporting purposes, there is support for cell styles. You can find an example here: http://miraisolutions.wordpress.com/2011/08/31/xlconnect-a-platform-independent-interface-to-excel/ XLConnect A platform-independent interface to Excel . With respect to cell styles: you either completely code up the styling according to your needs using /setCellStyle/ (which can be cumbersome if you need to apply a lot of styling) or you can use so-called style actions controlled via /setStyleAction/. See the XLConnect reference manual for more information. Best regards, Martin -- View this message in context: http://r.789695.n4.nabble.com/nice-report-generator-tp4169939p4365144.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split matrix into square submatices
On Fri, Feb 10, 2012 at 3:02 AM, xiddw xid...@gmail.com wrote: Hi everybody, I'm looking for an optimal way to split a big matrix (e.g. ncol = 8, nrow=8) into small square submatrices (e.g. ncol=2, nrow=2) For example If I have h [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 1 9 17 25 33 41 49 57 [2,] 2 10 18 26 34 42 50 58 [3,] 3 11 19 27 35 43 51 59 [4,] 4 12 20 28 36 44 52 60 [5,] 5 13 21 29 37 45 53 61 [6,] 6 14 22 30 38 46 54 62 [7,] 7 15 23 31 39 47 55 63 [8,] 8 16 24 32 40 48 56 64 and I want to split matriz h into 16 small submatrices: g[1] [,1] [,2] [1,] 1 9 [2,] 2 10 g[2] [,1] [,2] [1,] 17 25 [2,] 18 26 ... g[4] [,1] [,2] [1,] 49 57 [2,] 50 58 ... g[16] [,1] [,2] [1,] 55 63 [2,] 56 64 Always the big matrix would be a square matrix and also would have a nrow and ncol in order of a power of two, so it always could be splitted in at least halves. Until now, I'm able to split a matrix but keeping the number of original matrix columns. I mean, if I want to split into 16 submatrices, using the following command what I get its 4 lists: But I haven't found a way to split resultant lists in order to get 4 matrices for each list. g - split(as.data.frame(h), (1:4)) g $`1` V1 V2 V3 V4 V5 V6 V7 V8 1 1 9 17 25 33 41 49 57 5 5 13 21 29 37 45 53 61 $`2` V1 V2 V3 V4 V5 V6 V7 V8 2 2 10 18 26 34 42 50 58 6 6 14 22 30 38 46 54 62 $`3` V1 V2 V3 V4 V5 V6 V7 V8 3 3 11 19 27 35 43 51 59 7 7 15 23 31 39 47 55 63 $`4` V1 V2 V3 V4 V5 V6 V7 V8 4 4 12 20 28 36 44 52 60 8 8 16 24 32 40 48 56 64 Try this: h - matrix(1:64, 8) # input k - kronecker(matrix(1:16, 4, byrow = TRUE), matrix(1, 2, 2)) g - lapply(split(h, k), matrix, nr = 2) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nice report generator?
To be strictly correct, shouldn't that be: formula- eval(substitute( value*v*LEFT ~ RIGHT, list(LEFT=LEFT, RIGHT=RIGHT))) ? I think it probably doesn't matter. The difference is that mine gives a pure language object, whereas yours gives a formula object. The formula object has a class which means some methods will work differently, and it also has an environment attached, which defines where the variables in it should be resolved. I suspect the variables shouldn't be resolved in the environment where formula was being created, so it's probably better not to attach an environment at all, but the tabular function ignores the environment of the formula (it uses its data argument for that), so it doesn't make a big difference. Yes, sorry, I was probably being excessively picky :/ Formula semantics are tricky though! Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trust in a glm.nb model results with an itereation limit reached
On Feb 10, 2012, at 7:14 AM, Lucas wrote: Hello to everyone. I'm fitting a glm.nb model to a count data. I'm using about 8 predictive variables. Once a run the script I do get a result but it tells me that the iteration limit has been reached. Most regression functions have a control parameter that can increase the maximum number of iterations, but you provided no code so all is guesswork from here. So, can i trust the results given by the model? Probably not. Some functions will not even record estimates if they have failed the convergence tests. Could it be a multicollinearity problem? Could be. Thank you for taking the time to help me. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Q - scatterplots
Dear J., -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Jhope Sent: February-10-12 3:44 AM To: r-help@r-project.org Subject: [R] Q - scatterplots I was able to make a scatterplot but ... 1) what does the 86 mean? The 86 shows up on the graph as well. scatterplot (Shells/TotalEggs ~ Sector, data = data.to.analyze) [1] 86 You don't provide enough information to answer your questions. I assume, for example, that this is the scatterplot() function in the car package. If so, the function returns the names (in your case, numbers) of outlying observations, which are also labelled on the graph, but that shouldn't have happened unless you set the id.n or id.method arguments appropriately. Beyond the command, you haven't provided any information about the version of the car package that you're using, or even whether that's what you're using. 2) Also how do you change the Y axis title? I don't want it to read Shells/TotalEggs, instead I would like it to read Average Hatching Rate (%). See ?scatterplot, in particular, the ylab argument, assuming again that you're using scatterplot() in the car package. 3) What does this error mean? Rayos if composed of section 1, 2, 3, 4 and 5 scatterplot (Shells/TotalEggs ~ Rayos, data = data.to.analyze) Error in plot.window(...) : need finite 'xlim' values In addition: Warning messages: 1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf I suspect that Rayos isn't numeric but without the data it's impossible to tell. Please see the posting guide for r-help http://www.R-project.org/posting-guide.html for how to formulate an answerable question. Best, John John Fox Senator William McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox J -- View this message in context: http://r.789695.n4.nabble.com/Q- scatterplots-tp4375632p4375632.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] e1071/svm: never finishes
* Sam Steingold f...@tah.bet [2012-02-10 10:01:54 -0500]: When I tried to run svm on the same data frame, memory usage as reported by top(1) doubled to 4GB almost right away and the function never returned (has been running for ~15 hours now). ^C does not stop it. This is most unusual, libsvm has always seemed very fast. looks like it _is_ libsvm: #0 0x72aedc64 in Solver::select_working_set (this=0x7fff97f0, out_i=@0x7fff95a0, out_j=@0x7fff95b0) at svm.cpp:852 #1 0x72aef91d in Solver::Solve (this=0x7fff97f0, l=285724, Q=..., p_=optimized out, y_=optimized out, alpha_=0x6023fb60, Cp=1, Cn=1, eps=optimized out, si=0x7fff9980, shrinking=1) at svm.cpp:573 #2 0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980, alpha=0x6023fb60, param=optimized out, prob=0x7fff9c30) at svm.cpp:1444 #3 svm_train_one (prob=0x7fff9c30, param=optimized out, Cp=1, Cn=1) at svm.cpp:1641 #4 0x72af4a8e in svm_train (prob=optimized out, param=0x7fff9d40) at svm.cpp:2179 #5 0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, c=optimized out, y=optimized out, rowindex=optimized out, colindex=optimized out, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0, nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50, shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40, index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, coefs=0x391dbb48, sigma=0x10358a88, probA=0xdf94678, probB=0xcbb7eb8, cresults=0x0, ctotal1=0x10358ac0, ctotal2=0x10358af8, error=0x10358b30) at Rsvm.c:275 #6 0x7792cefc in ?? () from /usr/lib/R/lib/libR.so #7 0x7795da1d in Rf_eval () from /usr/lib/R/lib/libR.so #8 0x7795f540 in ?? () from /usr/lib/R/lib/libR.so #9 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so #10 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so #11 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so #12 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so #13 0x779ad784 in Rf_usemethod () from /usr/lib/R/lib/libR.so #14 0x779ada47 in ?? () from /usr/lib/R/lib/libR.so #15 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so #16 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so #17 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so #18 0x7795f540 in ?? () from /usr/lib/R/lib/libR.so #19 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so #20 0x7795db9b in ?? () from /usr/lib/R/lib/libR.so #21 0x7795dad9 in Rf_eval () from /usr/lib/R/lib/libR.so #22 0x7795f6c9 in ?? () from /usr/lib/R/lib/libR.so #23 0x7795d7ff in Rf_eval () from /usr/lib/R/lib/libR.so #24 0x77960a7f in Rf_applyClosure () from /usr/lib/R/lib/libR.so #25 0x7795d6e0 in Rf_eval () from /usr/lib/R/lib/libR.so #26 0x77998055 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so #27 0x779982e0 in ?? () from /usr/lib/R/lib/libR.so #28 0x77998370 in run_Rmainloop () from /usr/lib/R/lib/libR.so #29 0x0040078b in main () #30 0x772d930d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6 #31 0x004007bd in _start () #0 0x72aeeb67 in Kernel::dot (px=0x48eeb220, py=0x4b21890) at svm.cpp:295 #1 0x72af7a25 in Kernel::kernel_rbf (this=optimized out, i=optimized out, j=optimized out) at svm.cpp:239 #2 0x72af782c in SVC_Q::get_Q (this=0x7fff9870, i=187701, len=208039) at svm.cpp:1271 #3 0x72aef9ab in Solver::Solve (this=0x7fff97f0, l=285724, Q=..., p_=optimized out, y_=optimized out, alpha_=0x6023fb60, Cp=1, Cn=1, eps=optimized out, si=0x7fff9980, shrinking=1) at svm.cpp:591 #4 0x72af1747 in solve_c_svc (Cn=1, Cp=1, si=0x7fff9980, alpha=0x6023fb60, param=optimized out, prob=0x7fff9c30) at svm.cpp:1444 #5 svm_train_one (prob=0x7fff9c30, param=optimized out, Cp=1, Cn=1) at svm.cpp:1641 #6 0x72af4a8e in svm_train (prob=optimized out, param=0x7fff9d40) at svm.cpp:2179 #7 0x72aea281 in svmtrain (x=0x7fff7e698038, r=0x11c9b1e0, c=optimized out, y=optimized out, rowindex=optimized out, colindex=optimized out, svm_type=0x11c9b2a0, kernel_type=0x11c9b2d0, degree=0x11c9b300, gamma=0x356e3a28, coef0=0x356e3a60, cost=0x356e3ad0, nu=0x103589a8, weightlabels=0x0, weights=0x0, nweights=0x11c9b330, cache=0x103589e0, tolerance=0x10358a18, epsilon=0x10358a50, shrinking=0x11c9b360, cross=0x11c9b390, sparse=0x11c9b3c0, probability=0x1524dbb0, seed=0x1524dbe0, nclasses=0x1524dc10, nr=0x1524dc40, index=0x148a0fa8, labels=0xa3303b8, nSV=0xa330420, rho=0x170083e8, coefs=0x391dbb48, sigma=0x10358a88,
Re: [R] Importing a CSV file
First, make sure the file name is correct. Second, try list.files(Users:/sezginozcan/Downloads/) to see if the file is listed and you are looking in the correct folder. Third, you may be able to locate and open the file using A - read.table(file.choose(), sep=\t, row.names=1, header=T) I believe that clipboard works only on Windows computers. -- David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of sezgin ozcan Sent: Thursday, February 09, 2012 9:21 PM To: r-h...@stat.math.ethz.ch Subject: [R] Importing a CSV file I have been trying to import a csv file to r. but I get the same message everytime. the message is Error in file(file, rt) : cannot open the connection In addition: Warning message: In file(file, rt) : cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such file or directory I use mac. I tried this command also a-read.table(clipboard,sep=\t,row.names=1,header=T) Error: unexpected input in a-read.table(clipboard,sep=, I will appreciate if you help me before I get crazy. thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Importing a CSV file
-- Forwarded message -- From: Gyanendra Pokharel gyanendra.pokha...@gmail.com Date: Fri, Feb 10, 2012 at 11:13 AM Subject: Re: [R] Importing a CSV file To: sezgin ozcan szgoz...@gmail.com You save your file in your own folder either in desktop or in document where ever you want and serach the file using the command mydatafile - read.csv(file.choose()) then open the file Cheers On Thu, Feb 9, 2012 at 10:20 PM, sezgin ozcan szgoz...@gmail.com wrote: I have been trying to import a csv file to r. but I get the same message everytime. the message is Error in file(file, rt) : cannot open the connection In addition: Warning message: In file(file, rt) : cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such file or directory I use mac. I tried this command also a-read.table(clipboard,sep=\t,row.names=1,header=T) Error: unexpected input in a-read.table(clipboard,sep= I will appreciate if you help me before I get crazy. thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to have columns lined up?
A little more on topic is that there are tools for R to nicely format output. The simplest is probably the R2wd package which will transfer objects and text to word, you can send a matrix or data frame to word as a nicely formatted table or wdVerbatim will send output to word but make sure that the font is the correct type to make things line up. If you are going to be doing a lot of this then you should investigate the variations on Sweave (Sword, knitr, odfWeave, etc.) which allow you to create a template file with R code and report text, then process it to replace the R code with its output. On Tue, Feb 7, 2012 at 9:07 PM, hithit168 ccchri...@live.com wrote: Hi there, Everytime when I paste My R output in WORD, the columns couldn't line up nicely like they appear in R console. Is there a way to fix this problem? Thanks for any help! time n.risk n.event survival std.err lower 95% CI upper 95% CI 6 21 3 0.857 0.0764 0.707 1.000 7 17 1 0.807 0.0869 0.636 0.977 10 15 1 0.753 0.0963 0.564 0.942 13 12 1 0.690 0.1068 0.481 0.900 16 11 1 0.627 0.1141 0.404 0.851 22 7 1 0.538 0.1282 0.286 0.789 23 6 1 0.448 0.1346 0.184 0.712 -- View this message in context: http://r.789695.n4.nabble.com/How-to-have-columns-lined-up-tp4367928p4367928.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem subsetting data frame with variable instead of constant
Hello, I've encountered a very weird issue with the method subset(), or maybe this is something I don't know about said method that when you're subsetting based on the columns of a data frame you can only use constants (0.1, 2.3, 2.2) instead of variables? Here's a look at my data frame called 'ea.cad.pwr': *ea.ca.pwr[1:5,] MAF OR POWER 1 0.02 0.01 0. 2 0.02 0.02 0.9998 3 0.02 0.03 0.9997 4 0.02 0.04 0.9995 5 0.02 0.05 0.9993* Here's my subset lines which finds no rows: *power1 = subset(ea.cad.pwr, MAF == maf1 OR == odds) power2 = subset(ea.cad.pwr, MAF == maf2 OR == odds) * Now when maf1 = 0.2 and odds = 1.2 it finds nothing. I know for a fact that there's a row with these values: * ea.cad.pwr[1430:1432,] MAF OR POWER 1430 0.2 0.58 0.9996 1431 0.2 1.20 0.3092 1432 0.2 1.22 0.3914* I have code working in a loop and each previous iteration the subset() function is working fine, but in this iteration some different lines are executed which are relevant to these variables, here they are: * maf1 = maf.adj - 0.01 maf2 = maf.adj + 0.01* Basically maf.adj is always a 2 decimal number (in this case = 0.21), and I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in case maf.adj isn't in the table. maf.adj is read from another dataframe, when I use it to subset it always works fine but when I do this innocent subtraction for some reason it doesn't work. If I rewrite statements like this it works: *power1 = subset(ea.cad.pwr, MAF == 0.2 OR == odds) power2 = subset(ea.cad.pwr, MAF == 0.22 OR == odds) * Even if I write this first: maf1 = 0.2 Then: power1 = subset(ea.cad.pwr, MAF == maf1 OR == odds) It works as well! That's what's really confusing, how can this subtraction mess everything up? Please help if you can..thank you! Vaneet -- View this message in context: http://r.789695.n4.nabble.com/problem-subsetting-data-frame-with-variable-instead-of-constant-tp4376759p4376759.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem subsetting data frame with variable instead of constant
On Fri, Feb 10, 2012 at 08:15:39AM -0800, vaneet wrote: Hello, I've encountered a very weird issue with the method subset(), or maybe this is something I don't know about said method that when you're subsetting based on the columns of a data frame you can only use constants (0.1, 2.3, 2.2) instead of variables? Here's a look at my data frame called 'ea.cad.pwr': *ea.ca.pwr[1:5,] MAF OR POWER 1 0.02 0.01 0. 2 0.02 0.02 0.9998 3 0.02 0.03 0.9997 4 0.02 0.04 0.9995 5 0.02 0.05 0.9993* Here's my subset lines which finds no rows: *power1 = subset(ea.cad.pwr, MAF == maf1 OR == odds) power2 = subset(ea.cad.pwr, MAF == maf2 OR == odds) * Now when maf1 = 0.2 and odds = 1.2 it finds nothing. I know for a fact that there's a row with these values: * ea.cad.pwr[1430:1432,] MAF OR POWER 1430 0.2 0.58 0.9996 1431 0.2 1.20 0.3092 1432 0.2 1.22 0.3914* I have code working in a loop and each previous iteration the subset() function is working fine, but in this iteration some different lines are executed which are relevant to these variables, here they are: * maf1 = maf.adj - 0.01 maf2 = maf.adj + 0.01* Basically maf.adj is always a 2 decimal number (in this case = 0.21), and I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in case maf.adj isn't in the table. maf.adj is read from another dataframe, when I use it to subset it always works fine but when I do this innocent subtraction for some reason it doesn't work. If I rewrite statements like this it works: *power1 = subset(ea.cad.pwr, MAF == 0.2 OR == odds) power2 = subset(ea.cad.pwr, MAF == 0.22 OR == odds) * Even if I write this first: maf1 = 0.2 Then: power1 = subset(ea.cad.pwr, MAF == maf1 OR == odds) It works as well! That's what's really confusing, how can this subtraction mess everything up? Please help if you can..thank you! Hi. This may be a rounding problem. Try 0.3 - 0.1 == 0.2 [1] FALSE Explicit rounding to a not too large number of decimal digits can help. round(0.3 - 0.1, digits=7) == 0.2 [1] TRUE See also FAQ 7.31 or http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem subsetting data frame with variable instead of constant
This is likely a representation issue, as in R FAQ 7.31. ?== suggests that using identical and all.equal is a better strategy: x1 - 0.5 - 0.3 x2 - 0.3 - 0.1 x1 == x2 # FALSE on most machines identical(all.equal(x1, x2), TRUE) # TRUE everywhere Sarah On Fri, Feb 10, 2012 at 11:15 AM, vaneet vaneet.lo...@mountsinai.org wrote: Hello, I've encountered a very weird issue with the method subset(), or maybe this is something I don't know about said method that when you're subsetting based on the columns of a data frame you can only use constants (0.1, 2.3, 2.2) instead of variables? Here's a look at my data frame called 'ea.cad.pwr': *ea.ca.pwr[1:5,] MAF OR POWER 1 0.02 0.01 0. 2 0.02 0.02 0.9998 3 0.02 0.03 0.9997 4 0.02 0.04 0.9995 5 0.02 0.05 0.9993* Here's my subset lines which finds no rows: *power1 = subset(ea.cad.pwr, MAF == maf1 OR == odds) power2 = subset(ea.cad.pwr, MAF == maf2 OR == odds) * Now when maf1 = 0.2 and odds = 1.2 it finds nothing. I know for a fact that there's a row with these values: * ea.cad.pwr[1430:1432,] MAF OR POWER 1430 0.2 0.58 0.9996 1431 0.2 1.20 0.3092 1432 0.2 1.22 0.3914* I have code working in a loop and each previous iteration the subset() function is working fine, but in this iteration some different lines are executed which are relevant to these variables, here they are: * maf1 = maf.adj - 0.01 maf2 = maf.adj + 0.01* Basically maf.adj is always a 2 decimal number (in this case = 0.21), and I'm computing the numbers around it by a difference of 0.01 (0.2,0.22) in case maf.adj isn't in the table. maf.adj is read from another dataframe, when I use it to subset it always works fine but when I do this innocent subtraction for some reason it doesn't work. If I rewrite statements like this it works: *power1 = subset(ea.cad.pwr, MAF == 0.2 OR == odds) power2 = subset(ea.cad.pwr, MAF == 0.22 OR == odds) * Even if I write this first: maf1 = 0.2 Then: power1 = subset(ea.cad.pwr, MAF == maf1 OR == odds) It works as well! That's what's really confusing, how can this subtraction mess everything up? Please help if you can..thank you! Vaneet -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to stop a loop for?
If you want to pause for the person to look at a plot before going on to the next plot then just do: par(ask=TRUE) This will actually allow your loop to continue with calculations while the user looks at the plot but will pause before drawing the next plot (hitting enter in the command window or clicking on the plot window will allow the code to continue). If you want to pause for something other than a plot then use readline like Michael suggests. On Wed, Feb 8, 2012 at 12:45 PM, Juan Andres Hernandez jhernandezcabr...@gmail.com wrote: Hi all, I have some time trying to find a way to stop a loop for( ) until the user presses the enter key or any other one and the loop can continue. This could be an example: library(MASS) data - data.frame(mvrnorm(1000,rep(0,5),Sigma=diag(1,5))) for(i in 1:dim(data)[2]){ plot(density(data[,i]), main=paste('histogram',i)) #here something like waituntil command } Thank's in advance Juan A. Hernandez Spain. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem subsetting data frame with variable instead of constant
Thanks guys, both those solutions work. I really appreciate the help! -- View this message in context: http://r.789695.n4.nabble.com/problem-subsetting-data-frame-with-variable-instead-of-constant-tp4376759p4376826.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How indices calculated in package boot
In your second example the boot function will still generate its random indicies before your internal function calls sample, so the seed will be different when you call sample from what you originally set it to. If you really want to know what the boot function does, look at its code (it does some things to try and be more efficient which might give different results than a simpler version). On Wed, Feb 8, 2012 at 1:45 PM, Deng Nina nnd...@gmail.com wrote: Hi,there, I am using R package boot to bootstrap. I have one question here: does anybody possibly know how the boot package generates the indices which is used in the statistic function? I thought indices = sample(data, replace=TRUE), but when I replaced indices with this command and used boot, I got different results. Specifically, below are the codes for illustration. (1) The typical way by generating indices in the package: boot1 - function (data, indices) { d - data[indices] return(d) } AA - c(1:10) require(boot) set.seed(123) results1 - boot(data= AA, statistic=boot1, R=100) (2) The alternative way by calculating indices myself: boot2 - function (data,indices) { indices - sample(data, replace=TRUE) d - data[indices] return(d) } AA - c(1:10) set.seed(123) results2 - boot(data= AA, statistic=boot2, R=100) When I looked up using results1$t and results2$t, I had totoally different bootstrap samples. I found this even had great impacts on the results in my study. Does the second approach have any problem? Anyone could provide any inputs on this? Thank you very much in advance! Regards Nina [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] function arrows.circular not working
I have started using the circular package but it is not recognizing the function arrows.circular. I attempted to use the example provided in the circular manual. Here is the example code using the circular package: plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) My error is: Error: could not find function arrows.circular Any help would be greatly appreciated. Thanks. David Chosid Conservation Engineering MA Division of Marine Fisheries 1213 Purchase St., 3rd Floor New Bedford, MA 02740 508-990-2860 david.cho...@state.ma.usmailto:david.cho...@state.ma.us [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function arrows.circular not working
Your sessionInfo() would be helpful. Also, just to check: you did do library(circular) right? Sarah On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) david.cho...@state.ma.us wrote: I have started using the circular package but it is not recognizing the function arrows.circular. I attempted to use the example provided in the circular manual. Here is the example code using the circular package: plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) My error is: Error: could not find function arrows.circular Any help would be greatly appreciated. Thanks. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function arrows.circular not working
Hi David, You need to load the package before running your code: # install.packages('circular') require(circular) plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) HTH, Jorge.- On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) wrote: I have started using the circular package but it is not recognizing the function arrows.circular. I attempted to use the example provided in the circular manual. Here is the example code using the circular package: plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) My error is: Error: could not find function arrows.circular Any help would be greatly appreciated. Thanks. David Chosid Conservation Engineering MA Division of Marine Fisheries 1213 Purchase St., 3rd Floor New Bedford, MA 02740 508-990-2860 david.cho...@state.ma.usmailto:david.cho...@state.ma.us [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help needed please
Your script is rather inefficient with spurious cbind calls. Any particular reason not to use ?ar directly ? Call: ar.yw.default(x = simtimeseries, order.max = 4) Coefficients: 1234 1.9440 -1.9529 0.8450 -0.2154 Order selected 4 sigma^2 estimated as 15.29 To repeat the sim, you could use a for() loop but ?sapply is better: out- sapply(1:100,function(...){ simtimeseries - arima.sim(n=1024,list(order=c(4,0,0), ar=c(2.7607, -3.8106, 2.6535, -0.9258),sd=sqrt(1))) aryule - ar.yw(simtimeseries,order.max=4) c( c(aryule$ar,NA)[1:4] , aryule$var.pred ) } ) rowMeans(out[1:4,]) # mean phi(1),...,4 see ?rowMeans for dealing with NA's mean(out[5,])# mean sig^2 Cheers On Fri, Feb 10, 2012 at 6:42 AM, Jaymin Shah jayminsh...@hotmail.com wrote: I have coded a time series from simulated data: simtimeseries - arima.sim(n=1024,list(order=c(4,0,0),ar=c(2.7607, -3.8106, 2.6535, -0.9258),sd=sqrt(1))) #show roots are outside unit circle plot.ts(simtimeseries, xlab=, ylab=, main=Time Series of Simulated Data) # Yule q1 - cbind(simtimeseries[1:1024]) q2 - t(q1)%*%q1 s0 - q2/1204 r1 - cbind(simtimeseries[1:1023]) r2 - cbind(simtimeseries[2:1024]) r3 - t(r1)%*%r2 s1 - r3/1204 t1 - cbind(simtimeseries[1:1022]) t2 - cbind(simtimeseries[3:1024]) t3 - t(t1)%*%t2 s2 - t3/1204 u1 - cbind(simtimeseries[1:1021]) u2 - cbind(simtimeseries[4:1024]) u3 - t(u1)%*%u2 s3 - u3/1204 v1 - cbind(simtimeseries[1:1020]) v2 - cbind(simtimeseries[5:1024]) v3 - t(v1)%*%v2 s4 - v3/1204 i0 - c(s0,s1,s2,s3) i1 - c(s1,s0,s1,s2) i2 - c(s2,s1,s0,s1) i3 - c(s3,s2,s1,s0) gamma - cbind(i0,i1,i2,i3) eta -c(s1,s2,s3,s4) inversegamma - solve(gamma) phihat - inversegamma%*%eta phihat Phihat - cbind(phihat) s - c(s1,s2,s3,s4) S - cbind(s) sigmasquaredyule - s0 - (t(Phihat)%*%S) sigmasquaredyule I did a yule walker estimate on the simulated data and wanted to work out phi hat which is a vector of 4 values and sigmasquaredyule which is one value. However, I want to run the simulated data 100 times i.e. in a for loop and then take the averages of the phi hat values and sigmasquaredyule value. How would i repeat this simulated time series lots of times (e.g. a 100 times) and store the average value of phi hat and sigmasquaredyule. Thank you [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function arrows.circular not working
Yes, sorry to not be clear. library(circular) was run. From: Jorge I Velez [mailto:jorgeivanve...@gmail.com] Sent: Friday, February 10, 2012 12:03 PM To: Chosid, David (FWE) Cc: r-help@r-project.org Subject: Re: [R] function arrows.circular not working Hi David, You need to load the package before running your code: # install.packages('circular') require(circular) plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) HTH, Jorge.- On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) wrote: I have started using the circular package but it is not recognizing the function arrows.circular. I attempted to use the example provided in the circular manual. Here is the example code using the circular package: plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) My error is: Error: could not find function arrows.circular Any help would be greatly appreciated. Thanks. David Chosid Conservation Engineering MA Division of Marine Fisheries 1213 Purchase St., 3rd Floor New Bedford, MA 02740 508-990-2860tel:508-990-2860 david.cho...@state.ma.usmailto:david.cho...@state.ma.usmailto:david.cho...@state.ma.usmailto:david.cho...@state.ma.us [[alternative HTML version deleted]] __ R-help@r-project.orgmailto:R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Outlier removal techniques
-Original Message- I wonder why it is still standard practice in some circles to search for outliers as opposed to using robust/resistent methods. At the risk of extending an old debate and driving us off list topic, here are three possible reasons: i) Identifying outliers is important when you want to find possible mistakes in measurement or data entry - so irrespective of whether you use robust methods, you probably want to ask questions like 'why has that result been entered as almost exactly 1000 times the value I expected?' [typically a unit error, btw). And although graphical outlier checking is the obvious way to do that, eyeballs see oddity in chance; an outlier test can help you distinguish oddity from chance and save some (arguably) unnecessary follow-up. ii) because supervised outlier rejection at around the 99% level performs - for simple problems - about as well as Huber's with c set to 1.5 and is a lot easier to explain to, er, people who don't understand iterative numerical methods. iii) Because it's written into some international Standards for statistical processing of data (ie, it's standard practice because it's Standard practice). iv) because you can't do robust analysis in Excel* Not that all these are necesarily _good_ reasons ... ;-) However, I do NOT understand why schools in the UK teach physics students that outliers should automatically and always be thrown out; that's a much larger leap. *You can actually; with R or several add-ins. But that is off topic. *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split dataframe into new dataframes
I think one of your problems (the others have been addressed by others) is that you want the expression x$y to represent a column of x whose name is stored in y (not the name y itself). The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad. It is better to use '[[' instead of the shortcut, try replacing x$y with x[[y]] and see if that fixes that problem. On Wed, Feb 8, 2012 at 2:11 PM, Johannes Radinger jradin...@gmx.at wrote: Hi, I want to split a dataframe based on a grouping variable (in one column). The resulting new dataframes should be stored in a new variable. I tried to split the dataframe using split() and to store it using a FOR loop, but thats not working so far: df - data.frame(A=c(A1,A1,A2,A2),B=seq(1:4)) Fsplit - function(x,y){ ls - split(x,f=x$y) for (i in names(ls)){ i - ls$i } } Fsplit(df,A) #1st argument is dataframe to split, 2nd argument grouping variable Any suggestions how to get that done? Best regards Johannes [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function arrows.circular not working
Hi, On Fri, Feb 10, 2012 at 12:15 PM, Chosid, David (MISC) david.cho...@state.ma.us wrote: Yes, sorry for not being more clear. Here is the sessionInfo(): sessionInfo() R version 2.9.1 (2009-06-26) i386-pc-mingw32 That is definitely the place to start. Your version of R is 2.5 years old, and circular is up to version 0.4-3 on CRAN. I just checked, and circular 0.3-8 doesn't *have* arrows.circular. So you must be using some documentation from the internet for a newer version, rather than using the docs that go with what you have installed on your computer. You need to update R and your packages. Sarah locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] grid tcltk stats graphics grDevices utils datasets [8] methods base other attached packages: [1] relimp_1.0-1 Rcmdr_1.4-10 car_1.2-14 circular_0.3-8 [5] boot_1.2-37 lattice_0.17-25 RODBC_1.3-0 -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Friday, February 10, 2012 12:02 PM To: Chosid, David (FWE) Cc: r-help@r-project.org Subject: Re: [R] function arrows.circular not working Your sessionInfo() would be helpful. Also, just to check: you did do library(circular) right? Sarah On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) david.cho...@state.ma.us wrote: I have started using the circular package but it is not recognizing the function arrows.circular. I attempted to use the example provided in the circular manual. Here is the example code using the circular package: plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) My error is: Error: could not find function arrows.circular Any help would be greatly appreciated. Thanks. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] debug in a loop
Hi, I'd like to debug in a loop (using debug() and browser() etc but not print() ). I'am looking for the first occurence of NA. For instance: tab = c(1:300) tab[250] = NA len = length(tab) for (i in 1:len){ if(i != len){ tab[i] = tab[i]+tab[i+1] } } I do not want to do Browse[2] n for each step ... I'd like to declare a browser() in the loop with a condition. But how to write stop running when you encounter NA ? Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Formatting Y axis.
I've looked around and I just can't find anything that will work for my needs. This is a bit of a 2 part question but pertaining to the same topic so bare with me. The first is with my qq plot. On the Y axis of my qq plot it'll have my sample quantities but because my data is log-normal it'll show numbers between 0 - 5 (depending on the data). I'd like to know how to get it, instead of displaying 0-5 to display actual values so that way when reading it I know where my potential problems are. Would just save a lot of time and be an easier more understandable read. Here's sample code to plug in and see. x - rlnorm(400, 2.3, .8) y - rlnorm(400, 3, 1.6) plot.new() q1 - qqnorm(log1p(y)) q2 - qqnorm(log1p(x)) points( q1, col = red) points( q2, col = blue) abline( qqline(log1p(y), col = red) ) abline( qqline(log1p(x), col = blue) ) legend( bottomright, inset = 0.02, title = Loc_ID, c(Sayincom, Sayout), fill = c( red, blue), horiz = TRUE) So what I want is for the y axis. or even secondary y axis if it's lined up properly to read the actual values rather than the logged values. The second part of this is in regards to my boxplots. In order to fit all of the data in a readable manner I've been making log = y. Which is fine, but when trying to estimate the difference via notches it's difficult to estimate how far it really is based on the axis being logged. You can use the same x as above, or y, whichever floats your boat. boxplot ( x, notch = TRUE, log = y, boxwex = .50) Versus boxplot ( x, notch = TRUE, boxwex = .50) What I need from this is for the ability to zoom in on a particular location. Rather than be forced to look at things in a log format on my y axis to just be able to say, ok I have my mean at 25 and my standard deviation is 10 so I want to concentrate this boxplot between 5 and 45 to view 2 standard deviations. (That whole mean thing and standard deviations are arbitrary numbers but the concept remains) Any help would be much appreciated. If there are particular packages required for any suggestions please include their name in your response. If you need actual imaging of what the actual graphs I'm working with look like as opposed to the randomly generated values that I'm supplying then just say so and I'll post the actual graphs I'm trying to work with. Sincerely, Sock -- View this message in context: http://r.789695.n4.nabble.com/Formatting-Y-axis-tp4376843p4376843.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Find exit status inside on.exit
Hello, Does anyone know if it is possible to find out whether a calling function exited with an error or not inside of an on.exit() call? I'd like to use the same error cleanup function inside of several functions, and for aesthetic reasons would prefer not to surround the body of all of these function with a tryCatch. Something like this: isError - function() { parentError - ??? if (!parentError) cat(No errors here!\n) } X - function() { on.exit(isError()) 1+1 } X() [1] 2 No errors here! Thanks, Robert Robert McGehee, CFA Geode Capital Management, LLC One Post Office Square, 28th Floor | Boston, MA | 02109 Direct: (617)392-8396 This e-mail, and any attachments hereto, are intended fo...{{dropped:10}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Split matrix into square submatices
Thanks to Peter and Gabriel for their comments, The first way which Peter commented was my first approach, however as I use it for a 'large' matrix (in my case nrow = ncol = 512) and I want to split it into very small matrices (e.g. nrow = ncol = {4, 8, 16}) both nested loops I think are a quite expensive and that's why I was looking for a better way. The Gabor's approach works perfect for my needs, thanks :) -- View this message in context: http://r.789695.n4.nabble.com/Split-matrix-into-square-submatices-tp4375563p4376952.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] debug in a loop
You can add if(is.na(tab[i])) browser() or if(is.na(tab[i])) break see inline On Fri, Feb 10, 2012 at 7:22 AM, ikuzar raz...@hotmail.fr wrote: Hi, I'd like to debug in a loop (using debug() and browser() etc but not print() ). I'am looking for the first occurence of NA. For instance: tab = c(1:300) tab[250] = NA len = length(tab) for (i in 1:len){ if(i != len){ if(is.na(tab[i])) browser() tab[i] = tab[i]+tab[i+1] } } I do not want to do Browse[2] n for each step ... I'd like to declare a browser() in the loop with a condition. But how to write stop running when you encounter NA ? Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function arrows.circular not working
It's a good idea to acknowledge that you've found a solution on the R-help list, rather than just to me. That way the answer appears in the list archives, and other people will know you no longer need help with this problem. Sarah On Fri, Feb 10, 2012 at 12:57 PM, Chosid, David (MISC) david.cho...@state.ma.us wrote: Ah. Thanks. I didn't realize that I needed an updated R... Just the library. Works now. -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Friday, February 10, 2012 12:32 PM To: Chosid, David (FWE); r-help Subject: Re: [R] function arrows.circular not working Hi, On Fri, Feb 10, 2012 at 12:15 PM, Chosid, David (MISC) david.cho...@state.ma.us wrote: Yes, sorry for not being more clear. Here is the sessionInfo(): sessionInfo() R version 2.9.1 (2009-06-26) i386-pc-mingw32 That is definitely the place to start. Your version of R is 2.5 years old, and circular is up to version 0.4-3 on CRAN. I just checked, and circular 0.3-8 doesn't *have* arrows.circular. So you must be using some documentation from the internet for a newer version, rather than using the docs that go with what you have installed on your computer. You need to update R and your packages. Sarah locale: LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 attached base packages: [1] grid tcltk stats graphics grDevices utils datasets [8] methods base other attached packages: [1] relimp_1.0-1 Rcmdr_1.4-10 car_1.2-14 circular_0.3-8 [5] boot_1.2-37 lattice_0.17-25 RODBC_1.3-0 -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Friday, February 10, 2012 12:02 PM To: Chosid, David (FWE) Cc: r-help@r-project.org Subject: Re: [R] function arrows.circular not working Your sessionInfo() would be helpful. Also, just to check: you did do library(circular) right? Sarah On Fri, Feb 10, 2012 at 11:55 AM, Chosid, David (MISC) david.cho...@state.ma.us wrote: I have started using the circular package but it is not recognizing the function arrows.circular. I attempted to use the example provided in the circular manual. Here is the example code using the circular package: plot(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1)) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), col=2) arrows.circular(rvonmises(10, circular(0), kappa=1), y=runif(10), x0=runif(10, -1, 1), y0=runif(10, -1, 1), col=3) My error is: Error: could not find function arrows.circular Any help would be greatly appreciated. Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] a) t-tests on loess splines; b) linear models, type II SS for unbalanced ANOVA
Dear all, I have some questions regarding the validity an implementation of statistical tests based on linear models and loess. I've searched the R-help arhives and found several informative threads that related to my questions, but there are still a few issues I'm not clear about. I'd be grateful for guidance. Background and data set: I wish to compare the growth and metabolism of 2 strains of bacteria. I have grown the 2 strains in 3 replicates on 2 different occasions and on each occasion taken hourly measurements of the population density of bacteria (represented by od (optical density)) and the concentrations of several chemicals, such as glycerol, in their growth medium (environment). The data looks like this: head(df.yx.t) uid cid date g tod glycerolalanine 66 Ma,1 25 Ma 1 0.1428571 3025750049 7843841013 77 Ma,1 25 Ma 2 0.4542857 3026668036 7902016686 88 Ma,1 25 Ma 3 1.4542857 3086406597 8017589237 99 Ma,1 25 Ma 4 2.587 2821935918 6595302338 10 10 Ma,1 25 Ma 5 3.093 2674142017 6144141491 11 11 Ma,1 25 Ma 6 3.600 3026824144 7009788499 summary(df.yx.t) uid cid dategt odglycerolalanine 6 : 1 Ma,1 :10 25:60 Ma :60 1 :12 Min. :0.1171 Min. :7.305e+08 Min. :9.858e+07 7 : 1 Ma,2 :10 26:60 RILI2:60 2 :12 1st Qu.:1.3921 1st Qu.:1.757e+09 1st Qu.:3.199e+09 8 : 1 Ma,3 :10 3 :12 Median :3.8042 Median :2.541e+09 Median :5.664e+09 9 : 1 Ma,4 :10 4 :12 Mean :3.7822 Mean :2.354e+09 Mean :5.264e+09 10 : 1 Ma,5 :10 5 :12 3rd Qu.:5.9495 3rd Qu.:3.026e+09 3rd Qu.:7.699e+09 11 : 1 Ma,6 :10 6 :12 Max. :8.7375 Max. :3.296e+09 Max. :8.501e+09 (Other):114 (Other):60 (Other):48 uid = unique id for an observation cid = culture id (a within subjects factor identifying each bacterial culture) date = date of culture; data was collected in 2 batches, with a balanced split of observations (3 replicates of each genotype in each batch) g = genotype (the name of the bactrial strain) t = time (1-10 hours) od = optical density (a proxy for poulation density of the bacterial) glycerol = glycerol concentration alanine = alanine concentration I have tested whether Genotype affects the growth (od) of the bacteria using lm() and anova(). #define the model m1-lm(log10(od)~date*g*t,data=df.yx.t) #perform anova anova(m1) Analysis of Variance Table Response: log10(od) Df Sum Sq Mean Sq F valuePr(F) date 1 0.006 0.00565.3763 0.02297 * g 1 0.107 0.1066 103.2951 4.646e-16 *** t 9 33.646 3.7385 3620.9069 2.2e-16 *** date:g 1 0.003 0.00302.8731 0.09396 . date:t 9 0.015 0.00171.6542 0.11417 g:t9 0.021 0.00232.2329 0.02796 * date:g:t 9 0.005 0.00050.4950 0.87383 Residuals 80 0.083 0.0010 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 So far so good. (Although I do wonder whether I should be using a within subjects predictor variable (e.g. via car::Anova).) However, I also want to test whether genotype affects the concentrations of metabolites, e.g. glycerol, in the growth medium at a given population density. I envisage testing this by one of 2 methods, neither of which I am confident is correct, despite digging aound in various fora and statistics texts. Method 1. Loess splines I could fit a loess curve to glycerol~od, (the date effect disappears when plotting against od rather than t). m1-loess(glycerol~od, data=df.yx.t[df.yx.t$g==Ma,]) m2-loess(glycerol~od, data=df.yx.t[df.yx.t$g==RILI2,]) However, I'm not confident that I can perform valid statistic tests based on this. I could try to test the whether there is a significant difference in glycerol concentration at any given od as follows: #test for a difference at od=7 v.pred-7 pred1-predict(m1,newdata=data.frame(od=v.pred),se=T) pred2-predict(m2,newdata=data.frame(od=v.pred),se=T) dfit-data.frame(g=c(Ma,RILI2),fit=c(pred1$fit,pred2$fit),se=c(pred1$fit,pred2$fit)) alpha-0.05 dfit$lo-with(dfit,fit-se*abs(qnorm(alpha/2))) dfit$hi-with(dfit,fit+se*abs(qnorm(alpha/2))) Is this legitimate? I might also try to test whether the curves are significantly different using anova(m1,m2), though again I'm not sure of the validity of this approach since the 2 models are based on different (and different numbers of) observations. Method 2. Linear models
[R] Schwefel Function Optimization
All, I am looking for an optimization library that does well on something as chaotic as the Schwefel function: schwefel - function(x) sum(-x * sin(sqrt(abs(x With these guys, not much luck: optim(c(1,1), schwefel)$value [1] -7.890603 optim(c(1,1), schwefel, method=SANN, control=list(maxit=1))$value [1] -28.02825 optim(c(1,1), schwefel, lower=c(-500,-500), upper=c(500,500), method=L-BFGS-B)$value [1] -7.890603 optim(c(1,1), schwefel, method=BFGS)$value [1] -7.890603 optim(c(1,1), schwefel, method=CG)$value [1] -7.890603 All trapped in local minima. I get the right answer when I pick a starting point that's close: optim(c(400,400), schwefel, lower=c(-500,-500), upper=c(500,500), method=L-BFGS-B)$value [1] -837.9658 Of course I can always roll my own: r - vector() for(i in 1:1000) { x - runif(2, -500,500) m - optim(x, schwefel, lower=c(-500,-500), upper=c(500,500), method=L-BFGS-B) r - rbind(r, c(m$par, m$value)) } And this does fine. I'm just wondering if this is the right approach, or if there is some other package that wraps this kind of multi-start up so that the user doesn't have to think about it. Best, Ara __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] range and anisotropy with RandomFields
Hello, I am presently trying to get a feel for the various packages out there that allow me to both analyze and simulate random fields. The package RandomFields is nice, but there are still a few aspects of its implementation that are confusing to me and I was hoping someone could help clarify things for me. It could also be that my questions reflect a lack of knowledge pertaining to random fields in general and not the RandomFields package specifically, in which case someone can set me straight. I like the versatility of the extended definition form that can be used in RandomFields. For example, specifying an isotropic exponential model with a nugget of 1 as: modelA=list(+, list($, var=2, scale=2, list(exp)), list($, var=1, list(nugget)) ) Ultimately I would like to work with anisotropic models (and data), and this is where I get confused. I can specify a model with (geographic) anisotropy by defining an anisotropy matrix, m and incorporating that into the definition: m = matrix( c(1,0,0,0.1), nrow=2) modelA=list(+, list($, var=2, anis=m, list(exp)), list($, var=1, list(nugget)) ) First, nowhere in the RandomFields documentation does it say how the anisotropy matrix is actually defined, so I am assuming it is the usual A = [ 1 0 ] [ cos(phi) -sin(phi)] 0 R sin(phi)cos(phi) Where R is the ratio of the major range/minor range, and phi is the angle of rotation. Can anyone confirm this? Also, there now seems to be no way to explicitly specify the range of the major axis (it states this in the documentation and an error will be returned if I do try to set e.g. scale=2 in the definition) . Thus when I use e.g. GaussRF() to simulate a field using the above anisotropic model definition I have no idea how the range (scale) is being determined. Does anyone now how RandomFields works in this context? Thanks. -- View this message in context: http://r.789695.n4.nabble.com/range-and-anisotropy-with-RandomFields-tp4377104p4377104.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Discrete Event Simulation problem
Hi All, I am attempting to simulation an inventory model on R and I am having some problems. I believe the problem is when I define my demand rate is stays constant throughout so when I do need to reorder the model does not recognise it as I have the initial supply arrival time set to infinity at the start as no order has been played but I have an if statement saying, if the level falls below a certain point then an order should be placed, I have the expected time to order arrival set to be before the the sale but this is not working. Any Help would be appreciated. Thanks maxStock = 100 minStock = 20 t.max=1100.0 LAST = t.max START = 0 GetDemand-function() START + runif(1,min=0,max=2) main - function(t.max,maxStock,minStock) { index = 0 t.current = START Starting Conditions t.demand = START t.supply = START inventory = 50 order_costs = 0 storage_costs = 0 sales = 0 sum = list(inventory = 50,order_costs = 0,storage_costs = 0,sales =0) while(index LAST){ index = index + 1 t.demand = GetDemand() ### expected time to next sale t.supply = Inf ### expected time to arrival of order, Infinity as order has not been placed t.next =min(t.demand,t.supply)###next event either sale or supply is the one with imminent arrival k = maxStock - inventory if(inventory minStock) { t.supply = Inf ### expected time to arrival of order, Infinity as order has not been placed } if(inventory 0) storage_costs = (t.next-t.current)*0.10*inventory t.current = t.next if (inventory minStock) {###Need to Order k = maxStock - inventory order_costs = 50 + 0.02*k sum$order_costs = sum$order_costs + order_costs t.supply = t.current + 1.0 } if(t.current ==t.demand) { sum$inventory = sum$inventory - 1.0 Sale made sum$sales = sum$sales + 1.0 t.demand = runif(1,min=0,max=2) } if(t.current == t.supply) { Order Arrives sum$inventory = sum$inventory + k k = 0 t.supply = Inf } if(inventory maxStock) { k = maxStock - inventory sum$storage_costs = sum$storage_costs + storage_costs sum$order_costs = sum$order_costs + order_costs sum$inventory = sum$invneotry + k - sum$sales sum$sales = sum$sales + sales sum$sales = sum$sales + sales } } sis = list(Time = index,StorageCosts =sum$storage_costs,OrderCosts= sum$order_costs, FinalInventoryLevel=sum$inventory,sales=sum$sales , AverageCosts =((sum$order_costs + sum$storage_costs)/index)) return(sis) } main() -- View this message in context: http://r.789695.n4.nabble.com/Discrete-Event-Simulation-problem-tp4377276p4377276.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Importing a CSV file
I think you must have given the path to the file wrong. Assuming you are using R.app, try infile - file.choose() And then give infile to the read.csv() function. Then print the value of infile to find out what the path really is. Probably, you wanted '/Users/sezginozcan/Downloads/beer.data.csv' -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 2/9/12 7:20 PM, sezgin ozcan szgoz...@gmail.com wrote: I have been trying to import a csv file to r. but I get the same message everytime. the message is Error in file(file, rt) : cannot open the connection In addition: Warning message: In file(file, rt) : cannot open file 'Users:/sezginozcan/Downloads/beer.data.csv': No such file or directory I use mac. I tried this command also a-read.table(clipboard,sep=”\t”,row.names=1,header=T) Error: unexpected input in a-read.table(clipboard,sep=‚ I will appreciate if you help me before I get crazy. thank you __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Custom caret metric based on prob-predictions/rankings
Sorry for not being more clear - I'm interested in accessing these indices from within the trainControl summaryFunction, not afterward (from the train object). As for the weights, I'm referring to the weights argument passed into train. On Fri, Feb 10, 2012 at 5:50 AM, Max Kuhn mxk...@gmail.com wrote: I think you need to read the man pages and the four vignettes. A lot of your questions have answers there. If you don't specify the resampling indices, they ones generated for you are saved in the train object: data(iris) TrainData - iris[,1:4] TrainClasses - iris[,5] knnFit1 - train(TrainData, TrainClasses, + method = knn, + preProcess = c(center, scale), + tuneLength = 10, + trControl = trainControl(method = cv)) Loading required package: class Attaching package: ‘class’ The following object(s) are masked from ‘package:reshape’: condense Warning message: executing %dopar% sequentially: no parallel backend registered str(knnFit1$control$index) List of 10 $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ... $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ... $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ... $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ... $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ... $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ... $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ... $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ... $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ... $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ... There is also a savePredictions argument that gives you the hold-out results. I'm not sure which weights you are referring to. On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang yanghates...@gmail.com wrote: Actually, is there any way to get at additional information beyond the classProbs? In particular, is there any way to find out the associated weights, or otherwise the row indices into the original model matrix corresponding to the tested instances? On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote: Oops, found trainControl's classProbs right after I sent! On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote: I'm dealing with classification problems, and I'm trying to specify a custom scoring metric (recall@p, ROC, etc.) that depends on not just the class output but the probability estimates, so that caret::train can choose the optimal tuning parameters based on this metric. However, when I supply a trainControl summaryFunction, the data given to it contains only class predictions, so the only metrics possible are things like accuracy, kappa, etc. Is there any way to do this that I'm looking? If not, could I put this in as a feature request? Thanks! -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Max -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Custom caret metric based on prob-predictions/rankings
(I couldn't find answers to this question in the documentation) On Fri, Feb 10, 2012 at 11:59 AM, Yang Zhang yanghates...@gmail.com wrote: Sorry for not being more clear - I'm interested in accessing these indices from within the trainControl summaryFunction, not afterward (from the train object). As for the weights, I'm referring to the weights argument passed into train. On Fri, Feb 10, 2012 at 5:50 AM, Max Kuhn mxk...@gmail.com wrote: I think you need to read the man pages and the four vignettes. A lot of your questions have answers there. If you don't specify the resampling indices, they ones generated for you are saved in the train object: data(iris) TrainData - iris[,1:4] TrainClasses - iris[,5] knnFit1 - train(TrainData, TrainClasses, + method = knn, + preProcess = c(center, scale), + tuneLength = 10, + trControl = trainControl(method = cv)) Loading required package: class Attaching package: ‘class’ The following object(s) are masked from ‘package:reshape’: condense Warning message: executing %dopar% sequentially: no parallel backend registered str(knnFit1$control$index) List of 10 $ Fold01: int [1:135] 1 2 3 4 5 6 7 9 10 11 ... $ Fold02: int [1:135] 1 2 3 4 5 6 8 9 10 12 ... $ Fold03: int [1:135] 1 3 4 5 6 7 8 9 10 11 ... $ Fold04: int [1:135] 1 2 3 5 6 7 8 9 10 11 ... $ Fold05: int [1:135] 1 2 3 4 6 7 8 9 11 12 ... $ Fold06: int [1:135] 1 2 3 4 5 6 7 8 9 10 ... $ Fold07: int [1:135] 1 2 3 4 5 7 8 9 10 11 ... $ Fold08: int [1:135] 2 3 4 5 6 7 8 9 10 11 ... $ Fold09: int [1:135] 1 2 3 4 5 6 7 8 9 10 ... $ Fold10: int [1:135] 1 2 4 5 6 7 8 10 11 12 ... There is also a savePredictions argument that gives you the hold-out results. I'm not sure which weights you are referring to. On Fri, Feb 10, 2012 at 4:38 AM, Yang Zhang yanghates...@gmail.com wrote: Actually, is there any way to get at additional information beyond the classProbs? In particular, is there any way to find out the associated weights, or otherwise the row indices into the original model matrix corresponding to the tested instances? On Thu, Feb 9, 2012 at 4:37 PM, Yang Zhang yanghates...@gmail.com wrote: Oops, found trainControl's classProbs right after I sent! On Thu, Feb 9, 2012 at 4:30 PM, Yang Zhang yanghates...@gmail.com wrote: I'm dealing with classification problems, and I'm trying to specify a custom scoring metric (recall@p, ROC, etc.) that depends on not just the class output but the probability estimates, so that caret::train can choose the optimal tuning parameters based on this metric. However, when I supply a trainControl summaryFunction, the data given to it contains only class predictions, so the only metrics possible are things like accuracy, kappa, etc. Is there any way to do this that I'm looking? If not, could I put this in as a feature request? Thanks! -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Max -- Yang Zhang http://yz.mit.edu/ -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] passing an extra argument to an S3 generic
On 2/9/2012 6:24 PM, ilai wrote: You do not provide mlm.influence() so your code can't be reproduced. Or did you mean to put lm.influence() in the formals to your hatvalues.mlm ? If yes, then 1) you have a typo 2) lm.influence doesn't allow you to pass on arguments, maybe try influence.lm instead. No, I've written an S3 method, influence.mlm and a computational method, mlm.influence, both of which take an m= argument. I didn't post all the code because I thought the question might have an easy answer based on what I provided. I include all the code and a test case in the attached .txt file that can be sourced. -Michael Elai On Thu, Feb 9, 2012 at 1:42 PM, Michael Friendlyfrien...@yorku.ca wrote: I'm trying to write some functions extending influence measures to multivariate linear models and also allow subsets of size m=1 to be considered for deletion diagnostics. I'd like these to work roughly parallel to those functions for the univariate lm where only single case deletion (m=1) diagnostics are considered. Corresponding to stats::hatvalues.lm, the S3 method for class lm objects, hatvalues-function (model, ...) UseMethod(hatvalues) hatvalues.lm- function (model, infl = lm.influence(model, do.coef = FALSE),...) { hat- infl$hat names(hat)- names(infl$wt.res) hat } I have, for class mlm objects hatvalues.mlm- function(model, m=1, infl=mlm.influence(model, m=m, do.coef = FALSE), ...) { hat- infl$H m- infl$m names(hat)- if(m==1) infl$subsets else apply(infl$subsets,1, paste, collapse=',') hat } where mlm.influence() does the calculations, but also allows the m= argument to specify subset size. Yet when I test this I can't seem to pass the m= argument directly, so that it gets stuffed in to the infl= call to mlm.influence. # fit an mlm library(heplots) Rohwer2- subset(Rohwer, subset=group==2) rownames(Rohwer2)- 1:nrow(Rohwer2) Rohwer.mod- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2) class(Rohwer.mod) [1] mlm lm ## this doesn't work, as I would like it to, calling the hatvalues.mlm method, but passing m=2: hatvalues(Rohwer.mod, m=2) Error in UseMethod(hatvalues) : no applicable method for 'hatvalues' applied to an object of class c('double', 'numeric') I don't understand why this doesn't just call hatvalues.mlm, since Rohwer.mod is of class mlm. # These work -- calling hatvalues.mlm explicitly, or passing the infl= argument with the call to # mlm.influence hatvalues.mlm(Rohwer.mod, m=2) hatvalues(Rohwer.mod, infl=mlm.influence(Rohwer.mod,m=2)) Can someone help me understand what is wrong and how to make the .mlm method allow m= to be passed directly to the infl= computation? thx, -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele StreetWeb: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele StreetWeb: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA # S3 method for class mlm influence.mlm - function(model, m=1, do.coef=TRUE, ...) mlm.influence(model, m=m, do.coef = do.coef, ...) # main computation a la lm.influence mlm.influence - function (model, m=1, do.coef = TRUE, ...) { # helper functions vec - function(M) { R - matrix(M, ncol=1) if (is.vector(M)) return(R) nn-expand.grid(dimnames(M))[,2:1] rownames(R) - apply(as.matrix(nn), 1, paste, collapse=:) R } X - model.matrix(model) data - model.frame(model) Y - as.matrix(model.response(data)) r - ncol(Y) n - nrow(X) p - ncol(X) labels - rownames(X) call - model$call B - coef(model) E - residuals(model) XPXI - solve(crossprod(X)) EPEI - solve(crossprod(E)) vB - vec(t(B)) S - crossprod(E)/(n-p); V - solve(p*S) %x% crossprod(X) subsets - t(combn(n, m)) nsub - nrow(subsets) Beta - as.list(rep(0, nsub)) R - L - H - Q - as.list(rep(0, nsub)) CookD - as.vector(rep(0, nsub)) # FIXME: naive, direct computations can use qr() or update() for (i in seq(nsub)) { I - c(subsets[i,]) rows - which(!(1:n) %in% I) XI - X[rows,] YI - Y[rows,] BI - solve(crossprod(XI)) %*% t(XI) %*% YI EI - (Y - X %*% BI)[I, , drop=FALSE] CookD[i] - t(vec(B - BI)) %*% V
Re: [R] Formatting Y axis.
yaxt='n' in ?par and ?axis are your friends. # A plot on log scale labeled with original: plot(x,log(y),yaxt='n') axis(2,at=pretty(log(y)),labels=round(exp(pretty(log(y) Works for qqnorm and boxplots, as well as other top level fun. By the way this is a FAQ. On Fri, Feb 10, 2012 at 9:43 AM, sock.o mrsock...@gmail.com wrote: I've looked around and I just can't find anything that will work for my needs. This is a bit of a 2 part question but pertaining to the same topic so bare with me. The first is with my qq plot. On the Y axis of my qq plot it'll have my sample quantities but because my data is log-normal it'll show numbers between 0 - 5 (depending on the data). I'd like to know how to get it, instead of displaying 0-5 to display actual values so that way when reading it I know where my potential problems are. Would just save a lot of time and be an easier more understandable read. Here's sample code to plug in and see. x - rlnorm(400, 2.3, .8) y - rlnorm(400, 3, 1.6) plot.new() q1 - qqnorm(log1p(y)) q2 - qqnorm(log1p(x)) points( q1, col = red) points( q2, col = blue) abline( qqline(log1p(y), col = red) ) abline( qqline(log1p(x), col = blue) ) legend( bottomright, inset = 0.02, title = Loc_ID, c(Sayincom, Sayout), fill = c( red, blue), horiz = TRUE) So what I want is for the y axis. or even secondary y axis if it's lined up properly to read the actual values rather than the logged values. The second part of this is in regards to my boxplots. In order to fit all of the data in a readable manner I've been making log = y. Which is fine, but when trying to estimate the difference via notches it's difficult to estimate how far it really is based on the axis being logged. You can use the same x as above, or y, whichever floats your boat. boxplot ( x, notch = TRUE, log = y, boxwex = .50) Versus boxplot ( x, notch = TRUE, boxwex = .50) What I need from this is for the ability to zoom in on a particular location. Rather than be forced to look at things in a log format on my y axis to just be able to say, ok I have my mean at 25 and my standard deviation is 10 so I want to concentrate this boxplot between 5 and 45 to view 2 standard deviations. (That whole mean thing and standard deviations are arbitrary numbers but the concept remains) Any help would be much appreciated. If there are particular packages required for any suggestions please include their name in your response. If you need actual imaging of what the actual graphs I'm working with look like as opposed to the randomly generated values that I'm supplying then just say so and I'll post the actual graphs I'm trying to work with. Sincerely, Sock -- View this message in context: http://r.789695.n4.nabble.com/Formatting-Y-axis-tp4376843p4376843.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] passing an extra argument to an S3 generic
For these type of setups, I typically turn to default values, e.g. hatvalues.mlm - function(model, m=1, infl=NULL, ...) { if (is.null(infl)) { infl - mlm.influence(model, m=m, do.coef=FALSE); } hat - infl$H m - infl$m names(hat) - if(m==1) infl$subsets else apply(infl$subsets,1, paste, collapse=',') hat } So people may prefer to do the following: hatvalues.mlm - function(model, m=1, infl, ...) { if (missing(infl)) { infl - mlm.influence(model, m=m, do.coef=FALSE); } hat - infl$H m - infl$m names(hat) - if(m==1) infl$subsets else apply(infl$subsets,1, paste, collapse=',') hat } The downside with this approach is that args(fcn) doesn't reveal the default behavior; instead you need to document it clearly in the help(fcn). My $.02 /H On Fri, Feb 10, 2012 at 12:13 PM, Michael Friendly frien...@yorku.ca wrote: On 2/9/2012 6:24 PM, ilai wrote: You do not provide mlm.influence() so your code can't be reproduced. Or did you mean to put lm.influence() in the formals to your hatvalues.mlm ? If yes, then 1) you have a typo 2) lm.influence doesn't allow you to pass on arguments, maybe try influence.lm instead. No, I've written an S3 method, influence.mlm and a computational method, mlm.influence, both of which take an m= argument. I didn't post all the code because I thought the question might have an easy answer based on what I provided. I include all the code and a test case in the attached .txt file that can be sourced. -Michael Elai On Thu, Feb 9, 2012 at 1:42 PM, Michael Friendlyfrien...@yorku.ca wrote: I'm trying to write some functions extending influence measures to multivariate linear models and also allow subsets of size m=1 to be considered for deletion diagnostics. I'd like these to work roughly parallel to those functions for the univariate lm where only single case deletion (m=1) diagnostics are considered. Corresponding to stats::hatvalues.lm, the S3 method for class lm objects, hatvalues-function (model, ...) UseMethod(hatvalues) hatvalues.lm- function (model, infl = lm.influence(model, do.coef = FALSE), ...) { hat- infl$hat names(hat)- names(infl$wt.res) hat } I have, for class mlm objects hatvalues.mlm- function(model, m=1, infl=mlm.influence(model, m=m, do.coef = FALSE), ...) { hat- infl$H m- infl$m names(hat)- if(m==1) infl$subsets else apply(infl$subsets,1, paste, collapse=',') hat } where mlm.influence() does the calculations, but also allows the m= argument to specify subset size. Yet when I test this I can't seem to pass the m= argument directly, so that it gets stuffed in to the infl= call to mlm.influence. # fit an mlm library(heplots) Rohwer2- subset(Rohwer, subset=group==2) rownames(Rohwer2)- 1:nrow(Rohwer2) Rohwer.mod- lm(cbind(SAT, PPVT, Raven) ~ n+s+ns+na+ss, data=Rohwer2) class(Rohwer.mod) [1] mlm lm ## this doesn't work, as I would like it to, calling the hatvalues.mlm method, but passing m=2: hatvalues(Rohwer.mod, m=2) Error in UseMethod(hatvalues) : no applicable method for 'hatvalues' applied to an object of class c('double', 'numeric') I don't understand why this doesn't just call hatvalues.mlm, since Rohwer.mod is of class mlm. # These work -- calling hatvalues.mlm explicitly, or passing the infl= argument with the call to # mlm.influence hatvalues.mlm(Rohwer.mod, m=2) hatvalues(Rohwer.mod, infl=mlm.influence(Rohwer.mod,m=2)) Can someone help me understand what is wrong and how to make the .mlm method allow m= to be passed directly to the infl= computation? thx, -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] apply pairs function to multiple columns in a data frame
I am very new to R and programming and thank you in advance for your patience and help with a complete novice! I am working with a large multivariate data set that has 10 explanatory environmental variables (e.g. temp, depth) and over 60 response variables (each is a separate species). My data frame is set up like the simplified version below: JulianDay Temperature Salinity Depth Copepod Barnacle Gastropod Bivalve 222 12.1330.3 500 756 0 178 222 12.333.2 1.1145 111 0 0 223 11.133.1 7752 234 12 0 Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod, and Bivalve are the column headers. I am using the pairs function in R to explore my data. My data frame is named Sunset. Using this code: Z=cbind(Sunset$Copepod,Sunset[,c(1:10)]) #the first 10 columns of my data frame are explanatory variables such as temp Pairs.Copepod=pairs(Z,main=Copepods vs. explanatory variables, panel=panel.smooth) I get a great pair plot of Copepods vs. all of the 10 explanatory environmental variables. I would like to do this for each of my 60+ species. I can't just make one big pair plot of all of my explanatory variables vs. all of my species because I have too many. So instead I would like a separate pair plot for each species (each column of data after the first 10 columns of explanatory variables) I would like to be able to write a loop that creates all of these plots at once, but haven't been able to do so. Ideally, each pair plot would have the main title be the column header (e.g. Copepod) for that plot. I would love to include a way to save each pair plot as separate jpeg to a folder on my desktop with a file name that includes the species name (e.g. Copepod). Again, I am very new to R and to programing so I would GREATLY appreciate anyone patient and kind enough to respond with lots of detail so I can hopefully follow your answer and suggestions. I have searched but haven't been able to find a way to write a loop that uses each column of data, so I would love some help! Thank you! Marley -- View this message in context: http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] passing an extra argument to an S3 generic
On 2/10/2012 4:09 PM, Henrik Bengtsson wrote: So people may prefer to do the following: hatvalues.mlm- function(model, m=1, infl, ...) { if (missing(infl)) { infl- mlm.influence(model, m=m, do.coef=FALSE); } hat- infl$H m- infl$m names(hat)- if(m==1) infl$subsets else apply(infl$subsets,1, paste, collapse=',') hat } Thanks; I tried exactly that, but I still can't pass m=2 to the mlm method through the generic hatvalues(Rohwer.mod) 1 2 3 4 5 6 7 8 0.16700926 0.21845327 0.14173469 0.07314341 0.56821462 0.15432157 0.04530969 0.17661104 9 10 11 12 13 14 15 16 0.05131298 0.45161152 0.14542776 0.17050399 0.10374592 0.12649927 0.33246744 0.33183461 17 18 19 20 21 22 23 24 0.17320579 0.26353864 0.29835817 0.07880597 0.14023750 0.19380286 0.04455330 0.20641708 25 26 27 28 29 30 31 32 0.15712604 0.15333879 0.36726467 0.11189754 0.30426999 0.08655434 0.08921878 0.07320950 hatvalues(Rohwer.mod, m=2) Error in UseMethod(hatvalues) : no applicable method for 'hatvalues' applied to an object of class c('double', 'numeric') ## This works: hatvalues.mlm(Rohwer.mod, m=2) ... output snipped hatvalues function (model, ...) UseMethod(hatvalues) bytecode: 0x021339e4 environment: namespace:stats -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele StreetWeb: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply pairs function to multiple columns in a data frame
A good way to solve problems like this is to write a function that will work with one variable, and then use one of the apply family of functions (in this case sapply) to do it for each of your variables. In this case, such a function would be something like this: make1plot = function(var){ jpeg(paste(var,'jpg',sep='.')) pairs(Sunset[,c(var,names(Sunset)[1:10])],main=paste(var,'vs. explanatory variables'), panel=panel.smooth) dev.off() } (Notice that indexing precludes the need for cbind when you're creating a sub-matrix.) Now suppose the names of your dependent variables are stored in a vector called deps. Then sapply(deps,make1plot) should do what you want. I couldn't test this, since you didn't provide a reproducible example. If you use this list often, you'll find that providing a reproducible example goes a long way towards getting a good answer. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Fri, 10 Feb 2012, jarvisma wrote: I am very new to R and programming and thank you in advance for your patience and help with a complete novice! I am working with a large multivariate data set that has 10 explanatory environmental variables (e.g. temp, depth) and over 60 response variables (each is a separate species). My data frame is set up like the simplified version below: JulianDay Temperature Salinity Depth Copepod Barnacle Gastropod Bivalve 222 12.1330.3 500 756 0 178 222 12.333.2 1.1145 111 0 0 223 11.133.1 7752 234 12 0 Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod, and Bivalve are the column headers. I am using the pairs function in R to explore my data. My data frame is named Sunset. Using this code: Z=cbind(Sunset$Copepod,Sunset[,c(1:10)]) #the first 10 columns of my data frame are explanatory variables such as temp Pairs.Copepod=pairs(Z,main=Copepods vs. explanatory variables, panel=panel.smooth) I get a great pair plot of Copepods vs. all of the 10 explanatory environmental variables. I would like to do this for each of my 60+ species. I can't just make one big pair plot of all of my explanatory variables vs. all of my species because I have too many. So instead I would like a separate pair plot for each species (each column of data after the first 10 columns of explanatory variables) I would like to be able to write a loop that creates all of these plots at once, but haven't been able to do so. Ideally, each pair plot would have the main title be the column header (e.g. Copepod) for that plot. I would love to include a way to save each pair plot as separate jpeg to a folder on my desktop with a file name that includes the species name (e.g. Copepod). Again, I am very new to R and to programing so I would GREATLY appreciate anyone patient and kind enough to respond with lots of detail so I can hopefully follow your answer and suggestions. I have searched but haven't been able to find a way to write a loop that uses each column of data, so I would love some help! Thank you! Marley -- View this message in context: http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Table rearranging
Hi David, Totally forgot line 3! So 513 red should be ok but 420 red and 917 yellow aren't. Jeffrey CC: r-help@r-project.org From: dwinsem...@comcast.net To: johjeff...@hotmail.com Subject: Re: [R] Table rearranging Date: Tue, 7 Feb 2012 21:51:55 -0500 On Feb 7, 2012, at 8:43 PM, Jeffrey Joh wrote: Hi David, I am not sure how ddply/summarize solves my issue. I have the following table: ID measurement date door color 1 0.93529385 513 open red 2 0.97419293 420 open red 3 0.962053514 513 closed red 4 0.963909937 1230 open blue 5 0.97652034 1230 open green 6 0.989310795 1230 closed blue 7 0.9941022 917 closed yellow 8 0.8945757 1230 open blue I only want to keep the lines that have corresponding open/closed measurements. For example, I want to keep lines 4,6,8 because for the 1230 blue condition, there exists both open and closed measurements. However, the 513 red condition has an open measurement, but no closed measurement. Huh? what about line 3? -- David, Therefore, line 1 should be deleted. Jeffrey CC: r-help@r-project.org; wdun...@tibco.com From: dwinsem...@comcast.net To: johjeff...@hotmail.com Subject: Re: [R] Table rearranging Date: Tue, 7 Feb 2012 09:08:00 -0500 On Feb 7, 2012, at 4:21 AM, Jeffrey Joh wrote: Thank you for your help, Bill. From the original table (not the plyr output), I would like to remove all the lines that do not have a corresponding open/closed measurement. For example, if there is a Closed yellow measurement on 0917, but not an Open yellow 0917 measurement, then the Closed yellow should be deleted. How can I make this change? In R you need to assign the results of a function to an object name so you code would look like: modified_data - ddply(d, .(date, color), summarize, meanClosed=mean(measurement[door==closed]), nClosed=sum(door==closed)) -- David Jeffrey From: wdun...@tibco.com To: johjeff...@hotmail.com; r-help@r-project.org Subject: RE: [R] Table rearranging Date: Tue, 7 Feb 2012 00:43:25 + Install and load the plyr package and try something like: ddply(d, .(date, color), summarize, + ddply(d, .(date, color), summarize + meanClosed=mean(measurement[door==closed]), nClosed=sum(door==closed)) date color meanOpen nOpen meanClosed nClosed 1 420 red 0.9741929 1 NaN 0 2 513 red 0.9352938 1 0.9620535 1 3 917 yellow NaN 0 0.9941022 1 4 1230 blue 0.9639099 1 0.9893108 1 5 1230 green 0.9765203 1 NaN 0 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org ] On Behalf Of Jeffrey Joh Sent: Monday, February 06, 2012 4:28 PM To: r-help@r-project.org Subject: [R] Table rearranging I have a table that looks like this: measurement date door color 0.93529385 513 open red 0.97419293 420 open red 0.962053514 513 closed red 0.963909937 1230 open blue 0.97652034 1230 open green 0.989310795 1230 closed blue 0.9941022 917 closed yellow I would like to create a table that has: Open measurement, Closed measurement, date, color. For every date/color combination, there should be two columns to represent the door open/closed measurement. If there are multiple datapoints with a given door/date/color combination, then they should be averaged. I would also like to make two columns to represent the number of datapoints that were averaged in determining the open/closed measurements. Jeffrey __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] stepwise variable selection with multiple dependent variables
Good Day, I fit a multivariate linear regression model with 3 dependent variables and several predictors using the lm function. I would like to use stepwise variable selection to produce a set of candidate models. However, when I pass the fitted lm object to step() I get the following error: Error from R: Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k, : no 'drop1' method for mlm models My dependent data is in the matrix ymat where ymat is 35 rows by 3 columns. The predictors are in X where X is 35 by 6 The steps I used were: m.fit - lm(ymat ~ ., data=X) m.step - step(m.fit) If variable selection is not possible with step() is there another package that will perform variable selection in a multivariate setting? System information: platform x86_64-apple-darwin9.8.0 arch x86_64 os darwin9.8.0 system x86_64, darwin9.8.0 status major 2 minor 13.1 year 2011 month 07 day08 svn rev56322 language R version.string R version 2.13.1 (2011-07-08) Thanks in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Table rearranging
On Feb 10, 2012, at 4:47 PM, Jeffrey Joh wrote: Hi David, Totally forgot line 3! So 513 red should be ok but 420 red and 917 yellow aren't. I don't have a plyr solution but this is a base solution: dat[ as.logical( ave(dat$door, dat$date, FUN=function(x) {open %in% x closed %in% x} )) , ] ID measurement date door color 1 1 0.9352938 513 open red 3 3 0.9620535 513 closed red 4 4 0.9639099 1230 open blue 5 5 0.9765203 1230 open green 6 6 0.9893108 1230 closed blue 8 8 0.8945757 1230 open blue (You could have used Dunlap's resutl and tested for both the meanOpen and meanClosed values being not-is.na.) -- David Jeffrey CC: r-help@r-project.org From: dwinsem...@comcast.net To: johjeff...@hotmail.com Subject: Re: [R] Table rearranging Date: Tue, 7 Feb 2012 21:51:55 -0500 On Feb 7, 2012, at 8:43 PM, Jeffrey Joh wrote: Hi David, I am not sure how ddply/summarize solves my issue. I have the following table: ID measurement date door color 1 0.93529385 513 open red 2 0.97419293 420 open red 3 0.962053514 513 closed red 4 0.963909937 1230 open blue 5 0.97652034 1230 open green 6 0.989310795 1230 closed blue 7 0.9941022 917 closed yellow 8 0.8945757 1230 open blue I only want to keep the lines that have corresponding open/closed measurements. For example, I want to keep lines 4,6,8 because for the 1230 blue condition, there exists both open and closed measurements. However, the 513 red condition has an open measurement, but no closed measurement. Huh? what about line 3? -- David, Therefore, line 1 should be deleted. Jeffrey CC: r-help@r-project.org; wdun...@tibco.com From: dwinsem...@comcast.net To: johjeff...@hotmail.com Subject: Re: [R] Table rearranging Date: Tue, 7 Feb 2012 09:08:00 -0500 On Feb 7, 2012, at 4:21 AM, Jeffrey Joh wrote: Thank you for your help, Bill. From the original table (not the plyr output), I would like to remove all the lines that do not have a corresponding open/closed measurement. For example, if there is a Closed yellow measurement on 0917, but not an Open yellow 0917 measurement, then the Closed yellow should be deleted. How can I make this change? In R you need to assign the results of a function to an object name so you code would look like: modified_data - ddply(d, .(date, color), summarize, meanClosed=mean(measurement[door==closed]), nClosed=sum(door==closed)) -- David Jeffrey From: wdun...@tibco.com To: johjeff...@hotmail.com; r-help@r-project.org Subject: RE: [R] Table rearranging Date: Tue, 7 Feb 2012 00:43:25 + Install and load the plyr package and try something like: ddply(d, .(date, color), summarize, + ddply(d, .(date, color), summarize + meanClosed=mean(measurement[door==closed]), nClosed=sum(door==closed)) date color meanOpen nOpen meanClosed nClosed 1 420 red 0.9741929 1 NaN 0 2 513 red 0.9352938 1 0.9620535 1 3 917 yellow NaN 0 0.9941022 1 4 1230 blue 0.9639099 1 0.9893108 1 5 1230 green 0.9765203 1 NaN 0 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org ] On Behalf Of Jeffrey Joh Sent: Monday, February 06, 2012 4:28 PM To: r-help@r-project.org Subject: [R] Table rearranging I have a table that looks like this: measurement date door color 0.93529385 513 open red 0.97419293 420 open red 0.962053514 513 closed red 0.963909937 1230 open blue 0.97652034 1230 open green 0.989310795 1230 closed blue 0.9941022 917 closed yellow I would like to create a table that has: Open measurement, Closed measurement, date, color. For every date/color combination, there should be two columns to represent the door open/closed measurement. If there are multiple datapoints with a given door/date/color combination, then they should be averaged. I would also like to make two columns to represent the number of datapoints that were averaged in determining the open/closed measurements. Jeffrey __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT David Winsemius, MD West Hartford, CT David Winsemius, MD West Hartford, CT
Re: [R] Choosing glmnet lambda values via caret
Yes, I know how to statically specify lambda choices. I was asking about whether there's a way to let glmnet guide that, instead of specifying it in advance. Again, sorry if I could've been more clear. On Thu, Feb 9, 2012 at 7:15 PM, Max Kuhn mxk...@gmail.com wrote: You can adjust the candidate set of tuning parameters via the tuneGrid argument in trian() and the process by which the optimal choice is made (via the 'selectionFunction' argument in trainControl()). Check out the package vignettes. The latest version also has an update.train() function that lets the user manually specify the tuning parameters after the call to train(). On Thu, Feb 9, 2012 at 7:00 PM, Yang Zhang yanghates...@gmail.com wrote: Usually when using raw glmnet I let the implementation choose the lambdas. However when training via caret::train the lambda values are predetermined. Is there any way to have caret defer the lambda choices to caret::train and thus choose the optimal lambda dynamically? -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Max -- Yang Zhang http://yz.mit.edu/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] stepwise variable selection with multiple dependent variables
Anova.mlm would be one way to do model selection. On Fri, Feb 10, 2012 at 4:29 PM, Fugate, Michael L fug...@lanl.gov wrote: Good Day, I fit a multivariate linear regression model with 3 dependent variables and several predictors using the lm function. I would like to use stepwise variable selection to produce a set of candidate models. However, when I pass the fitted lm object to step() I get the following error: Error from R: Error in drop1.mlm(fit, scope$drop, scale = scale, trace = trace, k = k, : no 'drop1' method for mlm models My dependent data is in the matrix ymat where ymat is 35 rows by 3 columns. The predictors are in X where X is 35 by 6 The steps I used were: m.fit - lm(ymat ~ ., data=X) m.step - step(m.fit) If variable selection is not possible with step() is there another package that will perform variable selection in a multivariate setting? System information: platform x86_64-apple-darwin9.8.0 arch x86_64 os darwin9.8.0 system x86_64, darwin9.8.0 status major 2 minor 13.1 year 2011 month 07 day 08 svn rev 56322 language R version.string R version 2.13.1 (2011-07-08) Thanks in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Scriptable Integration
My data: dput(mydata) structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1, 1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276, 0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485, 0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579, 0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231 )), .Names = c(V1, V2, V3), row.names = c(NA, 6L), class = data.frame) I'd like to apply an arbitrary number of integrations of the splinefunc from mydata$V2 and mydata$V3. Therefore, it should return a function. Anyone have any idea as to a package that allows this? Many thanks! -- H -- Sent from my mobile device Envoyait de mon portable __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Apply pmax to dataframe with different args based on dataframe factor
damn, that's pretty clever. +1 thanks On Thu, Feb 9, 2012 at 8:11 PM, ilai ke...@math.montana.edu wrote: Your attempt was just overly complicated. All you needed was threshold - c( .2 , .4 , .5 )[ df$track ] df$value - pmax(threshold, df$value) df # desired outcome Cheers On Thu, Feb 9, 2012 at 3:56 PM, Idris Raja idris.r...@gmail.com wrote: # I have a dataframe in the following form: track - c(rep('A', 3), rep('B', 4), rep('C', 4)) value - c(0.15, 0.25, 0.35, 0.05, 0.99, 0.32, 0.13, 0.80, 0.75, 0.60, 0.44) df - data.frame(track=factor(track), value=value) # print(df) #track value #1 A 0.15 #2 A 0.25 #3 A 0.35 #4 B 0.05 #5 B 0.99 #6 B 0.32 #7 B 0.13 #8 C 0.80 #9 C 0.75 #10 C 0.60 #11 C 0.44 # If any of the values are below a threshold value, I want to replace it with the # threshold value. The twist is that there is a different threshold value for # every track. # I tried something like this, but it's not working threshold - list() threshold['A'] - 0.2 threshold['B'] - 0.4 threshold['C'] - 0.5 for (track in levels(df$track)){ df[df$track==track,]$outcome - pmax(df[df$track==track,]$outcome, threshold[track]) } # Warning messages: # 1: In is.na(mmm) : is.na() applied to non-(list or vector) of type 'NULL' # 2: In is.na(mmm) : is.na() applied to non-(list or vector) of type 'NULL' # 3: In is.na(mmm) : is.na() applied to non-(list or vector) of type 'NULL' #** # Desired Results: # print(df) #track value #1 A 0.20 # value changed #2 A 0.25 #3 A 0.35 #4 B 0.40 # value changed #5 B 0.99 #6 B 0.40 # value changed #7 B 0.40 # value changed #8 C 0.80 #9 C 0.75 #10 C 0.60 #11 C 0.50 # value changed # Any ideas? Thanks for reading. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] multiple histograms from a dataframe
Hi list I need some help for drawing some histograms I have a dataframe , say, X Y Z T I want to draw a histogram Z-T for each value of the couple (X-Y). When I use thus syntax library(lattice) histogram(law[,3] ~ law[,66] | law[,1] ) it draws multiple histograms but by selecting distinct values of law[,1] The deal is to make the same thing but for a couple of columns Thanks in advance for help Adel -- PhD candidate in Computer Science Address 3 avenue lamine, cité ezzahra, Sousse 4000 Tunisia tel: +216 97 246 706 (+33640302046 jusqu'au 15/6) fax: +216 71 391 166 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] multiple histograms from a dataframe
On Feb 10, 2012, at 7:05 PM, Adel ESSAFI wrote: Hi list I need some help for drawing some histograms I have a dataframe , say, X Y Z T I want to draw a histogram Z-T for each value of the couple (X-Y). When I use thus syntax library(lattice) histogram(law[,3] ~ law[,66] | law[,1] ) Perhaps (but untested in the absence of data); histogram( Z ~ T | interaction(X, Y) , data=dfrmname ) it draws multiple histograms but by selecting distinct values of law[,1] The deal is to make the same thing but for a couple of columns Thanks in advance for help Adel -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] garchSim with starting error and variance given?
Hello i need to simulate data for and arch and garch process, n=1000. I've been generating arch and garch data but i can't find the way to set an starting error, epsilon(t=0) and sigma(t=0) to run the command garchSim(). I generated the process on Excel and i know that for the Arch process has a mean of 10 aprox, and for Garch a mean of 25 aprox. The errors comes from a random walk y_t=epsilon_t Thanks for your time and interest. -- View this message in context: http://r.789695.n4.nabble.com/garchSim-with-starting-error-and-variance-given-tp432p432.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scriptable Integration
I'm not quite sure what you mean, but perhaps this will help: library(splines) mydata - structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1, 1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276, 0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485, 0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579, 0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231 )), .Names = c(V1, V2, V3), row.names = c(NA, 6L), class = data.frame) spf - splinefun(mydata$V2) splInt - function(low, up) integrate(spf, low, up)$value splInt(0, 1) splInt(0, 2) Michael On Fri, Feb 10, 2012 at 6:34 PM, Hasan Diwan hasan.di...@gmail.com wrote: My data: dput(mydata) structure(list(V1 = c(1328565067, 1328565067.05, 1328565067.1, 1328565067.15, 1328565067.2, 1328565067.25), V2 = c(0.0963890795246276, 0.227296347215609, 0.240972698811569, 0.221208948983498, 0.230898231782485, 0.203282153087549), V3 = c(0.0245045248243853, 0.0835679411703579, 0.0612613120609633, 0.058568910563872, 0.0511868450318788, 0.0557714205674231 )), .Names = c(V1, V2, V3), row.names = c(NA, 6L), class = data.frame) I'd like to apply an arbitrary number of integrations of the splinefunc from mydata$V2 and mydata$V3. Therefore, it should return a function. Anyone have any idea as to a package that allows this? Many thanks! -- H -- Sent from my mobile device Envoyait de mon portable __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] issues with read.spss
Someone supplied me with an SPSS datafile that caused a buffer overflow and then a crash when reading it in R. Unfortunately I can't supply the dataset at hand and I have a hard time reproducing it with a toy example. But I found at least 2 issues that might be related. I would like to know which of these are expected behavior, and which are bugs. I reproduced it on R 2.14.1 both on Ubuntu Linux and Windows 7... Below some code. The files that are referenced in the code are available for download on http://www.stat.ucla.edu/~jeroen/spss/ #load library library(foreign) #problem one: long string variable is converted to multiple variables. x - read.spss(longstring.sav); summary(x); #4 variables?? #problem two: use.labels does not deal correctly with duplicate labels and generates a bad factor. x - read.spss(duplicate_labels.sav, use.value.labels=T); __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scriptable Integration
Michael, On 10 February 2012 18:11, R. Michael Weylandt michael.weyla...@gmail.com wrote: I'm not quite sure what you mean, but perhaps this will help: spf - splinefun(mydata$V2) splInt - function(low, up) integrate(spf, low, up)$value Sort of, I'm looking to get the nth order integral, where the only thing I know about n is that it is greater than 1, but in practise will be under 4. Something like: spf - rep(splinefun(myData$V2, times=(n-1)) splInt - integrate(spf, min(myData$V2), max(myData$V3)) However, I can't seem to figure out how to get splinefun to evaluate itself. Hope that's clearer and thanks for the help! -- Sent from my mobile device Envoyait de mon portable __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply pairs function to multiple columns in a data frame
You might find the pairs2 function in the TeachingDemos package useful. On Fri, Feb 10, 2012 at 1:13 PM, jarvisma marleyjar...@gmail.com wrote: I am very new to R and programming and thank you in advance for your patience and help with a complete novice! I am working with a large multivariate data set that has 10 explanatory environmental variables (e.g. temp, depth) and over 60 response variables (each is a separate species). My data frame is set up like the simplified version below: JulianDay Temperature Salinity Depth Copepod Barnacle Gastropod Bivalve 222 12.1 33 0.3 500 756 0 178 222 12.3 33.2 1.1 145 111 0 0 223 11.1 33.1 7 752 234 12 0 Where JulianDay, Temperature, Salinity, Depth, Copepod, Barnacle, Gastropod, and Bivalve are the column headers. I am using the pairs function in R to explore my data. My data frame is named Sunset. Using this code: Z=cbind(Sunset$Copepod,Sunset[,c(1:10)]) #the first 10 columns of my data frame are explanatory variables such as temp Pairs.Copepod=pairs(Z,main=Copepods vs. explanatory variables, panel=panel.smooth) I get a great pair plot of Copepods vs. all of the 10 explanatory environmental variables. I would like to do this for each of my 60+ species. I can't just make one big pair plot of all of my explanatory variables vs. all of my species because I have too many. So instead I would like a separate pair plot for each species (each column of data after the first 10 columns of explanatory variables) I would like to be able to write a loop that creates all of these plots at once, but haven't been able to do so. Ideally, each pair plot would have the main title be the column header (e.g. Copepod) for that plot. I would love to include a way to save each pair plot as separate jpeg to a folder on my desktop with a file name that includes the species name (e.g. Copepod). Again, I am very new to R and to programing so I would GREATLY appreciate anyone patient and kind enough to respond with lots of detail so I can hopefully follow your answer and suggestions. I have searched but haven't been able to find a way to write a loop that uses each column of data, so I would love some help! Thank you! Marley -- View this message in context: http://r.789695.n4.nabble.com/apply-pairs-function-to-multiple-columns-in-a-data-frame-tp4377425p4377425.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gregory (Greg) L. Snow Ph.D. 538...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.