[R] Generating a bivariate joint t distribution in R
Hi, I conduct a panel data estimation and obtain estimates for two of the coefficients, beta1 and beta2. R tells me the mean and covariance of the distribution of (beta1, beta2). Now I would like to find the distribution of the quotient beta1/beta2, and one way to do it is to simulate from the joint distribution of (beta1, beta2), where both beta1 and beta2 follow a t distribution. How can we generate a joint t distribution in R? Thanks, Miao

R-help@r-project.org mailing list: https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Generating a bivariate joint t distribution in R
Dear Miao, Check require(MASS) and ?mvrnorm for some ideas. HTH, Jorge.
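A possible approach (a sketch, not the only one): the mvtnorm package provides rmvt() for multivariate t draws. The mean vector, covariance matrix and degrees of freedom below are placeholders; substitute the estimates reported by your panel estimation.

```r
# Sketch: simulate beta1/beta2 from a shifted bivariate t distribution.
library(mvtnorm)

mu    <- c(1.2, 0.8)                           # placeholder coefficient means
Sigma <- matrix(c(0.04, 0.01, 0.01, 0.09), 2)  # placeholder covariance matrix
df    <- 30                                    # placeholder degrees of freedom

set.seed(1)
draws <- rmvt(10000, sigma = Sigma, df = df, delta = mu)  # shifted bivariate t
ratio <- draws[, 1] / draws[, 2]
quantile(ratio, c(0.025, 0.5, 0.975))  # simulated distribution of beta1/beta2
```

Note that rmvt() distinguishes between the "shifted" and "Kshirsagar" types when a non-zero delta is supplied; see ?rmvt before relying on this for inference.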
[R] Matrixplot (VIM package): can I add a colour scale?
Dear R Help, I would like to know how to add a colour scale to a matrixplot please. Here is the code that I've used to construct the matrix plot:

library(VIM)
SpatialPlot <- function(YearxBlock, plot.title){
  # Years are columns, Blocks are rows in this matrix
  YearxBlock <- as.matrix(YearxBlock)
  # To set margins for the plot:
  par(yaxt = "n", oma = c(4,4,4,4), mar = c(1.5,1.5,1.5,1.5))
  # Data coded according to a continuous colour scheme, with the lowest value
  # light gray, the maximum value black, and missing values white:
  matrixplot(YearxBlock, col = c("lightgray", "black", "white"), axes = FALSE)
  axis(side = 1, col = "black", at = c(1,4,7,10,13,16,19),
       labels = as.character(c(1993,1996,1999,2002,2005,2008,2011)), cex.axis = 0.8)
  mtext(text = "block", side = 2, line = 1, outer = TRUE, font = 1)
  title(main = plot.title, adj = 0)
}

There may be some way to do this using the legend() function? Thank you. Regards, Ross Marriott, Ph.D., Senior Research Scientist, Stock Assessment and Data Analysis, Department of Fisheries WA, Western Australian Fisheries and Marine Research Laboratories, PO Box 20, North Beach WA 6920, Australia. Phone: +61 8 9203 0201 (office); 0434604131 (mobile). Fax: +61 8 9203 0199. Web: http://www.fish.wa.gov.au/
Re: [R] creating a loop if to create a column of 0 and 1.
Hi. Check your keyboard, your enter key must be broken. If I decrypt your message and assume that Nfiltered and Presyabs have the same length, then

Presyabs[Nfiltered == 0] <- 0

In case you have some missing values, use

Presyabs[which(Nfiltered == 0)] <- 0

Without knowledge of the structure of the CPOD objects it is difficult to elaborate further. Regards, Petr

-----Original Message----- From: Marco A. Pérez. Sent: Wednesday, April 03, 2013 1:00 AM. To: r-help@r-project.org. Subject: [R] creating a loop if to create a column of 0 and 1.

Hi, I am new to RStudio and I am having trouble with this program. I store 17 lists:

CPOD <- list()
CPOD[[1]] <- GB1A
CPOD[[2]] <- GB1C
CPOD[[3]] <- GB1D
CPOD[[4]] <- GB2A
CPOD[[5]] <- GB2B
CPOD[[6]] <- GB2C
CPOD[[7]] <- GB2D
CPOD[[8]] <- GB3A
CPOD[[9]] <- GB3C
CPOD[[10]] <- GB3D
CPOD[[11]] <- GB4A
CPOD[[12]] <- GB4C
CPOD[[13]] <- GB4D
CPOD[[14]] <- GB5A
CPOD[[15]] <- GB5B
CPOD[[16]] <- GB5C
CPOD[[17]] <- GB5D

In each file you can find a txt document that contains the columns: file, chuckend, Nfiltered, Nall, MinsOn and Presyabs. What I want to do is to create a loop for all the CPODs: if a row of the column Nfiltered is 0, then the corresponding row of the column Presyabs must be 0; if it is different from 0, it must be 1. I created this loop without success:

# Creating the loop (0-1 presence, absence)
for (i in 1:length(CPOD[[1]]$Presyabs)) {
  if (CPOD[[1]]$Nfiltered[i] > 1) {
    CPOD[[1]]$Presyabs[i] <- 1
  } else {
    CPOD[[1]]$Presyabs[[i]] <- 0
  }
}

Thank you for your prompt answer.
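Assuming each element of CPOD is a data frame with Nfiltered and Presyabs columns (the object names come from the original post), the loop can be replaced by one vectorized assignment applied across the whole list. A minimal sketch:

```r
# Sketch: set Presyabs to 1 where Nfiltered is non-zero and 0 where it is
# zero, for every data frame in the CPOD list at once (no explicit loop).
CPOD <- lapply(CPOD, function(d) {
  d$Presyabs <- as.integer(d$Nfiltered != 0)
  d
})
```

as.integer() on the logical comparison gives the 0/1 coding directly, which is both shorter and faster than testing each row inside a for loop.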
[R] Question: how to convert raw to numeric
I know that there is a function to convert binary (raw) data to a string, named rawToChar, but I wonder whether there is any similar function for integer and float. I need to read some binary files containing integer and float data. I can do the job this way (below): first convert 4 raw bytes to bits, then pack the bits back into an integer. But it does not work for float, and I worry about the performance.

raw4 <- raw_buffer[1:4]
bits <- rawToBits(raw4)
packBits(bits, type = "integer")

Best wishes. I really hope to get your response.
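A sketch of one answer: base R's readBin() converts raw bytes directly to integers or doubles (including 4-byte floats via size = 4), and it accepts a raw vector in place of a file connection, so no bit-level round trip is needed. The example bytes here are generated with writeBin() for illustration.

```r
# Sketch: readBin() decodes raw bytes as integer or float without packBits().
raw_int <- writeBin(42L, raw())                 # 4 example bytes (an integer)
readBin(raw_int, what = "integer", size = 4)    # -> 42

raw_flt <- writeBin(3.5, raw(), size = 4)       # 4 example bytes (a float)
readBin(raw_flt, what = "double", size = 4)     # -> 3.5
```

For a whole file, readBin(con, what = "integer", n = ..., size = 4) reads many values in one call, which is far faster than looping over packBits().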
[R] linear model coefficients by year and industry, fitted values, residuals, panel data
Hi R-helpers, My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example). The number of firms by year and industry is not always equal, and the number of years by industry is not always equal.

# reproducible example
firm1 <- sort(rep(1:10, 5), decreasing = FALSE)
year1 <- rep(2000:2004, 10)
industry1 <- rep(20, 50)
X <- rnorm(50); Y <- rnorm(50); Z <- rnorm(50)
data1 <- data.frame(firm1, year1, industry1, X, Y, Z)
colnames(data1) <- c("firm", "year", "industry", "X", "Y", "Z")

firm2 <- sort(rep(11:15, 3), decreasing = FALSE)
year2 <- rep(2001:2003, 5)
industry2 <- rep(30, 15)
X <- rnorm(15); Y <- rnorm(15); Z <- rnorm(15)
data2 <- data.frame(firm2, year2, industry2, X, Y, Z)
colnames(data2) <- c("firm", "year", "industry", "X", "Y", "Z")

firm3 <- sort(rep(16:20, 4), decreasing = FALSE)
year3 <- rep(2001:2004, 5)
industry3 <- rep(40, 20)
X <- rnorm(20); Y <- rnorm(20); Z <- rnorm(20)
data3 <- data.frame(firm3, year3, industry3, X, Y, Z)
colnames(data3) <- c("firm", "year", "industry", "X", "Y", "Z")

final1 <- rbind(data1, data2)
final2 <- rbind(final1, data3)
final3 <- final2[order(final2$industry, final2$year), ]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 by industry and year (for example, I need the b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values and the residuals by firm, so I need to keep b0, b1 and b2 in a way that lets me do something like

newdata1 <- transform(final3, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)

or another way to keep the fitted values and the residuals in a data frame with the columns firm and year. Until now I have been doing this in a very hard way, and because I need to do it several times, I need your help to find an easier way.
Thank you, Cecília Carmo, Universidade de Aveiro, Portugal
Re: [R] Generating a bivariate joint t distribution in R
On 03/04/13 07:57, jpm miao wrote (asking how to generate a joint t distribution in R):

library(fCopulae)
?rmvst

Sincerely, Marc Girondot

-- Marc Girondot, Pr, Laboratoire Ecologie, Systématique et Evolution, Equipe de Conservation des Populations et des Communautés, CNRS, AgroParisTech et Université Paris-Sud 11, UMR 8079, Bâtiment 362, 91405 Orsay Cedex, France. Tel: 33 (0)1.69.15.72.30. Fax: 33 (0)1.69.15.73.53. E-mail: marc.giron...@u-psud.fr. Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html. Skype: girondot
[R] qpplot.das
Hi, I am using qpplot.das to produce a probability plot of my data. Some of the data are negative values, and once logged, NaN values are produced. Does anybody know what happens to these NaN values? Are they just removed from the dataset before the other points are plotted? Thanks -- Shane
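A quick base-R sketch of the likely mechanism (this illustrates general R behaviour, not qpplot.das internals specifically): taking the log of a negative value yields NaN with a warning, and base plotting functions simply do not draw non-finite points.

```r
# Sketch: log of negative values gives NaN; plotting omits non-finite points.
x  <- c(-2, -1, 1, 10, 100)
lx <- log10(x)   # first two entries become NaN, with a "NaNs produced" warning
plot(lx)         # only the three finite points are drawn
```

So in effect the NaN observations silently drop out of the plot; if that matters for interpretation, filter x > 0 explicitly beforehand so you know how many points were lost.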
Re: [R] speedometer charts in R
On Tue, Apr 2, 2013 at 4:00 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Look at the R googleVis package.

Or read what Hadley W had to say on a similar question first: "The question would be why would you want to? You are trying to understand your data, not driving a race car or aeroplane." - http://r.789695.n4.nabble.com/Graphical-output-dials-and-meters-for-a-dashboard-td845090.html

But maybe you *are* creating a dashboard for an R-powered race car, in which case here's an R-native version of the googleVis speedometers: http://gastonsanchez.wordpress.com/2013/01/10/gauge-chart-in-r/

Can't wait to see the full source code for your race car:

require(engine)
block <- engine(cc = 2000, cylinders = 6)
require(downforce)
...

Barry
Re: [R] speedometer charts in R
Hi. Thanks for the help. In the meanwhile some of the contributors have already sent me the same link with examples. I agree with you; I am not trying to drive a race car, just to do what was asked of me :) Andrija
[R] R 3.0.0 is released
The build system rolled up R-3.0.0.tar.gz (codename "Masked Marvel") this morning. The list below details the changes in this release. You can get the source code from http://cran.r-project.org/src/base/R-3/R-3.0.0.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course. For the R Core Team, Peter Dalgaard.

These are the md5sums for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = cbf6da8f886ccd8d0dda0cc7ffd1b8ec
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 43fcae6a4c96e17313d11a0aaefb73f8
MD5 (INSTALL) = 37adac6d0fbadf25b5a40e3f7535415e
MD5 (NEWS) = ed5405acecb3ba4a2d9a3467bbcea7e5
MD5 (NEWS.html) = baea8a4f82a3aa9d29d1a73a34238aa1
MD5 (ONEWS) = 0c3e10eef74439786e5fceddd06dac71
MD5 (OONEWS) = b0d650eba25fc5664980528c147a20db
MD5 (R-latest.tar.gz) = 5fb80535b0e144a978f67aa2158015de
MD5 (README) = e259ae5dd943b8547f0b7719664e815b
MD5 (RESOURCES) = c7cb32499ebbf85deb064aab282f93a4
MD5 (THANKS) = d4b45e302b7cad0fc4bb50d2cfe69649
MD5 (R-3/R-3.0.0.tar.gz) = 5fb80535b0e144a978f67aa2158015de

This is the relevant part of the NEWS file.

CHANGES IN R 3.0.0:

SIGNIFICANT USER-VISIBLE CHANGES:

o Packages need to be (re-)installed under this version (3.0.0) of R.

o There is a subtle change in behaviour for numeric index values 2^31 and larger. These never used to be legitimate and so were treated as NA, sometimes with a warning. They are now legal for long vectors, so there is no longer a warning, and x[2^31] <- y will now extend the vector on a 64-bit platform and give an error on a 32-bit one.

o It is now possible for 64-bit builds to allocate amounts of memory limited only by the OS. It may be wise to use OS facilities (e.g. ulimit in a bash shell, limit in csh) to set limits on overall memory consumption of an R process, particularly in a multi-user environment.
A number of packages need a limit of at least 4GB of virtual memory to load. 64-bit Windows builds of R are by default limited in memory usage to the amount of RAM installed: this limit can be changed by the command-line option --max-mem-size or by setting the environment variable R_MAX_MEM_SIZE.

o Negative numbers for colours are consistently an error: previously they were sometimes taken as transparent, sometimes mapped into the current palette and sometimes an error.

NEW FEATURES:

o identical() has a new argument, ignore.environment, used when comparing functions (with default FALSE as before).

o There is a new option, options(CBoundsCheck=), which controls how .C() and .Fortran() pass arguments to compiled code. If true (which can be enabled by setting the environment variable R_C_BOUNDS_CHECK to "yes"), raw, integer, double and complex arguments are always copied, and checked for writing off either end of the array on return from the compiled code (when a second copy is made). This also checks individual elements of character vectors passed to .C(). This is not intended for routine use, but can be very helpful in finding segfaults in package code.

o In layout(), the limits on the grid size have been raised (again).

o New simple provideDimnames() utility function.

o Where methods for length() return a double value which is representable as an integer (as often happens for package Matrix), this is converted to an integer.

o Matrix indexing of data frames by two-column numeric indices is now supported for replacement as well as extraction.

o setNames() now has a default for its object argument, useful for a character result.

o StructTS() has a revised additive constant in the loglik component of the result: the previous definition is returned as the loglik0 component. However, the help page has always warned of a lack of comparability of log-likelihoods for non-stationary models. (Suggested by Jouni Helske.)

o The logic in aggregate.formula() has been revised. It is now possible to use a formula stored in a variable; previously, it had to be given explicitly in the function call.

o install.packages() has a new argument quiet to reduce the amount of output shown.

o Setting an element of the graphics argument lwd to a negative or infinite value is now an error. Lines corresponding to elements with values NA or NaN are silently omitted. Previously the behaviour was device-dependent.

o Setting graphical parameters cex, col, lty, lwd and pch in par() now requires a length-one argument. Previously some silently took the first element of a longer vector, but not always when documented to do so.

o Sys.which() when
[R] Better way of writing R code
Dear R forum, (Please note this is not a finance problem.) I have two data.frames:

currency_df = data.frame(
  current_date = c("3/4/2013", "3/4/2013", "3/4/2013", "3/4/2013"),
  issue_date = c("27/11/2012", "9/12/2012", "14/01/2013", "28/02/2013"),
  maturity_date = c("27/04/2013", "3/5/2013", "14/6/2013", "28/06/2013"),
  currency = c("USD", "USD", "GBP", "SEK"),
  other_currency = c("EURO", "CAD", "CHF", "USD"),
  transaction = c("Buy", "Buy", "Sell", "Buy"),
  units_currency = c(10, 25000, 15, 4),
  units_other_currency = c(78000, 25350, 99200, 6150))

rate_df = data.frame(
  date = rep(c("28/3/2013", "27/3/2013", "26/3/2013", "25/3/2013"), 9),
  currency = c(rep("USD", 12), rep("GBP", 12), rep("EURO", 12)),
  tenor = rep(rep(c("1 day", "1 week", "2 weeks"), each = 4), 3),
  rate = c(0.156, 0.157, 0.157, 0.155, 0.1752, 0.1752, 0.1752, 0.1752,
           0.1752, 0.1752, 0.1752, 0.1752, 0.48625, 0.485, 0.48625, 0.4825,
           0.49, 0.49125, 0.4925, 0.49, 0.49375, 0.49125, 0.4925, 0.49125,
           0.02643, 0.02214, 0.02214, 0.01929, 0.034, 0.034, 0.034125, 0.034,
           0.044, 0.044, 0.041, 0.045))

# 1st data.frame, currency_df (printed):
  current_date issue_date maturity_date currency other_currency transaction units_currency units_other_currency
1     3/4/2013 27/11/2012    27/04/2013      USD           EURO         Buy             10                78000
2     3/4/2013  9/12/2012      3/5/2013      USD            CAD         Buy          25000                25350
3     3/4/2013 14/01/2013     14/6/2013      GBP            CHF        Sell             15                99200
4     3/4/2013 28/02/2013    28/06/2013      SEK            USD         Buy              4                 6150

# 2nd data.frame, rate_df (printed):
        date currency   tenor     rate
1  28/3/2013      USD   1 day 0.156000
2  27/3/2013      USD   1 day 0.157000
3  26/3/2013      USD   1 day 0.157000
4  25/3/2013      USD   1 day 0.155000
5  28/3/2013      USD  1 week 0.175200
6  27/3/2013      USD  1 week 0.175200
7  26/3/2013      USD  1 week 0.175200
8  25/3/2013      USD  1 week 0.175200
9  28/3/2013      USD 2 weeks 0.175200
10 27/3/2013      USD 2 weeks 0.175200
11 26/3/2013      USD 2 weeks 0.175200
12 25/3/2013      USD 2 weeks 0.175200
13 28/3/2013      GBP   1 day 0.486250
14 27/3/2013      GBP   1 day 0.485000
15 26/3/2013      GBP   1 day 0.486250
16 25/3/2013      GBP   1 day 0.482500
17 28/3/2013      GBP  1 week 0.490000
18 27/3/2013      GBP  1 week 0.491250
19 26/3/2013      GBP  1 week 0.492500
20 25/3/2013      GBP  1 week 0.490000
21 28/3/2013      GBP 2 weeks 0.493750
22 27/3/2013      GBP 2 weeks 0.491250
23 26/3/2013      GBP 2 weeks 0.492500
24 25/3/2013      GBP 2 weeks 0.491250
25 28/3/2013     EURO   1 day 0.026430
26 27/3/2013     EURO   1 day 0.022140
27 26/3/2013     EURO   1 day 0.022140
28 25/3/2013     EURO   1 day 0.019290
29 28/3/2013     EURO  1 week 0.034000
30 27/3/2013     EURO  1 week 0.034000
31 26/3/2013     EURO  1 week 0.034125
32 25/3/2013     EURO  1 week 0.034000
33 28/3/2013     EURO 2 weeks 0.044000
34 27/3/2013     EURO 2 weeks 0.044000
35 26/3/2013     EURO 2 weeks 0.041000
36 25/3/2013     EURO 2 weeks 0.045000

Using the plyr and reshape libraries, I have converted rate_df into tabular form:

       date USD_1 day USD_1 week USD_2 weeks GBP_1 day GBP_1 week GBP_2 weeks EURO_1 day EURO_1 week EURO_2 weeks
1 25/3/2013     0.155     0.1752      0.1752   0.48250    0.49000     0.49125    0.01929    0.034000        0.045
2 26/3/2013     0.157     0.1752      0.1752   0.48625    0.49250     0.49250    0.02214    0.034125        0.041
3 27/3/2013     0.157     0.1752      0.1752   0.48500    0.49125     0.49125    0.02214    0.034000        0.044
4 28/3/2013     0.156     0.1752      0.1752   0.48625    0.49000     0.49375    0.02643    0.034000        0.044

Depending on the maturity period, I have defined discount rates as:

# FOR USD
if (as.character(currency) == "USD") {
  if (as.character(other_currency) == "GBP" && days_to_maturity <= 1) {
    libor_rate1 = df_LIBOR_rates$USD_o_n
    libor_rate2 = df_LIBOR_rates$GBP_o_n
  } else if
[R] Packages on R 3.0.0
Hello all, I see that R 3.0.0 is announced (hurray!), and have a question regarding this line in the NEWS file: "Packages need to be (re-)installed under this version (3.0.0) of R." Assuming I copy my packages to the new library folder and run update.packages(), will that be enough? Or is there anything more to do? For example, for packages I have that are already the latest version, would I still need to use install.packages() on them? Thanks, Tal

Contact me: tal.gal...@gmail.com | Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
Re: [R] Iterative regression through a series
Hello, I've made a small change to the code, and with your example data it's now working without errors.

N <- nrow(example)
estim <- numeric(N)  # It's better to allocate the results
error <- numeric(N)  # vectors in advance
for (i in seq_len(N)) {
  regr <- lm(example$Price[1:i] ~ example$Time[1:i])
  estim[i] <- coef(regr)[2]
  if (is.na(coef(regr)[2])) error[i] <- NA
  else error[i] <- summary(regr)$coefficients[2, 2]
}

Hope this helps, Rui Barradas

Em 03-04-2013 01:57, triskell4-umbre...@yahoo.fr wrote: That is very helpful: I've run your code, and it works perfectly with the example data. However, I'm having some problems using my actual data, probably because my Time variable isn't actually a regular series (as it was in the example: Time = 1:100). When run, it's producing estim and error vectors of lengths much greater than N. Here's a subset of my actual data:

dput(example)
structure(list(Time = c(3075L, 3168L, 3318L, 3410L, 3534L, 3715L,
3776L, 3926L, 3987L, 4110L, 4232L, 4291L, 4413L, 4505L, 4536L,
4656L, 4782L, 4886L, 5018L, 5138L, 5187L, 5253L, 5384L, 5540L,
5669L, 5740L, 5796L, 5887L, 5963L, 6042L, 6126L, 6197L, 6280L,
6405L, 6464L, 6553L, 6659L, 6755L, 6847L, 6917L, 7001L, 7120L,
7216L), Price = c(2.08, 3.55, 5.75, 5.69, 4.47, 5.11, 2.74, 3.04,
3.87, 4.7, 6.61, 3.95, 4.63, 7.11, 3.08, 4.476628726, 7.472854559,
8.775893276, 6.34, 5.79, 3.98889, 4.01917, 3.69, 4.603636364,
5.242094366, 6.854871699, 5.163700257, 9.154206814, 8.712059541,
10.60635248, 10.58180221, 10.55396909, 10.67812007, 9.985298266,
10.57385693, 9.644945417, 11.62, 12.615, 13.61, 10.83, 8.38, 12.7,
8.94)), .Names = c("Time", "Price"), row.names = c(NA, -43L),
class = "data.frame")

Is this as simple as replacing the expression for (i in Time) { with for (i in 1:length(Time)) {, or somesuch?
Mendi

From: Rui Barradas ruipbarra...@sapo.pt
To: triskell4-umbre...@yahoo.fr
Cc: r-help@r-project.org
Sent: Tuesday, 2 April 2013, 11:51
Subject: Re: [R] Iterative regression through a series

Hello, The error comes from NAs where you would expect coefficients. Try the following.

set.seed(7511)  # Make the example reproducible
N <- 100
Time <- 1:N
Price <- rnorm(N, 8, 2)
estim <- numeric(N)  # It's better to allocate the results
error <- numeric(N)  # vectors in advance
for (i in Time) {
  regr <- lm(Price[1:i] ~ Time[1:i])
  estim[i] <- coef(regr)[2]
  if (is.na(coef(regr)[2])) error[i] <- NA
  else error[i] <- summary(regr)$coefficients[2, 2]
}

Hope this helps, Rui Barradas

Em 02-04-2013 17:44, triskell4-umbre...@yahoo.fr wrote: Hello, Some context: let's say I have a data series (let's call it PRICE, for simplicity), sample size N. I want to regress it on TIME, and then use the TIME and intercept coefficients to predict the price in the next period, and use the standard error to calculate a confidence interval. This is all very easy. However, what I need help with is calculating a confidence interval for each point in time: imagining that at the end of the 10th period I have 10 data points, I wish to regress them on the 10 periods to create a confidence interval for the next 'predicted' price, and so on for TIME[10:100]. So the first regression would be of PRICE[1:10] on TIME[1:10], the second of PRICE[1:11] on TIME[1:11], and so on up to PRICE[1:N] and TIME[1:N]. I'd like to be able to vary the starting point (so it would need to be an argument in the function; in this case it would be 10). The ultimate output of the code would be to save the TIME coefficients and standard errors it generates to two vectors, say TIME.coef and TIME.SE. I'm not sure if lapply() can be bent to my will, or if a for loop would be too inefficient, or what.
I'm not new to R, but I'm fairly new to this kind of programming. This is a bungled mess of a narrative, and I apologize. Please feel free to use TIME = 1:100 and PRICE = rnorm(100, 8, 2). Here's an attempt, which has failed for reasons I can only imagine. Any help getting this to work would be greatly appreciated. Any help doing this without loops would be even better.

Time <- 1:100
Price <- rnorm(100, 8, 2)
estim <- 0  # I'm hoping this will be the Time coefficient
error <- 0  # I'm hoping this will be the standard error of the Time coefficient
for (i in Time) {
  regr <- lm(Price[1:i] ~ Time[1:i])
  estim <- c(estim, coef(summary(regr))[2, 1])
  error <- c(error, coef(summary(regr))[2, 1])
}
Error: subscript out of bounds

Many, many thanks in advance. Mendi
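For the loop-free variant the original poster asked about, one sketch uses sapply() over the expanding window sizes, collecting the slope estimate and its standard error in one pass. The names TIME.coef and TIME.SE come from the post; "start" (the first window size, 10 in the description) is an assumed argument name.

```r
# Sketch: expanding-window regressions without growing vectors in a loop.
set.seed(42)
Time  <- 1:100
Price <- rnorm(100, 8, 2)
start <- 10   # first window size, as described in the post

fits <- sapply(start:length(Time), function(i) {
  s <- summary(lm(Price[1:i] ~ Time[1:i]))
  s$coefficients[2, 1:2]   # slope estimate and its standard error
})
TIME.coef <- fits[1, ]
TIME.SE   <- fits[2, ]
```

sapply() here returns a 2-row matrix (one column per window), so splitting it into the two requested vectors is a simple row extraction; starting at window size 10 also avoids the NA coefficients that a 1-point or 2-point "regression" would produce.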
Re: [R] Packages on R 3.0.0
On 03/04/2013 11:43, Tal Galili wrote: Assuming I copy my packages to the new library folder and run update.packages(), will it be enough? Or is there anything more to do?

No. It means what it says. You have to run update.packages(checkBuilt=TRUE) to force a re-install. Since you are the maintainer of a Windows-only package, I presume this is on Windows. In which case it is an FAQ: http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What_0027s-the-best-way-to-upgrade_003f

-- Brian D. Ripley, rip...@stats.ox.ac.uk, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/, University of Oxford. Tel: +44 1865 272861 (self), +44 1865 272866 (PA). Fax: +44 1865 272595. 1 South Parks Road, Oxford OX1 3TG, UK.
Re: [R] Matrixplot (VIM package): can I add a colour scale?
On 04/03/2013 04:43 PM, Ross Marriott wrote, asking how to add a colour scale to a matrixplot (VIM package).

Hi Ross, You can do this with the color.legend function (plotrix). Jim
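A minimal sketch of that suggestion, with placeholder coordinates (the xl/yb/xr/yt values are in user units and would need adjusting to the actual matrixplot, and a simple scatter plot stands in for it here):

```r
# Sketch: add a gray-scale legend strip to an existing plot with
# plotrix::color.legend(); the coordinates below are placeholders.
library(plotrix)

plot(1:10, 1:10)                            # stand-in for the matrixplot
ramp <- gray(seq(0.9, 0, length.out = 50))  # light gray (low) to black (high)
color.legend(xl = 2, yb = 9, xr = 8, yt = 9.5,
             legend = c("low", "high"), rect.col = ramp, gradient = "x")
```

The rect.col vector supplies the colour ramp, so matching it to the matrixplot's lightgray-to-black scheme keeps the legend consistent with the plot.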
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
Cecilia, Thanks for providing a reproducible example. Excellent. You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then estimate the fitted and residual values. Jean

library(plyr)
coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data = dat)$coef)
names(coef) <- c("industry", "year", "b0", "b1", "b2")
final4 <- merge(final3, coef)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
plot(as.factor(newdata2$firm), newdata2$residual)
[R] Canadian political party colours in ggplot2
A stupid question, but does anyone know how to express the actual colours used by the main Canadian political parties? I want to do a couple of ggplot2 plots and have lines or rectangles that accurately reflect the party colours. I can probably play around with RColorBrewer or something to figure it out, but if someone already has them it would save me some time, especially with the NDP orange. Thanks John Kane Kingston ON Canada
Re: [R] Better way of writing R code
Katherine, You don't need to convert rate_df into tabular form. You just need to categorize each row in currency_df into a tenor. Then you can merge the two data frames (by currency and tenor). For example ...

# convert dates to R dates, to calculate the number of days to maturity
# I am assuming this is the number of days from the current date to the maturity date
currency_df$maturity <- as.Date(currency_df$maturity_date, "%d/%m/%Y")
currency_df$current <- as.Date(currency_df$current_date, "%d/%m/%Y")
currency_df$days2mature <- as.numeric(currency_df$maturity - currency_df$current)

# categorize the number of days to maturity as you wish
# you may need to change the breaks= option to suit your needs
# read about the cut function to make sure you get the cut points
# included in the proper category, ?cut
currency_df$tenor <- cut(currency_df$days2mature,
  breaks = c(0, 1, 7, 14, seq(from=30.5, length=12, by=30.5)),
  labels = c("1 day", "1 week", "2 weeks", "1 month", paste(2:12, "months")))

# merge the currency_df and rate_df
# this will work better with real data, since the example data you
# provided didn't have matching tenors
both <- merge(currency_df, rate_df, all.x=TRUE)

Jean

On Wed, Apr 3, 2013 at 5:21 AM, Katherine Gobin katherine_go...@yahoo.com wrote:

Dear R forum, (Please note this is not a finance problem.) I have two data.frames:

currency_df = data.frame(
  current_date = c("3/4/2013", "3/4/2013", "3/4/2013", "3/4/2013"),
  issue_date = c("27/11/2012", "9/12/2012", "14/01/2013", "28/02/2013"),
  maturity_date = c("27/04/2013", "3/5/2013", "14/6/2013", "28/06/2013"),
  currency = c("USD", "USD", "GBP", "SEK"),
  other_currency = c("EURO", "CAD", "CHF", "USD"),
  transaction = c("Buy", "Buy", "Sell", "Buy"),
  units_currency = c(10, 25000, 15, 4),
  units_other_currency = c(78000, 25350, 99200, 6150))

rate_df = data.frame(
  date = rep(c("28/3/2013", "27/3/2013", "26/3/2013", "25/3/2013"), 9),
  currency = rep(c("USD", "GBP", "EURO"), each = 12),
  tenor = rep(rep(c("1 day", "1 week", "2 weeks"), each = 4), 3),
  rate = c(0.156, 0.157, 0.157, 0.155, 0.1752, 0.1752, 0.1752, 0.1752,
           0.1752, 0.1752, 0.1752, 0.1752,
           0.48625, 0.485, 0.48625, 0.4825, 0.49, 0.49125, 0.4925, 0.49,
           0.49375, 0.49125, 0.4925, 0.49125,
           0.02643, 0.02214, 0.02214, 0.01929, 0.034, 0.034, 0.034125, 0.034,
           0.044, 0.044, 0.041, 0.045))

# 1st data.frame
currency_df
  current_date issue_date maturity_date currency other_currency transaction units_currency units_other_currency
1     3/4/2013 27/11/2012    27/04/2013      USD           EURO         Buy             10                78000
2     3/4/2013  9/12/2012      3/5/2013      USD            CAD         Buy          25000                25350
3     3/4/2013 14/01/2013     14/6/2013      GBP            CHF        Sell             15                99200
4     3/4/2013 28/02/2013    28/06/2013      SEK            USD         Buy              4                 6150

# 2nd data.frame
rate_df
        date currency   tenor     rate
1  28/3/2013      USD   1 day 0.156000
2  27/3/2013      USD   1 day 0.157000
3  26/3/2013      USD   1 day 0.157000
4  25/3/2013      USD   1 day 0.155000
5  28/3/2013      USD  1 week 0.175200
6  27/3/2013      USD  1 week 0.175200
7  26/3/2013      USD  1 week 0.175200
8  25/3/2013      USD  1 week 0.175200
9  28/3/2013      USD 2 weeks 0.175200
10 27/3/2013      USD 2 weeks 0.175200
11 26/3/2013      USD 2 weeks 0.175200
12 25/3/2013      USD 2 weeks 0.175200
13 28/3/2013      GBP   1 day 0.486250
14 27/3/2013      GBP   1 day 0.485000
15 26/3/2013      GBP   1 day 0.486250
16 25/3/2013      GBP   1 day 0.482500
17 28/3/2013      GBP  1 week 0.490000
18 27/3/2013      GBP  1 week 0.491250
19 26/3/2013      GBP  1 week 0.492500
20 25/3/2013      GBP  1 week 0.490000
21 28/3/2013      GBP 2 weeks 0.493750
22 27/3/2013      GBP 2 weeks 0.491250
23 26/3/2013      GBP 2 weeks 0.492500
24 25/3/2013      GBP 2 weeks 0.491250
25 28/3/2013     EURO   1 day 0.026430
26 27/3/2013     EURO   1 day 0.022140
27 26/3/2013     EURO   1 day 0.022140
28 25/3/2013     EURO   1 day 0.019290
29 28/3/2013     EURO  1 week 0.034000
30 27/3/2013     EURO  1 week 0.034000
31 26/3/2013     EURO  1 week 0.034125
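The cut() step in Jean's answer is the heart of the approach, so here is a tiny self-contained sketch of just that binning (the break points mirror the ones in the reply and are illustrative, not a market convention):

```r
# Minimal sketch of binning days-to-maturity into tenor labels with cut().
# Break points are illustrative; adjust breaks= to your own convention.
days2mature <- c(1, 5, 10, 24, 45, 86)
tenor <- cut(days2mature,
             breaks = c(0, 1, 7, 14, seq(from = 30.5, length = 12, by = 30.5)),
             labels = c("1 day", "1 week", "2 weeks", "1 month",
                        paste(2:12, "months")))
# With the default right = TRUE, 24 days falls in (14, 30.5], the "1 month" bucket
```

Once every row of currency_df carries a tenor label like this, merge(currency_df, rate_df) matches on the common currency and tenor columns automatically.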
Re: [R] Canadian political party colours in ggplot2
Hi John, How about this:

library(XML)
library(ggplot2)
party.info <- readHTMLTable("http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Political_parties_and_politicians_in_Canada/list_of_parties")
fed.party.info <- party.info[[3]]
fed.party.colors <- fed.party.info[, 2]
names(fed.party.colors) <- gsub("^.*\\|", "", fed.party.info[, 4])
tmp <- data.frame(x = 1:15, y = 1:15,
                  z = factor(rep(c("Canada Party", "NDP", "Socialist"), each = 5)))
ggplot(tmp, aes(x=x, y=y)) + geom_point(aes(color=z)) +
  scale_color_manual(values = fed.party.colors)

Best, Ista

On Wed, Apr 3, 2013 at 9:08 AM, John Kane jrkrid...@inbox.com wrote:
Re: [R] lmer, p-values and all that
My apologies for coming late into this conversation, but I'm curious about something in your response. You use the following code to perform a likelihood ratio test between an lm object and a mer object:

fm0 <- lm(distance ~ age, data=Orthodont)
fm2 <- lmer(distance ~ age + (1|Subject), data=Orthodont, REML=FALSE)
ddiff <- -2*c(logLik(fm0) - logLik(fm2))
pchisq(ddiff, 1, lower.tail=FALSE)

It seems like this would be simple to roll up into a function, such as

lrtestGeneric <- function(fit1, fit2){
  chisq <- -2 * c(logLik(fit1) - logLik(fit2))
  df <- abs(attributes(logLik(fit1))$df - attributes(logLik(fit2))$df)
  p <- pchisq(chisq, df, lower.tail=FALSE)
  lrtest <- data.frame(L.R.Chisq=chisq, d.f.=df, P=p)
  return(lrtest)
}
lrtestGeneric(fm0, fm2)

My question is about whether it is appropriate to use the degrees of freedom returned by logLik, or if I should just use 1 degree of freedom when comparing a model without the random effect to one with the random effect. For instance, logLik returns a difference of 3 between the degrees of freedom of the models below. Should I be using the 3 degrees of freedom in the likelihood ratio test, or is it better to go with 1?

fit0 <- lm(Reaction ~ Days, sleepstudy)
fit1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
lrtestGeneric(fit0, fit1)

Any education you can provide would be greatly appreciated. Thanks Benjamin Nutter | Biostatistician | Quantitative Health Sciences Cleveland Clinic | 9500 Euclid Ave.
| Cleveland, OH 44195 | (216) 445-1365

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Bolker Sent: Wednesday, March 27, 2013 10:34 PM To: David Winsemius Cc: r-h...@stat.math.ethz.ch Subject: Re: [R] lmer, p-values and all that

On 13-03-27 10:10 PM, David Winsemius wrote: On Mar 27, 2013, at 7:00 PM, Ben Bolker wrote: Michael Grant michael.grant at colorado.edu writes: Dear Help: I am trying to follow Professor Bates' recommendation, quoted by Professor Crawley in The R Book, p. 629, to determine whether I should model data using the 'plain old' lm function or the mixed model function lmer by using the syntax anova(lmModel, lmerModel). Apparently I've not understood the recommendation or the proper likelihood ratio test in question (or both), for I get this error message: Error: $ operator not defined for this S4 class. I don't have The R Book handy (some more context would be extremely useful! I would think it would count as fair use to quote the passage you're referring to ...) This is the quoted R-help entry: http://tolstoy.newcastle.edu.au/R/help/05/01/10006.html (I'm unable to determine whether it applies to the question at hand.) OK, I misspoke -- sorry. I think the lmer()/lme() likelihoods are actually comparable; it's GLMMs (glmer(), with no analogue in lme()-land except for MASS::glmmPQL, which doesn't give you log-likelihoods at all) where the problem arises.
You can (1) use lme(), (2) look at http://glmm.wikidot.com/faq for suggestions about testing random-effects terms (including perhaps "don't do it at all"), or (3) construct the likelihood ratio test yourself as follows:

library(nlme)
data(Orthodont)
fm1 <- lme(distance ~ age, random = ~1|Subject, data=Orthodont)
fm0 <- lm(distance ~ age, data=Orthodont)
anova(fm1, fm0)[["p-value"]]
detach("package:nlme", unload=TRUE)
library(lme4.0)  ## stable version of lme4
fm2 <- lmer(distance ~ age + (1|Subject), data=Orthodont, REML=FALSE)
anova(fm2, fm0)  ## fails
ddiff <- -2*c(logLik(fm0) - logLik(fm2))
pchisq(ddiff, 1, lower.tail=FALSE)  ## not identical to above but close enough

Would someone be kind enough to point out my blunder? You should probably repost this to the r-sig-mixed-mod...@r-project.org mailing list. My short answer would be: (1) I don't think you can actually use anova() to compare likelihoods between lm() and lme()/lmer() fits in the way that you want: *maybe* for lme() [I don't recall], but almost certainly not for lmer(). See http://glmm.wikidot.com/faq for methods for testing significance/inclusion of random factors (short answer: you should *generally* try to decide whether to include random factors on _a priori_ grounds, not on the basis of statistical tests ...) Ben Bolker
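A side note on the chi-square reference distribution used above: because the null hypothesis (random-effect variance equal to zero) sits on the boundary of the parameter space, the naive chi-square p-value is conservative, and a commonly used correction halves it (treating the LR statistic as a 50:50 mixture of chi-square distributions). This is a sketch of that convention with made-up log-likelihood values, not a statement about the exact distribution in every design:

```r
# Sketch: likelihood-ratio test of a single variance component with the
# common boundary correction (halving the chi-square p-value), since the
# null value sigma^2 = 0 lies on the edge of the parameter space.
lrtest_boundary <- function(logLik0, logLik1, df = 1) {
  chisq <- -2 * (logLik0 - logLik1)
  p_naive     <- pchisq(chisq, df, lower.tail = FALSE)
  p_corrected <- p_naive / 2   # 50:50 mixture of chi^2_0 and chi^2_1
  c(chisq = chisq, p_naive = p_naive, p_corrected = p_corrected)
}

# Illustrative log-likelihoods for the null and alternative fits
lrtest_boundary(-120.5, -115.2)
```

The GLMM FAQ linked in the thread discusses when this correction applies and when testing the random effect is better avoided altogether.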
Re: [R] Canadian political party colours in ggplot2
On 03/04/2013 9:08 AM, John Kane wrote: A stupid question, but does anyone know how to express the actual colours used by the main Canadian political parties?

From this page http://www.ndp.ca/logos, NDP orange is CMYK = (0, 60, 100, 0). According to an online conversion tool, that's #FF6600 in #RGB notation.

Duncan Murdoch
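Putting Duncan's #FF6600 together with hex values for the other parties gives a named vector that plugs straight into scale_colour_manual()/scale_fill_manual(). The non-NDP codes below are commonly used approximations, not values confirmed in this thread, so treat them as placeholders and check each party's own style guide:

```r
# Approximate hex codes for the major federal party colours.
# Only the NDP value (#FF6600) comes from the thread; the others are
# common approximations and should be verified against official sources.
party_cols <- c(Liberal = "#D71920", Conservative = "#1A4782",
                NDP = "#FF6600", Bloc = "#33B2CC", Green = "#3D9B35")

# Quick swatch chart, written to a temporary PDF
pdf(tempfile(fileext = ".pdf"))
barplot(rep(1, length(party_cols)), col = party_cols,
        names.arg = names(party_cols), yaxt = "n", main = "Party colours")
dev.off()
```

In ggplot2 the same vector would be used as, e.g., scale_colour_manual(values = party_cols), with the factor levels of the party variable matching the vector's names.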
[R] arrayInd and which
Folks, I have Googled but not found much regarding arrayInd aside from the which help page. Are there any good examples or docs on what arrayInd does that is better or different from which()? In addition, take the following 20x10 matrix:

td <- structure(c(1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 1, 6,
6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6,
6, 5, 6, 6, 6, 6, 5, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6,
6, 6, 2, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6,
2, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6), .Dim = c(20L, 10L))

I want to find the cells which (hah!) are <= c(rep(5,5), rep(4,5)). That is, my bounds are by column. Is there a better way to do this other than:

bounds <- c(rep(5,5), rep(4,5))
idxs <- which(apply(td, 2, "<=", bounds), arr.ind = TRUE)
idxs
      row col
 [1,]   1   1
 [2,]  13   1
 [3,]  13   2
 [4,]   1   3
 [5,]   8   3
 [6,]  13   3
 [7,]   1   4
 [8,]  13   4
 [9,]   1   5
[10,]  13   5
[11,]   1   6
[12,]   4   6
[13,]  13   6
[14,]   4   7
[15,]  13   7
[16,]   1   8
[17,]   4   8
[18,]  13   8
[19,]   3   9
[20,]   1  10
[21,]  13  10

Lastly, can you explain these results?

td[idxs[10,]]
[1] 4 6
td[idxs[10,1]]
[1] 4
td[idxs[10,2]]
[1] 6
td[idxs[10,3]]
Error: subscript out of bounds

Thanks in advance for your help, KW
Re: [R] Generating a bivariate joint t distribution in R
On Apr 2, 2013, at 11:07 PM, Jorge I Velez wrote:
Dear Miao, Check

require(MASS)
?mvrnorm

for some ideas. HTH, Jorge.-

Also, package mvtnorm has both normal and t versions of its p, q, and r functions.

David Winsemius Alameda, CA, USA
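For concreteness, a bivariate t can also be simulated in base R via the standard representation t = mu + Z / sqrt(W/df), with Z multivariate normal and W chi-square; mvtnorm's rmvt() wraps the same idea. The mean vector, covariance matrix, and degrees of freedom below are illustrative stand-ins for the estimates R reported, not values from the thread:

```r
# Sketch: simulate (beta1, beta2) from a shifted bivariate t using only
# base R, then look at the distribution of the quotient beta1/beta2.
# mu, Sigma, and df are illustrative placeholders.
set.seed(1)
mu    <- c(2, 1)
Sigma <- matrix(c(0.10, 0.02, 0.02, 0.05), 2, 2)
df    <- 30
n     <- 10000

Z <- matrix(rnorm(n * 2), n, 2) %*% chol(Sigma)   # N(0, Sigma) draws
W <- rchisq(n, df)
tdraws <- sweep(Z / sqrt(W / df), 2, mu, "+")     # shifted multivariate t

ratio <- tdraws[, 1] / tdraws[, 2]                # simulated beta1/beta2
quantile(ratio, c(0.025, 0.5, 0.975))             # simulation interval
```

Note that a ratio of random variables can have very heavy tails when the denominator's distribution puts mass near zero, so the simulated quantiles are usually more useful than the simulated mean.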
[R] Superscript
Hi, How do I write a superscript within gsub? I have the following:

gsub("_mgkg", expression(paste("mg ", kg^{-1})), names[1])

Thanks -- Shane
Re: [R] Question: how to convert raw to numeric
See ?readBin - it works with raw objects too. Henrik

On Apr 3, 2013 1:18 AM, Mike Chen chenminyi1...@gmail.com wrote:
I know that there is a function to convert binary data to a string, named rawToChar, but I wonder whether there is any similar function for integer and float. I need to read some binary files containing integer and float data. I can do this job in the following way: first convert 4 raw bytes to bits, then pack the bits back into an integer. But it does not work for float, and I worry about the performance.

raw4 <- raw_buffer[1:4]
bits <- rawToBits(raw4)
packBits(bits, type = "integer")

Best wishes. I really hope to get your response.
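To make Henrik's pointer concrete: readBin() accepts a raw vector directly and handles both 4-byte integers and 4-byte (single-precision) floats, so neither the rawToBits()/packBits() round-trip nor a file connection is needed. A small self-contained round-trip, generating the raw bytes with writeBin():

```r
# Sketch: convert raw bytes to integer and single-precision float with
# readBin(), which accepts a raw vector directly. Endianness defaults to
# the platform's; set endian = explicitly when reading foreign files.
int_raw   <- writeBin(42L, raw(), size = 4)   # 4-byte integer as raw bytes
float_raw <- writeBin(3.5, raw(), size = 4)   # double stored as 4-byte float

readBin(int_raw,   what = "integer", size = 4, n = 1)
readBin(float_raw, what = "numeric", size = 4, n = 1)
```

For a whole buffer, set n to the number of values expected (e.g. n = length(raw_vec) / 4); readBin then returns a vector in one call, which also addresses the performance worry.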
Re: [R] Superscript
On 03/04/2013 11:01 AM, Shane Carey wrote: How do I write a superscript within gsub?

gsub() doesn't work with expressions; it works with character strings. You're going to need to split your string into the parts before and after the "_mgkg" string, and then put it together again as an expression.

Duncan Murdoch
Re: [R] Superscript
Ok, thanks.

On Wed, Apr 3, 2013 at 4:15 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

-- Shane
Re: [R] Superscript
gsub searches strings, not expressions.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
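Following Duncan's suggestion, one way to sketch the split-and-rebuild step is to strip the suffix with gsub() (plain strings) and then construct the plotmath label separately with bquote(). The column name "Arsenic_mgkg" is a hypothetical example, not from the thread:

```r
# Sketch: strip the "_mgkg" suffix with gsub() (string work), then build
# the superscripted axis label separately with bquote() (plotmath work).
nm   <- "Arsenic_mgkg"                     # hypothetical column name
base <- gsub("_mgkg", "", nm)              # -> "Arsenic"
lab  <- bquote(.(base) ~ "(mg" ~ kg^-1 * ")")

# use it where a plotmath expression is accepted, e.g.
# plot(1, xlab = lab)
```

The key point is that the superscript never passes through gsub(): gsub only manipulates the character string, and bquote() substitutes the cleaned string into an unevaluated plotmath call.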
Re: [R] arrayInd and which
KW, I don't know anything about arrayInd, so I can't help with that question. Whether there is a better way to do your comparison depends largely on what you want to do with the results. The way that you're doing it now seems fine to me, but I wonder what you want to use the indexes for. You could approach the issue from another angle, which preserves the original matrix structure:

compare <- cbind(td[, 1:5] <= 5, td[, 6:10] <= 4)
compare
td[compare]
td2 <- td
td2[!compare] <- NA
td2

You are having difficulty extracting because the indexes you are providing are all vectors and scalars:

td[idxs[10,]]    # same as td[c(13, 5)]
td[idxs[10,1]]   # same as td[13]

If you want to extract by row and column, you have to provide matrices:

td[idxs[10, , drop=FALSE]]
td[t(matrix(c(13, 5)))]

The error in your last attempt is caused by trying to extract row 10, column 3 from a matrix (idxs) that has only 2 columns:

td[idxs[10,3]]
idxs[10,3]
dim(idxs)

Hope this helps. Jean

On Wed, Apr 3, 2013 at 9:53 AM, Keith S Weintraub kw1...@gmail.com wrote:
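On the unanswered arrayInd question: arrayInd() converts linear (vector) indices into array indices, and which(x, arr.ind = TRUE) is essentially the find-and-convert in one step, so the two agree. A tiny sketch, also showing the two-column-matrix indexing Jean described:

```r
# arrayInd() converts linear indices to array (row, col) indices;
# which(x, arr.ind = TRUE) does the lookup and the conversion together.
m <- matrix(c(1, 6, 6, 4, 6, 6), nrow = 2)   # 2x3 toy matrix

lin  <- which(m < 5)                  # linear indices into m
idx1 <- arrayInd(lin, dim(m))         # converted to (row, col) pairs
idx2 <- which(m < 5, arr.ind = TRUE)  # same pairs in one call

# Indexing with a two-column matrix picks individual cells (row_i, col_i),
# which is exactly what idxs[10, , drop = FALSE] does in the thread.
m[idx1]
```

arrayInd() earns its keep when you already have linear indices from some other computation (e.g. which.max(), order()) and want their array coordinates; when you start from a logical condition, which(..., arr.ind = TRUE) is the shorter spelling.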
[R] Process substitution and read.table/scan
Hello, I asked the same question on Stack Overflow (http://stackoverflow.com/questions/15784373/process-substitution) but did not completely understand the issue, so I'm reporting it here. I've looked around at what puzzles me and only found this: http://stackoverflow.com/questions/4274171/do-some-programs-not-accept-process-substitution-for-input-files which is partially helpful, but I would really like to understand the full story. I noticed that some of my R scripts give different (i.e. wrong) results when I use process substitution. I tried to pinpoint the problem with a test case. This script:

#!/usr/bin/Rscript
args <- commandArgs(TRUE)
file <- args[1]
cat(file)
cat("\n")
data <- read.table(file, header=FALSE)
cat(mean(data$V1))
cat("\n")

with an input file generated in this way (the shell redirections were apparently eaten by the mail archive; the intent is 10 numbers repeated 500 times):

$ for i in `seq 1 10`; do echo $i >> p; done
$ for i in `seq 1 500`; do cat p >> test; done

leads me to this:

$ ./mean.R test
test
5.5
$ ./mean.R <(cat test)
/dev/fd/63
5.501476

Further tests reveal that some lines are lost... but I would like to understand why. Does read.table (scan gives the same results) use seek? PS: with a smaller test file (100 repetitions) an error is reported:

$ ./mean.R <(cat test3)
/dev/fd/63
Error in read.table(file, header = F) : no lines available in input
Execution halted

Other notes: with a modified script that uses scan, the results are the same. Printing the whole data.frame results in 5001 lines in the first case (which is correct) and only 3050 with the process substitution. I checked the read.table source code and saw that it goes back and forth in the file to check for column types and so on... I thought that this explained the problem, but I would prefer an error message rather than a result computed from partial data. Then someone on Stack Overflow pointed me to fifo(), which solves the problem (i.e. the mean is reported correctly even with process substitution), and therefore I'm even more puzzled: does fifo() allow seeks and peeks around a named pipe?
I'm willing to read the relevant code to understand what's really happening (and even help, if someone thinks this issue could represent a small bug), but I would really appreciate some pointers. Here is sessionInfo() and other possibly relevant details:

> sessionInfo()
R version 3.0.0 beta (2013-03-23 r62384)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

$ uname -a
Linux femto 3.6-trunk-amd64 #1 SMP Debian 3.6.9-1~experimental.1 x86_64 GNU/Linux

I use the Debian R package: r-base-core, 3.0.0~20130324-1

Thanks, Elena Grassi
PS: I started on R-help since this could be of general interest; sorry if that's a bad call.
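One robust workaround, independent of how read.table() probes its input: slurp the (possibly non-seekable) stream exactly once with readLines(), then parse the in-memory text. A sketch using textConnection() as a stand-in for a pipe or /dev/fd/63 connection:

```r
# Sketch of a workaround: read the non-seekable input once with
# readLines(), then let read.table() parse the in-memory text via text=.
# read.table() never needs to seek, so pipes and process substitution
# behave like regular files. textConnection() stands in for the real
# stream (e.g. file("/dev/fd/63") or file("stdin")).
con   <- textConnection(c("1", "2", "3", "4"))
lines <- readLines(con)
close(con)

data <- read.table(text = lines, header = FALSE)
mean(data$V1)
```

This trades memory for safety (the whole stream is held in RAM), which is usually fine for inputs of a few thousand lines like the test case above.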
[R] scanning data in R
Dear R users, may I seek your suggestion. I have a data file 'stop' to be scanned in R, but I want to ignore one specific number, 21, there. Putting it differently, I want to get all numbers in the file except 21. Is there any command to achieve this?

b <- scan("F:\\stop.txt")

Many thanks for your kind attention. Regards, Jamil.

15 15 16 15 17 16 16 15 17 15 16 17 19 17 19 15 16 16 19 18 19 17 17 19 16 17 16 20 15 16 16 15 18 19 17 18 16 18 16 17 21 21 17 15 15 20 16 20 17 21 19 15 15 15 18 20 16 17 16 15 16 15 16 16 18 16 15 21 20 15 16 17 18 18 20 17 18 18 21 16 15 16 20 17 19 17 18 19 17 21 21 16 15 19 21 15 20 20 16 16 20 18 21 21 15 15 19 15 16 15 17 15 21 20 16 15 16 16 18 17 16 18 16 16 16 21 16 15 20 16 19 17 14 17 15 16 15 17 16 15 15 20 15 16 15 15 16 15 20 19 15 15 17 17 15 18 16 15 14 21 19 20 15 17 15 15 17 19 15 15 16 18 17 16 16 16 16 18 21 15 16 16 21 19 17 15 16 15 18 16 20 17 16 16 21 15 17 20 21 16 16 17 21 16 16 20 19 17 15 16 18 16 16 15 16 18 19 16 15 21 20 19 20 20 20 16 18 16 15 20 16 16 18 19 16 18 21 16 21 19 19 17 18 15 18 16 19 17 20 20 16 18 16 19 20 21 15 16 18 19 16 17 16 15 17 16 15 21 16 15 18 18 15 21 19 16 19 15 15 17 18 17 16 15 16 18 18 19 17 16 21 16 19 16 15 16 20 17 17 21 21 20 16 17 19 16 15 19 18 17 15 15 15 19 20 18 21 16 15 15 20 20 17 19 15 19 16 15 15 19 16 16 19 15 15 19 18 21 15 15 18 21 16 16 17 17 16 15 18 16 15 20 17 20 15 16 15 15 20 15 16 17 16 17 16 21 17 18 17 15 20 18 16 17 16 17 15 15 17 20 17 18 15 16 20 16 15 15 18 16 18 16 18 15 16 18 15 17 18 18 16 15 19 16 19 21 15 20 17 18 15 16 19 17 16 18 18 15 19 15 15 17 15 19 19 16 20 16 16 15 19 15 16 15 16 15 15 17 20 15 20 16 20 15 17 16 15 16 16 16 17 16 16 21 21 15 15 17 16 16 19 20 16 16 16 16 15 16 16 20 16 16 20 18 19 15 21 16 15 15 21 20 20 19 17
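scan() itself has no "skip this value" option, so the simplest approach is to read everything and drop the unwanted number with logical subsetting afterwards. The file path is the poster's; a small in-memory stand-in is used here so the sketch is self-contained:

```r
# Sketch: read all the numbers, then drop every 21 with logical subsetting.
# b <- scan("F:\\stop.txt")              # the poster's file
b <- c(15, 15, 16, 21, 17, 21, 19)       # small stand-in for the file contents
b <- b[b != 21]
b
```

If several values should be excluded at once, `b[!b %in% c(21, 14)]` generalizes the same idea.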
[R] (no subject)
Hello, I want to perform a latent class analysis using the poLCA package. My formula is:

substances <- cbind(subs1, subs2, subs3, subs4, subs5, subs6) ~
  gender + age + education + income + occupation + urban + dbehavior +
  incarceration + treatment + depression + alcriteria

I want to include sample weights in the model. I have read that poLCA does not take weights into account, but when I introduce them, the model seems to run correctly. This is the command I am using:

lca2 <- poLCA(substances, ena, nclass=2, graphs=TRUE, maxiter=2000, (weights=p_adicc))

My question is: are the results really taking the weights into account? If not, how can I introduce weights into my analysis? Thank you for your help. Regards, Ana
[R] install packages and time-out
I use R / RStudio at work. Recently I tried to download the XLConnect package using

install.packages("XLConnect")

However, I got the following error message:

Installing package(s) into 'WORKCOMPUTER SPECIFIC STUFF HERE Documents/R/win-library/2.15' (as 'lib' is unspecified)
trying URL 'http://www.stats.bris.ac.uk/R/bin/windows/contrib/2.15/XLConnect_0.2-5.zip'
Warning in install.packages :
  InternetOpenUrl failed: 'The operation timed out'
Error in download.file(url, destfile, method, mode = "wb", ...) :
  cannot open URL 'http://www.stats.bris.ac.uk/R/bin/windows/contrib/2.15/XLConnect_0.2-5.zip'
Warning in install.packages : download of package 'XLConnect' failed

I think the reason for the problem is that I'm in a corporate environment and our firewall spends 2-3 minutes scanning the zip file before releasing it. I think the solution requires me to tell R to override (what I believe is) the 60-second time-out default setting when you run install.packages; at least, I think install.packages calls download.file. I need to extend the time-out window so that the firewall can do its thing. Some notes, which I think are relevant:

- Although I'm working in a corporate environment, I have worked with my IT team to ensure that HTTP/proxy measures allow on-the-fly package download, e.g. no problems with install.packages("ggplot2").
- I've also tried using setInternet2(TRUE) before the above, which made no difference.
- I've also tried options(timeout=300) before the install.packages command, in order to attempt to force a five-minute window; this had no effect.
- The target file does exist on Bristol's servers if you fetch it via Internet Explorer.
- I can download the package manually (although our security measures spend some time scanning this particular package). Once it is saved locally, I can successfully run install.packages to install the file from the local drive, but that's hardly the way forward if I want to keep things up to date!
- I have tried Imperial College's server, which also fails, so it's not a mirror issue.
- I have run the same code at home against the same Bristol mirror and get no issues.

A bit stuck, therefore!
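One workaround worth scripting, given that the manual download already succeeds: fetch the binary zip with download.file() yourself (with options(timeout=) raised), then install from the local file with repos = NULL. Whether the timeout option is honoured depends on the download method in use (the internal Windows method reported above may ignore it), so treat this as a sketch rather than a guaranteed fix; the URL is the one from the error message:

```r
# Sketch: fetch the binary zip explicitly, then install from the local
# file. Raising options(timeout=) affects download.file(); whether it
# helps depends on the download method your R build uses on Windows.
options(timeout = 300)
url  <- "http://www.stats.bris.ac.uk/R/bin/windows/contrib/2.15/XLConnect_0.2-5.zip"
dest <- file.path(tempdir(), basename(url))

# download.file(url, dest, mode = "wb")   # may still be held by the proxy scan
# install.packages(dest, repos = NULL)    # install from the local zip
```

The two action lines are commented out here because they need network access; uncommented, they reproduce the manual download-then-install workflow in one script, which also makes periodic updates less painful.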
[R] transformation maximizing the Kurtosis
Hello, I have multivariate data: a matrix X with n rows and p columns. I want to do a linear transformation V = XA, similar to PCA but maximizing the kurtosis instead of the variance. The purpose is to identify potential outliers. I have seen this paper (section 3.1): http://halweb.uc3m.es/esp/Personal/personas/dpena/articles/finaljasa.pdf but the result didn't give me orthogonal components. Thank you, Ronen
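One way to sketch this as a projection-pursuit problem in base R: numerically find a unit direction maximizing the kurtosis of the projection with optim(), then deflate (project the data onto the orthogonal complement) before searching for the next direction, which forces orthogonality by construction. This follows the general projection-pursuit recipe, not the exact algorithm of the cited paper:

```r
# Hedged sketch: one kurtosis-maximizing direction via optim(), with
# deflation to keep subsequent directions orthogonal. Illustrative data
# with a few planted outliers in the first coordinate.
set.seed(42)
X <- scale(matrix(rnorm(200 * 3), ncol = 3))
X[1:3, 1] <- X[1:3, 1] + 8                 # plant outliers

kurt <- function(v) mean(v^4) / mean(v^2)^2

max_kurtosis_dir <- function(X) {
  # maximize kurtosis of X %*% a over unit vectors a (local optimum only)
  obj <- function(a) -kurt(X %*% (a / sqrt(sum(a^2))))
  a <- optim(rnorm(ncol(X)), obj, method = "BFGS")$par
  a / sqrt(sum(a^2))
}

a1 <- max_kurtosis_dir(X)
v1 <- X %*% a1                             # first kurtosis-maximizing component
X2 <- X - X %*% a1 %*% t(a1)               # deflate: remove the a1 direction
# repeat max_kurtosis_dir(X2) for the next (orthogonal) direction
```

Extreme values of v1 flag candidate outliers. Note that optim() from a random start only finds a local optimum, so restarting from several initial vectors is advisable in practice.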
Re: [R] kernlab::kkmeans initial centers
I am not asking about k-means. I am asking about passing initial assignments to the kernel k-means algorithm. In kernel k-means, centroids are not defined explicitly. I tried passing initial centroids in the original feature space, but it did not work. The provided example just sets the number of clusters and lets the method assign random cluster memberships. I am also wondering about the output centers. Are they the centers in the eigenvector space? If so, what kind of initial centers am I supposed to pass to the method? --ahmed On Wed, Apr 3, 2013 at 12:54 AM, Pascal Oettli kri...@ymail.com wrote: Hi, I would say that if you know what the k-means algorithm is, you know the meaning of initial cluster centers. You can also check the output of the provided example. Regards, Pascal On 04/03/2013 09:27 AM, Ahmed Elgohary wrote: Hi, I am trying to pass initial cluster assignments to the kkmeans method (http://rss.acs.unt.edu/Rdoc/library/kernlab/html/kkmeans.html) of kernlab. It is not clear to me how I can set the parameter *centers* with initial cluster centers as stated in the documentation? thanks, --ahmed
Re: [R] kernlab::kkmeans initial centers
Hi, The output of the provided example shows you which kind of matrix you need for cluster centers. In the example, 3 centers are defined per variable (4 variables). Thus, the initial cluster centers are defined by a 3x4 matrix. HTH, Pascal De : Ahmed Elgohary aagoh...@gmail.com À : r-help@r-project.org Cc : Pascal Oettli kri...@ymail.com Envoyé le : Mercredi 3 avril 2013 22h53 Objet : Re: [R] kernlab::kkmeans initial centers [earlier message quoted in full, snipped]
[R] Ask help
Hello! I am eager to learn: if I only have data on the relationship between OTU and sample, can I use the function capscale() and then make a point plot with CAP1 on the x-axis and CAP2 on the y-axis? Besides, what function could I use to draw rarefaction curves in different colours in a single plot in R? Appreciate it very much. Sincerely! Echo
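A hedged sketch using the vegan package (assumed installed; `dune` is an example community matrix shipped with it). One caveat worth knowing: capscale() only labels axes CAP1/CAP2 when constraining variables are supplied; an unconstrained model (`~ 1`) yields MDS axes instead.

```r
library(vegan)                 # assumed installed; provides capscale() and rarecurve()
data(dune)                     # samples x species abundance matrix
ord <- capscale(dune ~ 1, distance = "bray")   # unconstrained: axes are MDS1, MDS2
sc <- scores(ord, display = "sites")
plot(sc[, 1], sc[, 2], xlab = colnames(sc)[1], ylab = colnames(sc)[2])
# rarefaction curves, one colour per sample, in a single plot
rarecurve(dune, step = 1, col = rainbow(nrow(dune)), label = FALSE)
```

With an OTU table and explanatory variables, something like `capscale(otu ~ var1 + var2, data = env)` gives CAP1/CAP2 site scores that can be plotted the same way.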
Re: [R] kernlab::kkmeans initial centers
thanks for your reply, but I am still confused about what kind of centroids the 3x4 matrix represents. In fact, I passed that 3x4 centers matrix as the initial centers to the kkmeans method and I am getting the error

Error in crossprod(x[vgr[[i]], vgr[[i]], drop = FALSE], w[vgr[[i]]]) : object 'vgr' not found

--ahmed On Wed, Apr 3, 2013 at 10:30 AM, Pascal Oettli kri...@ymail.com wrote: [earlier replies quoted in full, snipped]
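For the record, a hedged sketch of the documented interface (kernlab assumed installed): `centers` is either the number of clusters or a matrix of starting centres in the original input space, one row per cluster and one column per variable, i.e. a 3x4 matrix for iris. As the error above shows, matrix input triggered a bug in the kernlab version current at the time, so the integer form is the safe one.

```r
library(kernlab)                      # assumed installed
x <- as.matrix(iris[, 1:4])
km <- kkmeans(x, centers = 3)         # integer form: 3 randomly initialized clusters
# the documented alternative (a 3 x 4 matrix of starting centres) would be, e.g.:
# init <- x[c(1, 51, 101), ]; kkmeans(x, centers = init)
table(iris$Species, km@.Data)         # cluster membership per species
```

The returned "specc" object behaves like a vector of cluster assignments; `centers(km)` gives the output centers.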
Re: [R] scanning data in R
You can certainly do it after scanning all the numbers in with

b <- scan("F:\\stop.txt", what = integer())
b <- b[b != 21]

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Naser Jamil Sent: Wednesday, April 03, 2013 4:33 AM To: R help Subject: [R] scanning data in R Dear R-user, May I seek your suggestion. I have a data file 'stop' to be scanned in R. But I want to ignore one specific number, '21', there. Putting it differently, I want to get all numbers in the file except 21. Is there any command to achieve it? -- b <- scan("F:\\stop.txt") -- Many thanks for your kind attention. Regards, Jamil.
[R] Superscript and for loops
Hi, If I have data as follows:

DATA_names <- c("A mg kg", "B mg kg", "C mg kg", "D mg kg",
                "E mg kg", "F mg kg", "G mg kg", "H mg kg")

how do I convert each name to a label with a superscript, i.e.

A (mg kg^-1), B (mg kg^-1), ..., H (mg kg^-1)

(with the -1 rendered as a superscript)? I have lots of elements and I need to do this automatically, in a for loop or the like. Thanks, Shane
Re: [R] scanning data in R
On 4/3/2013 4:33 AM, Naser Jamil wrote: [question quoted, snipped] Well, I don't know what is in stop.txt, but I will assume it is a string of numbers or strings separated by the end-of-line character and then a terminating EOL. Read it in as you have it and then:

b <- b[-21]

If you are reading in a dataframe, simply use what you have and then do:

b <- b[-21, ]

and you should have what you want, or perhaps you want

b[21, ] <- NA # indicate row 21 as a missing value

-- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences Kirksville, MO 63501 USA
Re: [R] Superscript
Are you trying to convert a column name like "Na_mgkg" to a plot label like "Na (mg kg^-1)"? If so you will have to use both string manipulation functions like gsub() and expression manipulating functions like bquote(). E.g.,

f <- function (name) {
   # add other suffices and their corresponding plotmath expressions to the list
   env <- list2env(list(mgkg = bquote(mg ~ kg^{-1}),
                        ugkg = bquote(mu * g ~ kg^{-1})),
                   parent = emptyenv())
   pattern <- paste0("_(", paste(objects(env), collapse="|"), ")")
   bquoteExpr <- parse(text=gsub(pattern, "~(.(\\1))", name))[[1]]
   # I use do.call() to work around the fact that bquote's first argument is not evaluated.
   do.call(bquote, list(bquoteExpr, env))
}
d <- data.frame(Na_mgkg=1:10, K_ugkg=10:1)
plot(Na_mgkg ~ K_ugkg, data=d, xlab=f("K_ugkg"), ylab=f("Na_mgkg"))

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Shane Carey Sent: Wednesday, April 03, 2013 8:02 AM To: r-help@r-project.org Subject: [R] Superscript Hi, How do I write a superscript within gsub? I have the following:

gsub("_mgkg", expression(paste("mg kg"^{-1})), names[1])

Thanks -- Shane
Re: [R] Model Selection based on individual t-values with the largest possible number of variables in regression
To say that these strategies represent bad statistical practice is to put it mildly. Frank

mister_O wrote: Dear R-Community, When writing my master's thesis, I was faced with a difficult issue. Analyzing the capital structure determinants, I have one dependent variable (total debt ratio = TD) and 15 independent ones. At the first stage I normalized my data by deleting outliers from each variable (pairwise deletion), and as a result every variable has a different length. Now, when selecting relevant variables for the best model, neither stepwise nor forward nor backward procedures work perfectly, since there are a number of other combinations of variables which also have high t-values. Thus, whichever model I pick, I never know whether it is trustworthy. I tried to calculate all possible combinations of independent variables, but since I have 15, there are thousands of such combinations and R simply refuses to calculate them! (the computer crashes) I wonder if you can help me write R code to find the model which includes as many variables as possible with significant t-values? cheers, Oleg

- Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] DUD (Does not Use Derivatives) for nonlinear
Date: Tue, 2 Apr 2013 06:59:13 -0500 From: Paul Johnson pauljoh...@gmail.com To: qi A send2...@gmail.com Cc: R-help r-help@r-project.org Subject: Re: [R] DUD (Does not Use Derivatives) for nonlinear regression in R? On Apr 1, 2013 1:10 AM, qi A send2...@gmail.com wrote: Hi, All SAS has DUD (Does not Use Derivatives)/Secant Method for nonlinear regression; does R offer this option for nonlinear regression? I have read the help file for nls() and could not find such an option, any suggestion? Nelder-Mead is the default algorithm in optim(). It does not use derivatives. DUD is from the same generation, but John Nash recommended the N-M method. Pj Thanks, Derek

I'm not sure where Paul is saying I recommended N-M, but I do think it is important with optimization methods to recommend a method FOR some particular class of problems or for a problem-solving situation. A blanket "this is good" recommendation cannot be made. I chose NM (slightly BEFORE DUD was released) as the only derivative-free method in my 1979 book, as it had the best balance of reliability and performance for an 8K machine (code and data) that I was using in 1975. It still works well as a first-try method for optimization, but generally is less efficient than gradient-based methods, in particular because it does not have a good way to know it is finished. As a derivative-free method, it is not too bad, particularly in the nmk version in the dfoptim package.
Indeed, I wish this version were put in optim() as the default, since it can deal with bounds constraints, though slightly less generally and less well than bobyqa or some other methods, and there are a couple of minor details it handles better than the N-M in optim() that give it better performance and reliability. Readers should notice that there are lots of conditional statements above. It's a matter of selecting the right tool for the job. If you have lots of compute power and don't mind wasting it, NM will likely get somewhere near some or other optimum of your problem. It won't do it terribly fast, and you'd better make sure you didn't just run out of iterations, or hit some other measure that stops the program, before it has decided it is done. Also check that the answer is the one you want. Most optimization problems have more than one answer, and the wrong ones often seem to be easier to find. JN
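To make the trade-off concrete, a small sketch (using the classic Rosenbrock test function, my choice for illustration) comparing derivative-free Nelder-Mead with gradient-based BFGS in optim():

```r
# Rosenbrock "banana" function: minimum value 0 at (1, 1)
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

res_nm   <- optim(c(-1.2, 1), fr)                 # Nelder-Mead, optim()'s default
res_bfgs <- optim(c(-1.2, 1), fr, method = "BFGS")

res_nm$counts["function"]    # NM typically needs many more function evaluations,
res_bfgs$counts["function"]  # though it never evaluates a gradient
c(NM = res_nm$value, BFGS = res_bfgs$value)       # both should be near 0
```

The nmk() function in the dfoptim package mentioned above has the same derivative-free flavour but adds bounds handling (nmkb).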
Re: [R] scanning data in R
vec <- scan("stop.txt")
# Read 635 items
vec1 <- vec[vec != 21]
length(vec1)
# [1] 584
any(vec1 == 21)
# [1] FALSE

A.K. - Original Message - From: Naser Jamil jamilnase...@gmail.com To: R help r-help@r-project.org Cc: Sent: Wednesday, April 3, 2013 5:33 AM Subject: [R] scanning data in R [question quoted in full, snipped]
Re: [R] Superscript and for loops
On Apr 3, 2013, at 9:06 AM, Shane Carey wrote: [question quoted, snipped] You haven't described the task in very much detail, so Bill Dunlap's language-oriented (more expressive, as it were) solution may be what you really need. Nonetheless, this answer stays on the character plane of R's conceptual levels:

gsub("mg kg", "(mg kg)^-1", DATA_names)
[1] "A (mg kg)^-1" "B (mg kg)^-1" "C (mg kg)^-1" "D (mg kg)^-1" "E (mg kg)^-1" "F (mg kg)^-1"
[7] "G (mg kg)^-1" "H (mg kg)^-1"

If you wanted these in an expression vector suitable for labels on a barplot or such:

sapply(gsub("mg kg", "(mg kg)^-1", DATA_names), as.expression)
expression(`A (mg kg)^-1` = A (mg kg)^-1, `B (mg kg)^-1` = B (mg kg)^-1,
    `C (mg kg)^-1` = C (mg kg)^-1, `D (mg kg)^-1` = D (mg kg)^-1,
    `E (mg kg)^-1` = E (mg kg)^-1, `F (mg kg)^-1` = F (mg kg)^-1,
    `G (mg kg)^-1` = G (mg kg)^-1, `H (mg kg)^-1` = H (mg kg)^-1)

In practice:

pos <- barplot(1:length(DATA_names))
text(x=pos, y=-1, xpd=TRUE, srt=45,
     labels=sapply(gsub("mg kg", "(mg kg)^-1", DATA_names), as.expression))

-- David Winsemius Alameda, CA, USA
Re: [R] arrayInd and which
On Apr 3, 2013, at 7:53 AM, Keith S Weintraub wrote: Folks, I have Googled but not found much regarding arrayInd aside from the which help page. Any good examples or docs on what arrayInd does that is better or different from which()? In addition take the following 20x10 matrix:

td <- structure(c(1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6,
6, 3, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
5, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6,
6, 6, 2, 6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 1, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 3,
6, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 4, 6, 6, 6, 6, 6, 6,
6, 6, 1, 6, 6, 6, 6, 6, 6, 6, 6, 6, 2, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 3, 6, 6, 6, 6, 6, 6, 6),
.Dim = c(20L, 10L))

I want to find the cells which (hah!) are <= c(rep(5,5), rep(4,5)). That is, my bounds are by column. Is there a better way to do this other than:

bounds <- c(rep(5,5), rep(4,5))
idxs <- which(apply(td, 2, "<=", bounds), arr.ind = TRUE)
idxs
      row col
 [1,]   1   1
 [2,]  13   1
 [3,]  13   2
 [4,]   1   3
 [5,]   8   3
 [6,]  13   3
 [7,]   1   4
 [8,]  13   4
 [9,]   1   5
[10,]  13   5
[11,]   1   6
[12,]   4   6
[13,]  13   6
[14,]   4   7
[15,]  13   7
[16,]   1   8
[17,]   4   8
[18,]  13   8
[19,]   3   9
[20,]   1  10
[21,]  13  10

Lastly can you explain these results:

td[idxs[10,]]
[1] 4 6
td[idxs[10,1]]
[1] 4
td[idxs[10,2]]
[1] 6
td[idxs[10,3]]
Error: subscript out of bounds

This has nothing to do with the behavior of arrayInd and everything to do with the behavior of "[".

td[idxs[10,drop=FALSE] ]
[1] 4

When extracting from a matrix, a result with a single row loses its matrix attributes and becomes a numeric vector. That behavior is prevented with drop=FALSE and the desired results accrue.
This would not have been a puzzle if you had chosen multiple rows at a time:

td[ idxs[1:2, ] ]
[1] 1 4
td[ idxs ]
[1] 1 4 1 1 3 3 1 5 3 4 2 1 1 5 3 2 2 4 1 2 3 3

-- David Winsemius Alameda, CA, USA
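On the original arrayInd question: which(x, arr.ind = TRUE) is essentially which(x) followed by arrayInd(), which converts linear (column-major) positions into (row, col) index pairs. A small self-contained check:

```r
m <- matrix(c(1, 6, 4, 6, 6, 2), nrow = 2)
lin <- which(m < 5)                  # linear positions: 1, 3, 6
arr <- arrayInd(lin, dim(m))         # corresponding (row, col) pairs
arr
# arrayInd() is what you reach for when you already have linear indices
# (e.g. from which.max() or order()) and need array subscripts back:
stopifnot(identical(arr, which(m < 5, arr.ind = TRUE, useNames = FALSE)))
```

So the two are interchangeable for logical tests; arrayInd() is the lower-level piece you use when the linear indices come from somewhere other than which().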
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
A few minor improvements to Jean's post suggested inline below. On 2013-04-03 05:41, Adams, Jean wrote: Cecilia, Thanks for providing a reproducible example. Excellent. You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then estimate the fitted and residual values. Jean

library(plyr)
coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data=dat)$coef)
names(coef) <- c("industry", "year", "b0", "b1", "b2")
final4 <- merge(final3, coef)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
plot(as.factor(newdata2$firm), newdata2$residual)

Suggestion 1: Use the extractor function coef() and also avoid using the name of an R function as a variable name:

Coef <- ddply(..., function(dat) coef(lm(...)))

Suggestion 2: Use plyr's mutate() to do both transforms at once:

newdata <- mutate(final4, Yhat = b0 + b1*X + b2*Z, residual = Y - Yhat)

[Or you could use within(), but I now find mutate handier, mainly because it doesn't 'reverse' the order of the new variables.]

Suggestion 3: Use the 'data=' argument in the plot:

boxplot(residual ~ firm, data = newdata)

Peter Ehlers

On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote: Hi R-helpers, My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example); the number of firms by year and industry is not always equal, and the number of years by industry is not always equal.

#reproducible example
firm1 <- sort(rep(1:10,5), decreasing=F)
year1 <- rep(2000:2004,10)
industry1 <- rep(20,50)
X <- rnorm(50)
Y <- rnorm(50)
Z <- rnorm(50)
data1 <- data.frame(firm1, year1, industry1, X, Y, Z)
colnames(data1) <- c("firm","year","industry","X","Y","Z")
firm2 <- sort(rep(11:15,3), decreasing=F)
year2 <- rep(2001:2003,5)
industry2 <- rep(30,15)
X <- rnorm(15)
Y <- rnorm(15)
Z <- rnorm(15)
data2 <- data.frame(firm2, year2, industry2, X, Y, Z)
colnames(data2) <- c("firm","year","industry","X","Y","Z")
firm3 <- sort(rep(16:20,4), decreasing=F)
year3 <- rep(2001:2004,5)
industry3 <- rep(40,20)
X <- rnorm(20)
Y <- rnorm(20)
Z <- rnorm(20)
data3 <- data.frame(firm3, year3, industry3, X, Y, Z)
colnames(data3) <- c("firm","year","industry","X","Y","Z")
final1 <- rbind(data1, data2)
final2 <- rbind(final1, data3)
final3 <- final2[order(final2$industry, final2$year),]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 by industry and year (for example I need to have b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values and the residuals by firm, so I need to keep b0, b1 and b2 in a way that I could do something like

newdata1 <- transform(final3, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)

or another way to keep the fitted values and the residuals in a dataframe with the columns firm and year. Until now I have been doing this in a very hard way, and because I need to do it several times, I need your help to find an easier way. Thank you, Cecília Carmo Universidade de Aveiro Portugal
Re: [R] arrayInd and which
On Apr 3, 2013, at 10:59 AM, David Winsemius wrote: [question and earlier answer quoted in full, snipped]

Arrgh. The explanation was correct, but there is a missing comma. It should be:

td[ idxs[10, , drop=FALSE] ]

The problem only shows up if you choose a different row:

td[ idxs[12, drop=FALSE] ]
[1] 6     WRONG
td[ idxs[12, , drop=FALSE] ]
[1] 1     Correct

-- David Winsemius Alameda, CA, USA
Re: [R] scanning data in R
On Apr 3, 2013, at 2:33 AM, Naser Jamil wrote: [question quoted, snipped]

b <- scan("~/Downloads/stop.txt", na.strings="21")
Read 635 items
b
 [1] 15 15 16 15 17 16 16 15 17 15 16 17 19 17 19 15 16 16 19 18 19 17 17 19 16 17 16 20 15 16
[31] 16 15 18 19 17 18 16 18 16 17 NA NA 17 15 15 20 16 20 17 NA 19 15 15 15 18 20 16 17 16 15

snipped remainder -- David Winsemius Alameda, CA, USA
Re: [R] Superscript and for loops
Yup, I want these as labels on plots, but I need it as D (mg kg^-1) rather than D (mg kg)^-1. Sorry for not being more clear, and thanks for your help. Cheers On Wed, Apr 3, 2013 at 6:44 PM, David Winsemius dwinsem...@comcast.net wrote: [earlier reply quoted in full, snipped] -- Shane
Re: [R] Superscript
Hi William, This is exactly what I'm trying to do. You're a star, thanks. On Wed, Apr 3, 2013 at 5:33 PM, William Dunlap wdun...@tibco.com wrote: [earlier reply quoted in full, snipped] -- Shane
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
Peter, For suggestion 1, what advantages are there to using coef() rather than $coef? For suggestion 2, thanks! I'm new to the plyr package and wasn't aware of the mutate() function. Jean

On Wed, Apr 3, 2013 at 1:01 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

A few minor improvements to Jean's post suggested inline below.

On 2013-04-03 05:41, Adams, Jean wrote:

Cecilia, Thanks for providing a reproducible example. Excellent. You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then estimate the fitted and residual values. Jean

library(plyr)
coef <- ddply(final3, .(industry, year), function(dat) lm(Y ~ X + Z, data=dat)$coef)
names(coef) <- c("industry", "year", "b0", "b1", "b2")
final4 <- merge(final3, coef)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
plot(as.factor(newdata2$firm), newdata2$residual)

Suggestion 1: Use the extractor function coef() and also avoid using the name of an R function as a variable name:

Coef <- ddply(..., function(dat) coef(lm(...)))

Suggestion 2: Use plyr's mutate() to do both transforms at once:

newdata <- mutate(final4, Yhat = b0 + b1*X + b2*Z, residual = Y - Yhat)

[Or you could use within(), but I now find mutate() handier, mainly because it doesn't 'reverse' the order of the new variables.]

Suggestion 3: Use the 'data=' argument in the plot:

boxplot(residual ~ firm, data = newdata)

Peter Ehlers

On Wed, Apr 3, 2013 at 3:38 AM, Cecilia Carmo cecilia.ca...@ua.pt wrote:

Hi R-helpers, My real data is a panel (unbalanced and with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y, Z, for example). The number of firms by year and industry is not always equal, and the number of years by industry is not always equal.
#reproducible example
firm1 <- sort(rep(1:10,5), decreasing=F)
year1 <- rep(2000:2004,10)
industry1 <- rep(20,50)
X <- rnorm(50); Y <- rnorm(50); Z <- rnorm(50)
data1 <- data.frame(firm1, year1, industry1, X, Y, Z)
colnames(data1) <- c("firm","year","industry","X","Y","Z")
firm2 <- sort(rep(11:15,3), decreasing=F)
year2 <- rep(2001:2003,5)
industry2 <- rep(30,15)
X <- rnorm(15); Y <- rnorm(15); Z <- rnorm(15)
data2 <- data.frame(firm2, year2, industry2, X, Y, Z)
colnames(data2) <- c("firm","year","industry","X","Y","Z")
firm3 <- sort(rep(16:20,4), decreasing=F)
year3 <- rep(2001:2004,5)
industry3 <- rep(40,20)
X <- rnorm(20); Y <- rnorm(20); Z <- rnorm(20)
data3 <- data.frame(firm3, year3, industry3, X, Y, Z)
colnames(data3) <- c("firm","year","industry","X","Y","Z")
final1 <- rbind(data1, data2)
final2 <- rbind(final1, data3)
final3 <- final2[order(final2$industry, final2$year),]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 by industry and year (for example, I need the b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values and the residuals by firm, so I need to keep b0, b1 and b2 in a way that lets me do something like

newdata1 <- transform(final3, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)

or another way to keep the fitted values and residuals in a data frame with the columns firm and year. Until now I have been doing this in a very laborious way, and because I need to do it several times, I need your help to find an easier one. Thank you, Cecília Carmo Universidade de Aveiro Portugal
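For readers without plyr, the same per-cell fit can be sketched in base R with by(). This is a hypothetical minimal example on simulated data, not Cecilia's real panel:

```r
set.seed(1)
dat <- data.frame(
  industry = rep(c(20, 30), each = 20),
  year     = rep(rep(2000:2001, each = 10), 2),
  X = rnorm(40), Z = rnorm(40), Y = rnorm(40)
)

# Fit Y ~ X + Z within each industry/year cell and keep the fitted
# values and residuals next to the original rows
pieces <- by(dat, list(dat$industry, dat$year), function(d) {
  fit <- lm(Y ~ X + Z, data = d)
  transform(d, Yhat = unname(fitted(fit)), residual = unname(resid(fit)))
})
newdata <- do.call(rbind, pieces)

# Per-cell coefficients (b0, b1, b2), if they are needed separately
coefs <- do.call(rbind, by(dat, list(dat$industry, dat$year),
                           function(d) coef(lm(Y ~ X + Z, data = d))))
```

Within each cell the fitted value plus the residual reconstructs Y exactly, which is a useful sanity check on the bookkeeping.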
[R] Changing y-axis in ggplot to show proportions instead of density
Hi Everyone, I have frequency data, which I am displaying with an area-curve-like plot in ggplot2 using:

ggplot(dfs, aes(x=values)) + geom_density(aes(group=ind))

The y-axis that is returned is density, which is not really intuitive for my purposes, and I would like to change it to proportions (i.e. the count of values in each bin divided by the total number of values). I have found that I can change the y-axis to display counts or scaled proportions, but not just the proportions. Something like:

+ geom_density(aes(y = count/sum(count)))

This code does not work, but I wonder if something like it is possible? Thanks, Camilo

Camilo Mora, Ph.D. Department of Geography, University of Hawaii Currently available in Colombia Phone: Country code: 57 Provider code: 313 Phone 776 2282 From the USA or Canada you have to dial 011 57 313 776 2282 http://www.soc.hawaii.edu/mora/
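One commonly suggested approach, assuming binned proportions are acceptable instead of a smoothed density, is to let a histogram layer compute the counts and divide by the total inside aes(). The `..count..` notation is ggplot2's computed-statistic syntax from this era; newer versions spell it `after_stat(count)`. A sketch with made-up stand-in data for dfs:

```r
library(ggplot2)

# toy stand-in for dfs: two groups of values
dfs <- data.frame(values = c(rnorm(200), rnorm(200, 3)),
                  ind    = rep(c("a", "b"), each = 200))

# Each bar's height is count / total count, i.e. a proportion
p <- ggplot(dfs, aes(x = values)) +
  geom_histogram(aes(y = ..count.. / sum(..count..)), binwidth = 0.5) +
  ylab("Proportion")
p
```

The bar heights then sum to 1 across all bins, which is the property the density scale lacks.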
Re: [R] {Spam?} Re: linear model coefficients by year and industry, fitted values, residuals, panel data
On 2013-04-03 11:54, Adams, Jean wrote: Peter. For suggestion 1, what advantages are there to using coef() rather than $coef? Not much difference for lm models, but note that it does use the partial matching 'feature' of the '$' extractor, since the component name is actually 'coefficients'. But try it for an nls model. Mynlsmodel$coefficients won't work (well, it won't give an error but it will yield NULL). That's why there are special extractor functions such as coef.nls, coef.Arima, etc. For lm models, coef.default is used. Peter Ehlers For suggestion 2, thanks! I'm new to the plyr package and wasn't aware of the mutate() function. Jean [...snip...] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
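Peter's point can be checked directly. A small sketch using the built-in mtcars data: $coefficients works for lm (and $coef only via partial matching), but yields NULL for nls, while coef() works for both:

```r
# lm: partial matching makes $coef work, but the real component is $coefficients
fit_lm <- lm(mpg ~ wt, data = mtcars)
identical(coef(fit_lm), fit_lm$coefficients)   # TRUE

# nls: the fitted object has no top-level 'coefficients' component,
# so $coefficients silently yields NULL, while coef() still works
fit_nls <- nls(mpg ~ a + b * wt, data = mtcars, start = list(a = 30, b = -5))
is.null(fit_nls$coefficients)   # TRUE
length(coef(fit_nls))           # 2
```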
[R] Make cdf package error Human Exon array
Hello everybody! I am not sure if this is the right way to ask, because I am new to this kind of help list, so please tell me if I am wrong. I am trying to make a cdf package for the Human Exon Array chip from Affymetrix (HuEx-1_0-st-v2). I have downloaded the file HuEx-10-st-v2.cdf from the Affymetrix website and run the following commands in R:

library(makecdfenv)
pkgpath <- tempdir()
make.cdf.package("HuEx-1_0-st-v2.text.cdf", cdf.path="/Working/", compress=FALSE,
                 species="Homo_sapiens", package.path=pkgpath)

It takes a lot of time and most of the RAM of my computer, and at the end the following message appears:

Reading CDF file. Error in .Call(reaD file, as.character(file), as.integer(3), as.integer(compress), : promise already under evaluation: recursive default argument reference or earlier problems?

I have no idea if I am doing anything wrong or if there is a better method for making the package, so any help will be really welcome. Thank you, Maria

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: [R] scanning data in R
Many thanks, Prof. David. It's exactly what I wanted!

On 3 April 2013 17:26, David L Carlson dcarl...@tamu.edu wrote:

You can certainly do it after scanning all the numbers in with

b <- scan("F:\\stop.txt", what=integer())
b <- b[b != 21]

-- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Naser Jamil Sent: Wednesday, April 03, 2013 4:33 AM To: R help Subject: [R] scanning data in R

Dear R-user, May I seek your suggestion? I have a data file 'stop' to be scanned into R, but I want to ignore one specific number, 21. Put differently, I want to get all the numbers in the file except 21. Is there any command to achieve this?

b <- scan("F:\\stop.txt")

Many thanks for your kind attention. Regards, Jamil.
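A self-contained version of David's answer, using a temporary file instead of F:\stop.txt so it runs anywhere:

```r
# Write a small file of numbers, then read it back, dropping every 21
tf <- tempfile(fileext = ".txt")
writeLines(c("5 21 7", "21 9"), tf)

b <- scan(tf, what = integer())
b <- b[b != 21]
b
# [1] 5 7 9
```

Note that `what = integer()` makes scan() return integers; omit it (as in Jamil's original call) to read doubles.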
Re: [R] Make cdf package error Human Exon array
On 04/03/2013 10:15 AM, Maria Arnedo Munoz wrote: Hello everybody! I am not sure if this is the way for asking because I am new in this kind of help-website so, please if I am wrong tell me.

Hi Maria -- makecdfenv is a Bioconductor package, so ask on their mailing list http://bioconductor.org/help/mailing-list/ Best, Martin

[...snip...]

-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
[R] Help with lmRob function
Hi, I am fairly new to R and have encountered an issue with the lmRob function that I have been unable to resolve. I am trying to run a robust regression using the lmRob function which runs successfully, but the results are rather strange. I'm not sure it's important, but my model has 3 dichotomous categorical variables and 2 continuous variables in it. When I look at a summary of my model, 1 of my continuous variables and 1 of my categorical variables both have an Estimate value of 1. My other 2 categorical variables and 1 other continuous variables all have Estimate values of 0. These results seem strange to me. When I run a simple rlm function the model runs fine and produces the results I'd expect to see. Another potentially useful piece of information is that when I run a model with just the two variables that have an Estimate value of 1 (one categorical and one continuous), I get the following error message: 1: In lmRob.fit.compute(x, y, x1.idx = x1.idx, nrep = nrep, robust.control = robust.control, : Sum(psi.weight(wi)) less than 1e-06 in lmRob.ucovcoef. during initial estimation. 2: In lmRob.fit.compute(x, y, x1.idx = x1.idx, nrep = nrep, robust.control = robust.control, : Sum(psi.weight(wi)) less than 1e-06 in lmRob.ucovcoef. during final scale estimation. I do not understand what this error message means and have been unable to resolve it. I tried several different permutations of my five variables in varying combinations and sometimes the models work and sometimes I get this error message. There seems to be no pattern as to when the error message occurs and when it does not. If anyone has encountered this or has any help I'd appreciate it. Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with integrate function
Hello, The following code of mine is giving the error:

Error in integrate(fx[[2]], 0.056, 1) : maximum number of subdivisions reached

Can anyone help? Thanks and regards, Swagato

--

fv <- vector("list")
fx <- vector("list")
v <- 0; c <- 0; n <- 0; NOV <- 0; i <- 0
while(n < 200){
  fv[[1]] <- function(x) 1   # prior function
  fx[[1]] <- function(x){
    rho <- (0.2+6*x)/(.3+4*x)
    i <- 0
    N <- 0
    for(i in 1:length(x)){
      kopt <- function(K){(x[i]-0.05*(K-((1-rho[i]^K)/(1-rho[i])+(K*rho[i]^(K+1)-1))/(1-rho[i]^(K+1)))/(0.1+5*x[i]))^2}
      L <- optim(c(0), kopt, method="BFGS")
      N[i] <- round(L$par+1)}
    return(N)
  }
  # Calculate Kv+1
  fx[[2]] <- function(x)((0.2+6*x)/(.3+4*x))^n*(1-(0.2+6*x)/(.3+4*x))/(1-(0.2+6*x)/(.3+4*x))^fx[[1]](x)*fv[[1]](x)  # numerator of f'
  gx <- integrate(fx[[2]], 0, 0.045)    # denominator of f'
  hx <- integrate(fx[[2]], 0.056, 1)
  fv[[2]] <- function(x) x*fx[[2]](x)/(gx$value+hx$value)   # v.f'
  fx[[3]] <- function(x){
    NV <- 0; n0 <- 0
    rho <- (0.2+6*x)/(.3+4*x)
    i <- 0
    N <- 0
    for(i in 1:length(x)){
      kopt <- function(K){(x[i]-0.05*(K-((1-rho[i]^K)/(1-rho[i])+(K*rho[i]^(K+1)-1))/(1-rho[i]^(K+1)))/(0.1+5*x[i]))^2}
      L <- optim(c(0), kopt, method="BFGS")
      N[i] <- L$par+1
      j <- 0
      for(j in 0:N[i]) NV[i] <- NV[i] + j*(rho[i])^j*(1-rho[i])/(1-rho[i]^(N[i]+1))
      n0[i] <- n}
    return(0.05*(n0-NV)/(0.1+5*x))
  }  # cost function
  fv[[3]] <- function(x) fx[[3]](x)*fx[[2]](x)/(gx$value+hx$value)
  fv[[4]] <- function(x) (n-fx[[3]](x)*(0.1+5*x)/0.05)*fx[[2]](x)/(gx$value+hx$value)
  a1 <- integrate(fv[[2]], 0, 0.045)
  a2 <- integrate(fv[[2]], 0.056, 1)
  v[n] <- a1$value+a2$value     # expected value
  b1 <- integrate(fv[[3]], 0, 0.045)
  b2 <- integrate(fv[[3]], 0.056, 1)
  c[n] <- b1$value+b2$value     # expected cost
  d1 <- integrate(fv[[4]], 0, 0.045)
  d2 <- integrate(fv[[4]], 0.056, 1)
  NOV[n] <- d1$value+d2$value   # expected length
  n <- n+1
}
plot(v)
plot(c)
plot(NOV)
l <- 1/(1+exp(v-c))
plot(l)
Re: [R] DUD (Does not Use Derivatives) for nonlinear
On 04/04/2013 05:34 AM, Prof J C Nash (U30A) wrote: SNIP Most optimization problems have more than one answer, and the wrong ones often seem to be easier to find. Fortune? cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] linear model coefficients by year and industry, fitted values, residuals, panel data
On 04/04/2013 07:54 AM, Adams, Jean wrote: Peter. For suggestion 1, what advantages are there to using coef() rather than $coef? Just thought I'd chip in: It is considered, uh, politically correct to use extractor functions rather than digging out components of objects in a direct manner. The purported advantage of this is that it is robust against package changes which could restructure the object in question, which would break direct extraction but would (presumably) have no deleterious consequences for the use of extractor functions. cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can package plyr also calculate the mode?
I am trying to replicate SAS PROC UNIVARIATE in R. I got most of the stats I needed, for a by grouping in a data frame, using:

all1 <- ddply(all, "ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS),
              q25=quantile(COUNTS,.25), median=quantile(COUNTS,.50),
              q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90),
              q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99))

So I got the mean, median, std dev, quantiles, etc. Is there any way I can add the mode to the mix? Thanks ahead for any suggestions.

-- View this message in context: http://r.789695.n4.nabble.com/Can-package-plyr-also-calculate-the-mode-tp4663235.html Sent from the R help mailing list archive at Nabble.com.
[R] how to re-shape a matrix
Hi All, I have a matrix like

m <- matrix(letters[1:10], ncol=5)

How do I convert it to a 10 x 3 matrix, in which the first column is the row index of m, the second column is the column index of m, and the third column is the value of the matrix? Say

1 1 a
2 1 b
1 2 c
2 2 d
1 3 e

etc. Thanks.
Re: [R] Can package plyr also calculate the mode?
Sure, you can add the mode in, following the format by the other summary statistics. ?mode Sarah On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 - ddply(all,ACT_NAME, summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) ) So I got the mean, median std dev, quantiles etc. IS there any way I can add the mode to the mixt. Thanks ahead for any suggestions. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to re-shape a matrix
Thanks for the reproducible example. Because this mixes numeric and character data, it's better done as a data frame than a matrix.

data.frame(rowind=as.vector(row(m)), colind=as.vector(col(m)), value=as.vector(m))
   rowind colind value
1       1      1     a
2       2      1     b
3       1      2     c
4       2      2     d
5       1      3     e
6       2      3     f
7       1      4     g
8       2      4     h
9       1      5     i
10      2      5     j

Sarah

On Wed, Apr 3, 2013 at 5:28 PM, Hui Du hui...@dataventures.com wrote: [...snip...]

-- Sarah Goslee http://www.functionaldiversity.org
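An equivalent base-R route, for the record, uses arrayInd() to convert the linear indices 1..length(m) into (row, column) pairs; it gives the same long-format result as the row()/col() version:

```r
m <- matrix(letters[1:10], ncol = 5)

# arrayInd() maps linear (column-major) indices to (row, col) pairs
idx <- arrayInd(seq_along(m), dim(m))
long <- data.frame(rowind = idx[, 1], colind = idx[, 2], value = as.vector(m))
head(long, 3)
#   rowind colind value
# 1      1      1     a
# 2      2      1     b
# 3      1      2     c
```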
Re: [R] Can package plyr also calculate the mode?
If you type ?mode at an R prompt you will be able to read the help for the mode() function. On Wed, Apr 3, 2013 at 5:34 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: I tried mode=?mode(COUNTS) but that doesn't work. -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:32 PM To: Mossadegh, Ramine N. Cc: r-help Subject: Re: [R] Can package plyr also calculate the mode? Sure, you can add the mode in, following the format by the other summary statistics. ?mode Sarah On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by grouping in a data frame using: all1 - ddply(all,ACT_NAME, summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) ) So I got the mean, median std dev, quantiles etc. IS there any way I can add the mode to the mixt. Thanks ahead for any suggestions. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] looking for help with clustering analysis
My data have good correlations with the Spearman method and bad correlations with the Pearson method. If I want to do a cluster analysis that reflects the Spearman correlation, what method should I use to calculate the distance matrix? Thanks. Xin
Re: [R] looking for help with clustering analysis
On Apr 3, 2013, at 2:38 PM, capricy gao wrote: My data have good correlations with spearman method and bad correlations with pearson method. If I want to do cluster analysis to reflect the sprearman correlation, what method should I use to calculate the distance matrix? Thanks. Difference in ranks? -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
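A common concrete choice, assuming the goal is to cluster the variables (columns), is to turn the Spearman correlation matrix into a dissimilarity with 1 - cor and feed that to hclust(); identical rankings then give distance 0 and opposite rankings give distance 2. A sketch on made-up data:

```r
set.seed(42)
x <- matrix(rnorm(50), nrow = 10)   # 10 observations, 5 variables

# Spearman correlation between columns, converted to a dissimilarity
d <- as.dist(1 - cor(x, method = "spearman"))

hc <- hclust(d, method = "average")
plot(hc)
```

Some prefer 1 - abs(cor) when strongly negative correlation should also count as "close"; that is a modeling decision, not an R one.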
Re: [R] Can package plyr also calculate the mode?
Of course it can. Use mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provide tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments:

all1 <- ddply(all, "ACT_NAME", summarise, mean=mean(COUNTS), sd=sd(COUNTS),
              q25=quantile(COUNTS,.25), median=quantile(COUNTS,.50),
              q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90),
              q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99))

On Wed, Apr 3, 2013 at 5:36 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: So you mean it cannot be calculated within plyr? [...snip...]
-- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with integrate function
Error in integrate(fx[[2]], 0.056, 1) : maximum number of subdivisions reached Can anyone help? At the risk of longer integration time, look at the 'subdivisions' argument in ?integrate and consider increasing it? S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
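A sketch of the suggestion; the integrand here is a made-up oscillatory example, not Swagato's. Larger values of 'subdivisions' (and, if needed, a looser 'rel.tol') are passed as extra arguments to integrate():

```r
# An oscillatory integrand that adaptive quadrature must split many times
f <- function(x) sin(1 / x)

# Raising 'subdivisions' lets the routine keep subdividing the interval;
# integrate() stops with an error if the limit is ever reached
res <- integrate(f, 0.001, 1, subdivisions = 10000L)
res$value
```

If even a large subdivision limit is exhausted, that usually signals a near-singular or infinitely oscillating integrand, and splitting the range by hand around the bad point is more robust than raising the limit further.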
[R] prop.test vs hand calculated confidence interval
Hi, This code:

n=40
x=17
phat=x/n
SE=sqrt(phat*(1-phat)/n)
zstar=qnorm(0.995)
E=zstar*SE
phat+c(-E,E)

gives this result:

[1] 0.2236668 0.6263332

The TI graphing calculator gives the same result. Whereas this test:

prop.test(x, n, conf.level=0.99, correct=FALSE)

gives this result: 0.2489036 0.6224374. I'm wondering why there is a difference. D.
Re: [R] prop.test vs hand calculated confidence interval
One of the joys of R is that it's open source: you can read the code for prop.test yourself and see what's happening. In this case, simply typing prop.test at the command line will provide it, although without comments. Sarah On Wed, Apr 3, 2013 at 6:10 PM, David Arnold dwarnol...@suddenlink.net wrote: Hi, This code: n=40 x=17 phat=x/n SE=sqrt(phat*(1-phat)/n) zstar=qnorm(0.995) E=zstar*SE phat+c(-E,E) Gives this result: [1] 0.2236668 0.6263332 The TI Graphing calculator gives the same result. Whereas this test: prop.test(x,n,conf.level=0.99,correct=FALSE) Give this result: 0.2489036 0.6224374 I'm wondering why there is a difference. D. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ask help
On 04/04/2013 01:31 AM, DONG Echo wrote: Hello! I am eager to learn if I only have a data about the relationship between otu and sample, could I use the function capscale, and final make a point plot that x-axis is CAP1 and y-axis is CAP2? Besides, what function could I use to make the different rarefaction curve with different color in the a plot in R? Appreciate very much. Sincerely! Hi Echo, This is such a great question that I am going to try to answer part of it. OTU tables are roughly observation by category tables, and as you mention the capscale (vegan) function, this will probably reduce to sampling unit by species. The CAPx scores can be plotted on 2D coordinates as in the first example from the capscale function. However, it does not seem to make any sense to overlay rarefaction curves on the CAP plot as the two are completely different measures. Displaying the two plots side by side would probably be a better idea. I tried to answer this for four reasons: 1) I didn't see another answer. 2) I thought it might be fun to find out what an OTU table was. 3) The question is an archetype of asking something highly specific without providing the necessary information for anyone but the questioner to understand it. 4) It struck me that the reverse often occurs, with answerers providing information that the questioner has little chance of understanding without a substantial amount of introduction. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] a question about clustering
Hi all. Sorry to bother you. I have a table like the following:

   A B C D E F
G1 0 1 1 0 1 1
G2 0 1 1 0 1 1
G3 0 0 0 0 0 1
H1 1 1 1 1 1 1
H2 1 0 1 1 0 1
H3 1 0 1 1 0 1

I already know G1, G2 and G3 belong to the same group and H1, H2 and H3 belong to the other group. I want to cluster A, B, C, D, E and F. It seems that I can't input the group information of G1-H3 into the 'hclust' function. Are there any other cluster functions into which I could input the group information of G1-H3? Thanks very much! Any help will be greatly appreciated. Best regards
Re: [R] prop.test vs hand calculated confidence interval
It might help to look at the documentation. Typing ?prop.test on the command line. That would reveal various items of interest, including a section labeled Details. A close reading of that section turns up the explanation: The confidence interval is computed by inverting the score test. There are also journal references given. albyn On Wed, Apr 03, 2013 at 06:17:56PM -0400, Sarah Goslee wrote: One of the joys of R is that it's open source: you can read the code for prop.test yourself and see what's happening. In this case, simply typing prop.test at the command line will provide it, although without comments. Sarah On Wed, Apr 3, 2013 at 6:10 PM, David Arnold dwarnol...@suddenlink.net wrote: Hi, This code: n=40 x=17 phat=x/n SE=sqrt(phat*(1-phat)/n) zstar=qnorm(0.995) E=zstar*SE phat+c(-E,E) Gives this result: [1] 0.2236668 0.6263332 The TI Graphing calculator gives the same result. Whereas this test: prop.test(x,n,conf.level=0.99,correct=FALSE) Give this result: 0.2489036 0.6224374 I'm wondering why there is a difference. D. -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Albyn Jones Reed College jo...@reed.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
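The score (Wilson) interval can also be computed by hand, which shows exactly where prop.test's numbers come from. A sketch for the numbers in this thread:

```r
# The Wilson (score) interval that prop.test(..., correct = FALSE) inverts
n <- 40; x <- 17; phat <- x / n
z <- qnorm(0.995)            # 99% two-sided

centre <- phat + z^2 / (2 * n)
halfw  <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2))
wilson <- (centre + c(-halfw, halfw)) / (1 + z^2 / n)
wilson
# [1] 0.2489036 0.6224374

# Agrees with prop.test to rounding
ci <- prop.test(x, n, conf.level = 0.99, correct = FALSE)$conf.int
all.equal(as.numeric(ci), wilson, tolerance = 1e-6)
```

The hand calculation in the original post is the Wald interval, which centres on phat itself; the score interval centres on the shrunken estimate (x + z^2/2)/(n + z^2), hence the different endpoints.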
Re: [R] Can package plyr also calculate the mode?
On 2013-04-03 14:59, Sarah Goslee wrote: Of course it can. Use the mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provided tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments. ?mode will direct you to the help page for _storage_ mode of an object. That's not likely what the OP had in mind. It seems that what s/he wants is the most frequent value. This is (usually) a pretty useless piece of information, but there are a number of packages that do provide it. To the OP: Install package sos and then do findFn(mode) to see what's available. E.g. packages, pracma, asbio, dprep, rattle and many others. Do note that they handle the multimodal situation differently. Or, write your own, perhaps using table() and which.max(). Peter Ehlers all1 - ddply(all,ACT_NAME, summarise, mean=mean(COUNTS), sd=sd(COUNTS), q25=quantile(COUNTS,.25),median=quantile(COUNTS,.50), q75=quantile(COUNTS,.75), q90=quantile(COUNTS,.90), q95=quantile(COUNTS,.95), q99=quantile(COUNTS,.99) ) On Wed, Apr 3, 2013 at 5:36 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: So you mean it cannot be calculated within plyer? -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:36 PM To: Mossadegh, Ramine N.; r-help Subject: Re: [R] Can package plyr also calculate the mode? If you type ?mode at an R prompt you will be able to read the help for the mode() function. On Wed, Apr 3, 2013 at 5:34 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: I tried mode=?mode(COUNTS) but that doesn't work. -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:32 PM To: Mossadegh, Ramine N. Cc: r-help Subject: Re: [R] Can package plyr also calculate the mode? Sure, you can add the mode in, following the format by the other summary statistics. 
?mode Sarah

On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by-grouping in a data frame using:

all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75), q90 = quantile(COUNTS, .90),
              q95 = quantile(COUNTS, .95), q99 = quantile(COUNTS, .99))

So I got the mean, median, std dev, quantiles etc. Is there any way I can add the mode to the mix? Thanks ahead for any suggestions.

-- Sarah Goslee http://www.functionaldiversity.org
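Following Peter Ehlers' table()/which.max() suggestion, a minimal "most frequent value" helper (ties resolved in favor of the first value encountered; the function name stat_mode is made up to avoid clashing with base mode()):

```r
## Statistical mode: the most frequent value in x.
## table() counts occurrences; which.max() picks the first maximum.
stat_mode <- function(x) {
  tab <- table(x)
  names(tab)[which.max(tab)]
}

stat_mode(c(1, 2, 2, 3, 3, 3))  # "3"
```

Note that names() always returns character, so wrap the result in as.numeric() when x is numeric; the helper could then be passed to ddply() alongside the other summaries, e.g. mode = stat_mode(COUNTS).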
[R] Creating data frame from individual files
Dear Group: I have 72 files (.txt). Each file has 2 columns, and column 1 is always identical for all 70 files. Each file has 90,799 rows and is standard across all files. I want to create a matrix 40(rows) x 70 columns. I tried:

temp <- list.files(pattern = "*.txt")
named.list <- lapply(temp, read.delim)
library(data.table)
files.matrix <- rbindlist(named.list)
dim(files.matrix)
[1] 6537456 2

What happened here is that all 90K rows for the 72 files were rbind-ed. I want to cbind. Could anyone please help me? Thanks Adrian
Re: [R] Can package plyr also calculate the mode?
On 04/04/2013 09:56 AM, Peter Ehlers wrote: On 2013-04-03 14:59, Sarah Goslee wrote: Of course it can. Use mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provide tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments.

?mode will direct you to the help page for the _storage_ mode of an object. That's not likely what the OP had in mind. It seems that what s/he wants is the most frequent value. This is (usually) a pretty useless piece of information, but there are a number of packages that do provide it. To the OP: install package sos and then do findFn("mode") to see what's available, e.g. packages pracma, asbio, dprep, rattle and many others. Do note that they handle the multimodal situation differently. Or write your own, perhaps using table() and which.max().

The OP might find the Mode (note capital M) function in package prettyR helpful. Jim
[R] ploting several functions on the same plot
I want to superimpose two function plots on the same page. The functions L0 and L1 are defined below:

f0 <- function(mu, xm, ds, n) { 1 - pnorm((xm - mu)/(ds/sqrt(n))) }
f1 <- function(mu, n) f0(mu, 386.8, 48, n)
L0 <- function(mu) f1(mu, 36)
plot(L0, ylim = c(0, 1), xlim = c(360, 420))
L1 <- function(mu) f1(mu, 100)
lines(L1)

The plot of L0 works pretty well. However, when trying to plot the second function, L1, the interpreter issues an error: Error in as.double(y) : cannot coerce type 'closure' to vector of type 'double'. How can I solve my problem, since I want both plots to have the same quality? I mean, I don't want to produce an approximating polygonal line for the second function. Thanks, -Sergio.
Re: [R] ploting several functions on the same plot
On 13-04-03 7:35 PM, Julio Sergio wrote: I want to superimpose two function plots on the same page. The functions L0 and L1 are defined below:

f0 <- function(mu, xm, ds, n) { 1 - pnorm((xm - mu)/(ds/sqrt(n))) }
f1 <- function(mu, n) f0(mu, 386.8, 48, n)
L0 <- function(mu) f1(mu, 36)
plot(L0, ylim = c(0, 1), xlim = c(360, 420))
L1 <- function(mu) f1(mu, 100)
lines(L1)

The plot of L0 works pretty well. However, when trying to plot the second function, L1, the interpreter issues an error: Error in as.double(y) : cannot coerce type 'closure' to vector of type 'double'. How can I solve my problem, since I want both plots to have the same quality? I mean, I don't want to produce an approximating polygonal line for the second function.

curve(L1, add=TRUE) should handle it. Duncan Murdoch
Re: [R] Can package plyr also calculate the mode?
My apologies: that's what happens when I don't start a clean session with no packages loaded. (Also yet another argument for reproducible examples: I always start a clean session to actually create objects and run other people's code.) This might be of use (especially compared to my original answer!): http://stackoverflow.com/questions/2547402/standard-library-function-in-r-for-finding-the-mode Sarah

On Wednesday, April 3, 2013, Peter Ehlers wrote: On 2013-04-03 14:59, Sarah Goslee wrote: Of course it can. Use mode() in the same way you used the mean() function. You didn't provide a reproducible example, so I can't provide tested code, but I would think that you can add mode=mode(COUNTS) to the ddply() arguments.

?mode will direct you to the help page for the _storage_ mode of an object. That's not likely what the OP had in mind. It seems that what s/he wants is the most frequent value. This is (usually) a pretty useless piece of information, but there are a number of packages that do provide it. To the OP: install package sos and then do findFn("mode") to see what's available, e.g. packages pracma, asbio, dprep, rattle and many others. Do note that they handle the multimodal situation differently. Or write your own, perhaps using table() and which.max(). Peter Ehlers

all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75), q90 = quantile(COUNTS, .90),
              q95 = quantile(COUNTS, .95), q99 = quantile(COUNTS, .99))

On Wed, Apr 3, 2013 at 5:36 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: So you mean it cannot be calculated within plyr?

-Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:36 PM To: Mossadegh, Ramine N.; r-help Subject: Re: [R] Can package plyr also calculate the mode? If you type ?mode at an R prompt you will be able to read the help for the mode() function.
On Wed, Apr 3, 2013 at 5:34 PM, Mossadegh, Ramine N. ramine.mossad...@finra.org wrote: I tried mode=?mode(COUNTS) but that doesn't work.

-Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Wednesday, April 03, 2013 5:32 PM To: Mossadegh, Ramine N. Cc: r-help Subject: Re: [R] Can package plyr also calculate the mode? Sure, you can add the mode in, following the format used by the other summary statistics. ?mode Sarah

On Wed, Apr 3, 2013 at 5:25 PM, ramoss ramine.mossad...@finra.org wrote: I am trying to replicate the SAS proc univariate in R. I got most of the stats I needed for a by-grouping in a data frame using:

all1 <- ddply(all, "ACT_NAME", summarise, mean = mean(COUNTS), sd = sd(COUNTS),
              q25 = quantile(COUNTS, .25), median = quantile(COUNTS, .50),
              q75 = quantile(COUNTS, .75), q90 = quantile(COUNTS, .90),
              q95 = quantile(COUNTS, .95), q99 = quantile(COUNTS, .99))

So I got the mean, median, std dev, quantiles etc. Is there any way I can add the mode to the mix? Thanks ahead for any suggestions.

-- Sarah Goslee http://www.stringpage.com http://www.sarahgoslee.com http://www.functionaldiversity.org
Re: [R] Creating data frame from individual files
Hi, suppose you have 3 files with 2 columns:

named.list <- list(structure(list(col1 = 1:6, col2 = c(0.5, 0.2, 0.3, 0.3, 0.1, 0.2)),
                             .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -6L)),
                   structure(list(col1 = 1:6, col2 = c(0.9, 0.7, 0.5, 0.2, 0.5, 0.2)),
                             .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -6L)),
                   structure(list(col1 = 7:12, col2 = c(0.1, 0.5, 0.9, 0.3, 0.6, 0.4)),
                             .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -6L)))

named.list1 <- do.call(cbind, named.list)
mat1 <- as.matrix(named.list1[!duplicated(as.list(named.list1))])
dimnames(mat1) <- NULL
mat1
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1  0.5  0.9    7  0.1
#[2,]    2  0.2  0.7    8  0.5
#[3,]    3  0.3  0.5    9  0.9
#[4,]    4  0.3  0.2   10  0.3
#[5,]    5  0.1  0.5   11  0.6
#[6,]    6  0.2  0.2   12  0.4

Because you mentioned 72 files and you need 70 columns in the result matrix, I think you need only the first column. In that case:

named.list2 <- do.call(cbind, lapply(named.list, `[`, 1))
mat2 <- as.matrix(named.list2[!duplicated(as.list(named.list2))])
dimnames(mat2) <- NULL
mat2
#     [,1] [,2]
#[1,]    1    7
#[2,]    2    8
#[3,]    3    9
#[4,]    4   10
#[5,]    5   11
#[6,]    6   12

I am not sure this is what you wanted. A.K.

- Original Message - From: Adrian Johnson oriolebaltim...@gmail.com To: r-help r-help@r-project.org Sent: Wednesday, April 3, 2013 7:05 PM Subject: [R] Creating data frame from individual files

Dear Group: I have 72 files (.txt). Each file has 2 columns, and column 1 is always identical for all 70 files. Each file has 90,799 rows and is standard across all files. I want to create a matrix 40(rows) x 70 columns. I tried:

temp <- list.files(pattern = "*.txt")
named.list <- lapply(temp, read.delim)
library(data.table)
files.matrix <- rbindlist(named.list)
dim(files.matrix)
[1] 6537456 2

What happened here is that all 90K rows for the 72 files were rbind-ed. I want to cbind. Could anyone please help me? Thanks Adrian
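When every file really does share the same first column, a shorter route is to keep that key column once and cbind only the second column of each file. A self-contained sketch with three small stand-in data frames in place of the read.delim() results (with real files: named.list <- lapply(list.files(pattern = "\\.txt$"), read.delim) -- note the anchored regex, since list.files() takes a regular expression, not a shell glob):

```r
## Stand-ins for three files with a shared first column
named.list <- list(data.frame(key = 1:3, val = c(0.5, 0.2, 0.3)),
                   data.frame(key = 1:3, val = c(0.9, 0.7, 0.5)),
                   data.frame(key = 1:3, val = c(0.1, 0.5, 0.9)))

## keep the shared key once, then cbind column 2 of every file;
## sapply(..., `[[`, 2) returns one column per file as a matrix
files.matrix <- cbind(named.list[[1]][[1]],
                      sapply(named.list, `[[`, 2))
dim(files.matrix)  # rows x (1 + number of files)
```

This assumes the files are in identical row order; if they might not be, merge() on the key column would be safer.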
[R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY
Hello All, I need your help. I am analysing Affymetrix data and have to select the probe-set that has the median expression among all the probe-sets for the same gene. This way I want to remove the redundancy by keeping the analysis at the single-gene-entry level. I am fully aware that it is not a nice thing to do, but I just have to do it. To do so, I came across the 'findLargest' function of the 'genefilter' package, but it's not well documented, and I do not know how to implement the 'findLargest' function. At this point I have:

esetRMA <- rma(mydata)

Could anybody guide me on how I can select the single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others? Is there any other way to achieve this, i.e. other than using 'genefilter'? Genefilter package: http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html Thanks AK -- Atul Kakrana Delaware Technology Park
[R] Problem quitting
Can someone explain why all of a sudden I can't quit? q(n) returns: Error in gzfile(file, "wb") : cannot open the connection In addition: Warning message: In gzfile(file, "wb") : cannot open compressed file '.RDataTmp', probable reason 'Permission denied'. I'm running R 2.15.3 GUI 1.53 Leopard build 64-bit on a MacBook Pro, Processor 2.3 GHz Intel Core i7, Software Mac OS X Lion 10.7.5 (11G63). Julian Wells
[R] Deviance in Zero inflated models
Dear list, I am running some zero-inflated models and would like to know the deviance of the models. Unlike a normal GLM, where the deviance is displayed in the summary, all that is displayed in a summary of the zero-inflated model is the log-likelihood. I hope this isn't a read-the-manual question, and if it is I apologize for wasting your time, but if you could still send me a link to where I might find this information I would be very grateful! Thank you Lia
Re: [R] ploting several functions on the same plot
Duncan Murdoch murdoch.duncan at gmail.com writes: curve(L1, add=TRUE) should handle it.

Thanks Duncan, your solution worked great! However, I'm puzzled by a problem along the same lines: when I pass a function-producing function's result to plot, it works well; however, the same argument doesn't work with curve. See what I'm referring to:

# A particular normal distribution function:
dn8 <- function(z, m) dnorm(z, m, 8)  # normal density with sd = 8

# A function that produces functions: given a mean i, it returns a
# function that depends only on z
fi <- function(i) function(z) dn8(z, i)

plot(fi(370), xlim = c(360, 420))  # works well!
curve(fi(380), add = TRUE)         # this doesn't work

# But this way, it works!
ff <- fi(380)
curve(ff, add = TRUE)

I cannot understand this behaviour. Could anyone give me some feedback on this? Thanks, -Sergio.
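The behaviour comes from curve()'s non-standard evaluation: it captures its first argument unevaluated, and only a bare function *name* is treated as a function to call on x. The call fi(380) contains no x, so curve() cannot evaluate it as a function of x. Making the x explicit avoids the temporary variable:

```r
dn8 <- function(z, m) dnorm(z, m, 8)          # normal density, sd = 8
fi  <- function(i) function(z) dn8(z, i)      # function factory

plot(fi(370), xlim = c(360, 420))
curve(fi(380)(x), add = TRUE)  # evaluate the factory, then apply to x
```

Here fi(380)(x) is an expression in x, which is what curve() expects when the argument is not a simple function name.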
Re: [R] Deviance in Zero inflated models
Lia McKinnon l.mckinnon101 at gmail.com writes: I am running some zero-inflated models and would like to know the deviance of the models. Unlike a normal GLM, where the deviance is displayed in the summary, all that is displayed in a summary of the zero-inflated model is the log-likelihood. I hope this isn't a read-the-manual question, and if it is I apologize for wasting your time, but if you could still send me a link to where I might find this information I would be very grateful!

Are you looking for -2*logLik(model)? This is the usual definition of the deviance (there are sometimes some subtle issues about the baseline model/additive constant). It would also be helpful to say which package you're using (pscl?), since zero-inflated models are not part of base R. Ben Bolker
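A minimal sketch of Ben's suggestion, assuming (as he guesses) the pscl package; the model formula is made up for illustration, using pscl's bundled bioChemists data:

```r
## Assumption: the OP's models come from pscl::zeroinfl
library(pscl)
data("bioChemists", package = "pscl")

## toy zero-inflated Poisson fit (constant inflation part after "|")
fit <- zeroinfl(art ~ fem + mar | 1, data = bioChemists)

## -2 * log-likelihood, the usual deviance up to the saturated-model
## constant that summary.glm folds in
dev <- -2 * as.numeric(logLik(fit))
dev
```

For comparing nested zero-inflated fits, differences of these values (or lmtest::lrtest) are what matter, so the missing additive constant is harmless.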
Re: [R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY
On 04/03/2013 03:17 PM, Atul Kakrana wrote: Hello All, I need your help. I am analysing Affymetrix data and have to select the probe-set that has the median expression among all the probe-sets for the same gene. This way I want to remove the redundancy by keeping the analysis at the single-gene-entry level. I am fully aware that it is not a nice thing to do, but I just have to do it. To do so, I came across the 'findLargest' function of the 'genefilter' package, but it's not well documented, and I do not know how to implement the 'findLargest' function. At this point I have: esetRMA <- rma(mydata) Could anybody guide me on how I can select the single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others? Is there any other way to achieve this, i.e. other than using 'genefilter'? Genefilter package: http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html

Hi Atul -- It's a Bioconductor package, so you might as well ask on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ As a reproducible example, load the ALL sample ExpressionSet and the Biobase and genefilter packages:

library(Biobase)
library(ALL)
library(genefilter)

The three arguments to findLargest are the names of the probe sets, featureNames(ALL); the test statistic, rowMedians(ALL); and the chip on which the ExpressionSet is based, annotation(ALL). So the variable

idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))

identifies the probes, and

ALL1 = ALL[idx, ]

gets you the data you're interested in. Again, follow-up questions should go to the Bioconductor mailing list. Martin

Thanks AK -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
Re: [R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY
On 04/03/2013 08:34 PM, Martin Morgan wrote: On 04/03/2013 03:17 PM, Atul Kakrana wrote: Hello All, I need your help. I am analysing Affymetrix data and have to select the probe-set that has the median expression among all the probe-sets for the same gene. [...] At this point I have: esetRMA <- rma(mydata) Could anybody guide me on how I can select the single probe-set with median expression from multiple probe-sets corresponding to a single gene and discard the others?

Hi Atul -- It's a Bioconductor package, so you might as well ask on the Bioconductor mailing list http://bioconductor.org/help/mailing-list/ As a reproducible example, load the ALL sample ExpressionSet and the Biobase and genefilter packages:

library(Biobase)
library(ALL)
library(genefilter)

The three arguments to findLargest are the names of the probe sets, featureNames(ALL); the test statistic, rowMedians(ALL); and the chip on which the ExpressionSet is based, annotation(ALL). So the variable

idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))

identifies the probes, and

ALL1 = ALL[idx, ]

gets you the data you're interested in. Again, follow-up questions should go to the Bioconductor mailing list.

Oops, a little quick off the draw there. That gives the probe set with the largest median expression across samples; I'm not really sure what you're after -- the 'closest-to-median' probe set when averaging expression across samples? At any rate you'll get a more considered response on the Bioc mailing list; sorry for being misleading. Martin

-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
[R] categorized complete list of R commands?
Every time I read the R release notes for the next release, I see many functions that I had forgotten about and many functions that I never knew existed to begin with. (Who knew there were BibTeX facilities in R? Obviously, everyone except me.) I wonder whether there is a complete list of all R commands (including the standard packages) somewhere, preferably each with its one-liner AND categorization(s). The one-liner can be generated from the documentation. I am thinking of one categorization for function area (e.g., programming-related for, say, deparse, and statistical-model-related for lm) and another categorization for importance (e.g., common for lm and obscure for ..). Such categorizations require intelligence. If I am going to do this for myself, I think a csv spreadsheet may be a good idea, to make it easy to re-sort by keys. regards, /iaw Ivo Welch (ivo.we...@gmail.com)
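The "one-liner from the documentation" part can be mechanized per package with tools::Rd_db(), which returns the parsed help pages; the \title section is the one-liner. A sketch, assuming the Rd object structure described in R's "Parsing Rd files" notes (the helper name rd_title and the output filename are made up):

```r
library(tools)

## pull the \title text out of one parsed Rd object
rd_title <- function(rd) {
  tags  <- vapply(rd, function(x) attr(x, "Rd_tag"), character(1))
  title <- rd[[which(tags == "\\title")[1]]]
  gsub("\\s+", " ", trimws(paste(unlist(title), collapse = "")))
}

## one row per help topic in package stats: topic + one-liner
db <- Rd_db("stats")
overview <- data.frame(topic = sub("\\.Rd$", "", names(db)),
                       title = vapply(db, rd_title, character(1)),
                       row.names = NULL)
write.csv(overview, "stats-functions.csv", row.names = FALSE)
```

The function-area and importance columns would still have to be filled in by hand, as the post says; this only automates the mechanical part.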
[R] node stack overflow when extracting labels from a dendrogram
It looks like R 3.0.0 has the same limitation as the 2.x series. When extracting labels from a dendrogram of about 3 nodes (x = labels(h)), R throws an error: Error in match.fun(FUN) : node stack overflow. Is there a way around it? Thanks, Alex
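A hedged sketch of things to try, not a confirmed fix: the overflow comes from the deeply recursive traversal in labels.dendrogram(), so raising the interpreter's recursion-related limits sometimes gives it enough room:

```r
## Assumption: the limits, not a bug, are what is being hit.
## 1. Start R with a larger pointer-protection stack (command-line
##    flag, shown here as a comment):
##      R --max-ppsize=500000
## 2. Then raise the evaluation-depth limit before retrying:
options(expressions = 5e5)
## x <- labels(h)   # retry the extraction on the dendrogram h
```

If that still fails, converting the object (e.g. labels(as.hclust(h))) or an iterative traversal from a contributed package may be needed; both are untested suggestions here.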
[R] MASS-like (non-parametric/semi-parametric) methods for survey and longitudinal data
Is there a package that provides equivalents of the MASS package, especially non-parametric and semi-parametric methods, for complex survey and longitudinal data? Is there a book someone can recommend that covers these topics with R (or Stata) examples? My web searches have not turned up much, except that these seem to be actively researched topics for some statisticians working on theory --- no packages or mentions of software were found in the few (I recall only one) empirical examples found. Anupam.
Re: [R] making scatter plot points fill semi-transparent
Hey hey, I know that this thread is old, but no one seemed to provide the answer. Yes, there is absolutely a way to bring in transparency while still using normal colour names. Just use this code when specifying your colour:

col = adjustcolor("colorname", alpha.f = 0.5)

The alpha.f value can range from 0 (fully transparent) to 1 (opaque). Cheers, www.kikatarsi.com
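A tiny self-contained illustration of the same idea, with made-up data (alpha.f is the documented argument name in grDevices::adjustcolor, which the partial match alpha= also reaches):

```r
## Overplotted points become readable with semi-transparent fill
set.seed(1)
x <- rnorm(500)
y <- rnorm(500)
plot(x, y, pch = 16, cex = 1.5,
     col = adjustcolor("steelblue", alpha.f = 0.3))
```

adjustcolor() simply returns an "#RRGGBBAA" hex string, so the result can be reused anywhere a colour is accepted.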
[R] help with kriging interpolation
All, I am new to using R and know some basics. I wish to use kriging in R to do the following: given data Y = f(X1, X2, X3, ..., Xn) -- a set of 1000+ irregularly measured data points -- I would like to be able to get a single value y for a single input set (x1, x2, x3, ..., xn). A Google search on this takes me literally to the same example involving analysis with soil sampling, and I cannot figure out how to extract a single-point interpolant. Any examples or pointers appreciated, Numeris.
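A hedged sketch of single-point prediction using the 'fields' package (one of several kriging options; 'gstat' and 'DiceKriging' are alternatives). All data here are made up for illustration:

```r
## Assumption: fields::Krig suits the irregular multivariate setting
library(fields)
set.seed(1)

X <- matrix(runif(300), ncol = 3)            # 100 points, 3 predictors
Y <- sin(X[, 1]) + X[, 2]^2 + rnorm(100, sd = 0.05)

fit <- Krig(X, Y)                            # fit the kriging surface

xnew <- matrix(c(0.5, 0.5, 0.5), nrow = 1)   # one new input point
predict(fit, xnew)                           # the single interpolated y
```

The key point for the question asked: predict() on the fitted object accepts a matrix of new locations, so a one-row matrix gives exactly one interpolated value rather than a gridded map as in the soil-sampling examples.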