Re: [R] lattice: strip panel function question
On Mon, Dec 6, 2010 at 6:22 PM, Maarten van Iterson m.van_iterson...@lumc.nl wrote: Thanks Chris Campbell, I didn't think about that. Cheers, Maarten On Mon, 2010-12-06 at 10:08 +0000, Chris Campbell wrote:

data$subjectID <- paste(data$groups, data$subjects)  # create a character label
xyplot(responses ~ time | subjectID, groups = groups, data = data, aspect = "xy")

Another option is

xyplot(responses ~ time | groups:subjects, data = data, aspect = "xy")

-Deepayan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] t-test or ANOVA...who wins? Help please!
Hello Frodo, Some basics of your analysis are not clear to me from your question. If you only have two levels of a factor and one response, why does your ANOVA use more factors (and their interactions)? In that sense, it is obvious that your results would differ from the t-test. In either case, I am not sure that any of these methods are valid, since your data don't seem to be normal. Here is example code showing how to get the same results from aov and t.test, and also a nonparametric option (that might be a better fit):

flat_550_W_realism    <- c(3, 3, 5, 3, 3, 3, 3, 5, 3, 3, 5, 7, 5, 2, 3)
flat_550_W_realism_AH <- c(7, 4, 5, 3, 6, 5, 3, 5, 5, 7, 2, 7, 5, 5)
x <- c(rep(1, length(flat_550_W_realism)), rep(2, length(flat_550_W_realism_AH)))
y <- c(flat_550_W_realism, flat_550_W_realism_AH)
# equal results between t-test and anova
t.test(y ~ x, var.equal = TRUE)
summary(aov(y ~ x))
# plotting the data:
boxplot(y ~ x)  # group 1 is not at all symmetrical...
wilcox.test(y ~ x)  # a more fitting test

Contact Details: Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Wed, Jan 5, 2011 at 12:37 AM, Frodo Jedi frodo.j...@yahoo.com wrote: I kindly ask you for help because I really don't know how to solve this problem.
Re: [R] Cost-benefit/value for money analysis
Ben Perhaps you can specify your question more precisely, or differently. The way I interpret it, if there are no interactions in price (e.g. you get a discount for buying more than one book at a time) or in value (e.g. you learn more from one book having read another), then you get the best value/price ratio by taking only the book with the highest value/price. (If you take no books at all, your value/price ratio is undefined.) The algebra below shows that combining a lower value/price book with a higher one always lowers your overall value/price ratio. Thanks for the pointers on R functions. My question was as superficial as it sounded. I have a commercial programme that does this (one of several that are available), and wondered if there was an R package that provided the same tools. It's a common tool, and I had hoped to have explained enough to allow an appropriate package to be identified, so I could have a quick look at what it does. But having started this, I now feel obliged to clarify the question. I only chose books as an easy example; you could substitute alternative marketing strategies, monitoring programmes, choice of ornaments for a new house, holidays, etc. So there could be only a few potential combinations or hundreds. But to stick with the books, and only three options, A, B and C: Book A costs $100 and I have given it a subjective value of 50. Book B costs $36 and I have given it a subjective value of 60. Book C costs $50 and I have given it a subjective value of 80. So book A is costing me $2.00 per value unit, book B $0.60 per value unit and book C $0.63 per value unit. Buying books A+B gives $1.24 per value unit. Buying books A+C gives $1.15 per value unit. Buying books B+C gives $0.61 per value unit. Buying books A+B+C gives $0.98 per value unit. So in terms of value for money, there are three contenders: book B on its own, book C on its own, or buying both books B and C.
Book B at $36.00 and value 60. Book C at $50.00 and value 80. Books B+C at $86.00 and value 140. Depending on how you are using this tool, you can either use it to decide how to spend an existing budget, or use it to set a budget. It seems hardly worth the bother for three books, but if you are looking at 20 books or 30 different monitoring options etc., it gives a useful insight into how best to spend or set a budget. The commercial software graphs these costs vs. values, so you usually end up with some sort of asymptotic graph where you can see that spending below a certain budget gives a very poor return. Graham
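Graham's enumeration above is easy to reproduce in R. A small sketch (prices and values taken from the post; the variable names are mine): build every non-empty combination of the three books and rank them by cost per unit of subjective value.

```r
# Enumerate all non-empty combinations of the three books from the post
# and rank them by cost per unit of subjective value.
books <- data.frame(name  = c("A", "B", "C"),
                    price = c(100, 36, 50),
                    value = c(50, 60, 80))
combos <- expand.grid(A = c(FALSE, TRUE), B = c(FALSE, TRUE), C = c(FALSE, TRUE))
combos <- combos[rowSums(combos) > 0, ]                 # drop the empty set
cost   <- as.vector(as.matrix(combos) %*% books$price)  # total price per combo
value  <- as.vector(as.matrix(combos) %*% books$value)  # total value per combo
ranking <- data.frame(combos, cost, value,
                      cost_per_value = round(cost / value, 2))
ranking[order(ranking$cost_per_value), ]                # B alone and B+C lead
```

With 20 books this becomes 2^20 - 1 combinations, so for larger problems the greedy ordering discussed elsewhere in the thread is the practical route.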
Re: [R] openNLP package error
Apologies that I am late on this thread. On 02/12/10 17:39, Sascha Wolfer wrote: I seem to have a problem with the openNLP package; I'm actually stuck at the very beginning. Here's what I did:

install.packages("openNLP")
install.packages("openNLPmodels.de", repos = "http://datacube.wu.ac.at/", type = "source")
library(openNLPmodels.de)
library(openNLP)

So I installed the main package as well as the supplementary German model. Now I try to use the sentDetect function:

s <- c("Das hier ist ein Satz. Und hier ist noch einer - sogar mit Gedankenstrich. Ist das nicht toll?")
sentDetect(s, language = "de", model = "openNLPmodels.de")

I get the following error message, which I can't make any sense of:

Fehler in .jnew("opennlp/maxent/io/SuffixSensitiveGISModelReader", .jnew("java.io.File", : java.io.FileNotFoundException: openNLPmodels.de (No such file or directory)

The correct syntax seems to be

sentDetect(s, model = system.file("models", "de-sent.bin", package = "openNLPmodels.de"))

but unfortunately I get

Error in .jcall(.jnew("opennlp/maxent/io/SuffixSensitiveGISModelReader", : java.io.UTFDataFormatException: malformed input around byte 48

YMMV. But you get the idea of the syntax of the model= argument. This works:

sentDetect(s, model = system.file("models", "sentdetect", "EnglishSD.bin.gz", package = "openNLPmodels.en"))
# [1] "Das hier ist ein Satz."
# [2] "Und hier ist noch einer - sogar mit Gedankenstrich."
# [3] "Ist das nicht toll?"

Hope this helps you a little. Allan
[R] bwplot
I'm trying to use the function bwplot, but I receive a message that the function is not found. I charged the lattice, sm, and Hmrsc packages, but without success. What I am trying to do is a single box-plot with two levels, Season and Area, on the x-axis and abundance on the y-axis.
Re: [R] R(D) Com under R1070
I get the same trouble. Did you finally succeed in fixing it? Henri
Re: [R] Navigating web pages using R
You are talking about digging data out of a dynamic webpage. The data displayed to you is, I guess, fetched by filtering some database, and the dropdowns you see on the page must be widgets that drive that filtering. Some sites expose the filtering via URL parsing, where the final URL changes along with the chosen filters. But other sites only offer the data via embedded widgets and make no change to the URL you see. Maybe your case belongs to the second type. Maybe you can do some analysis of the source of that webpage. If you are lucky, you will find some code dealing with the filtering job; it might offer some help. :D 2011/1/5 Mike Marchywka marchy...@hotmail.com Date: Tue, 4 Jan 2011 10:54:19 -0800 From: egregory2...@yahoo.com To: r-help@r-project.org Subject: [R] Navigating web pages using R R-Help, I'm trying to obtain some data from a webpage which masks the URL from the user, so an explicit URL will not work. For example, when one navigates to the web page the URL looks something like: http://137.113.141.205/rpt34s.php?flags=1 (changed for privacy, but I'm not sure you could access it anyway, since it's internal to the agency I work for). LOL, presuming you are not a disgruntled employee, it is always amusing to see some entity with a fancy cryptic web design drink their own Koolaid :) This is the most annoying kind of code to write, especially when there is no reason, such as a revenue model, to make it hard to get. I've posted in other forums about the general need for an API if you are providing data to others in a non-hostile setting. The site has three drop-down menus for Site, Month, and Year.
When a combination of these is selected, the resulting URL is always http://137.113.141.205/rpt34s (nothing changes, except that flags=1 is dropped), so what I need to be able to do is write something that will navigate to the original URL, then select some combination of Site, Month, and Year, and then submit the query to the site to navigate to the page with the data. Is this a capability that R has as a language? Unfortunately, I'm unfamiliar with HTML or PHP programming, so if this question belongs in a forum on that, I apologize. I'm trying to centralize all of my code for my analysis in R! I'm sure that ultimately you can code this in R, but for digging out what you need there may be better approaches. First I would try to contact the page author or determine if there is a better way to get the same data. Failing that, you may be able to find a form section in the HTML and copy that. Firefox is supposed to have something called Firebug to let you see what the page does, but I've never actually used it. Generally I use Linux or Cygwin command-line tools to diagnose this junk; R may support some of these features, but this is a common issue outside of R too, so it may be worthwhile learning the other tools. If all else fails (downloading a local copy of the page, etc.), you may be able to do a packet capture and just see what it does by brute force. From what I have seen, the R tools are pretty much named after the Linux tools, curl for example. Thank you, -Erik Gregory Student Assistant, California EPA CSU Sacramento, Mathematics
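For the record, once the form's field names are known, one way to attack this from R is an HTTP POST. This is only a hedged sketch: the URL is the (privacy-mangled) one from the post, the field names site, month, and year are invented, and it assumes the RCurl package is installed. The real field names must be read from the <form> element in the page source.

```r
# Hypothetical sketch -- the field names are guesses; inspect the page's
# <form> action and <select> name= attributes for the real ones.
library(RCurl)
html <- postForm("http://137.113.141.205/rpt34s.php",
                 site = "Some site", month = "January", year = "2010")
# 'html' is the returned page as a character string; the data table can
# then be pulled out, e.g. with readHTMLTable() from the XML package.
```

This is essentially what a packet capture would reveal the browser doing; Firebug's network panel shows the same POST fields interactively.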
Re: [R] lattice: strip panel function question
Thanks, Deepayan, that solution is even more elegant! Maarten On Wed, 2011-01-05 at 14:24 +0530, Deepayan Sarkar wrote: On Mon, Dec 6, 2010 at 6:22 PM, Maarten van Iterson m.van_iterson...@lumc.nl wrote: Thanks Chris Campbell, I didn't think about that. Cheers, Maarten On Mon, 2010-12-06 at 10:08 +0000, Chris Campbell wrote:

data$subjectID <- paste(data$groups, data$subjects)  # create a character label
xyplot(responses ~ time | subjectID, groups = groups, data = data, aspect = "xy")

Another option is

xyplot(responses ~ time | groups:subjects, data = data, aspect = "xy")

-Deepayan -- Maarten van Iterson Center for Human and Clinical Genetics Leiden University Medical Center (LUMC) Research Building, Einthovenweg 20 Room S-04-038 Phone: 071-526 9439 E-mail: m.van_iterson...@lumc.nl --- Postal address: Postzone S-04-P Postbus 9600 2300 RC Leiden The Netherlands
Re: [R] Converting Fortran or C++ etc to R
I did a quick search for interfacing R and Fortran and found this past information; hope it helps. :D http://r.789695.n4.nabble.com/Conerned-about-Interfacing-R-with-Fortran-td887428.html As for your actual requirement to do the conversion, I guess there are no quick ways: you have to be familiar with both R and the other language to make the rewrite work. 2011/1/5 Murray Jorgensen m...@stats.waikato.ac.nz I'm going to try my hand at converting some Fortran programs to R. Does anyone know of any good articles giving hints on such tasks? I will post a selective summary of my gleanings. Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: m...@waikato.ac.nz Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350
[R] How to use S-Plus functions in R
Hi, I am very new to R. I used to work in S-Plus a lot, but that was years ago. I wrote a large number of functions that I now want to view and edit in R. I know I have to tell R where the functions are, but I have no idea how. The functions are stored on my laptop's C drive. I tried everything I could find, e.g. library(myfilepath), source(myfilepath), etc., but nothing seems to work. Hein -- View this message in context: http://r.789695.n4.nabble.com/How-to-use-S-Plus-functions-in-R-tp3174963p3174963.html Sent from the R help mailing list archive at Nabble.com.
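For what it's worth, the usual stumbling block here is that source() wants a quoted file path to a text file of R code, not a bare name. A minimal sketch (the file name and function are made up for illustration):

```r
# Write a function definition to a file, then source() it -- in practice
# you would point source() at your own file with a quoted path, e.g.
#   source("C:/SplusFunctions/myfuns.ssc")   # hypothetical Windows path
fpath <- file.path(tempdir(), "myfuns.R")
writeLines("cube <- function(x) x^3", fpath)
source(fpath)   # reads the definitions into the workspace
cube(3)         # 27
```

If the functions only exist inside an old S-Plus .Data directory rather than as plain text, the data.restore() function in the foreign package may be worth a look for dump files, though whether it applies depends on how the functions were saved.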
[R] What are the necessary Oracle software to install and run ROracle ?
Hello, I am running Linux. I have downloaded instantclient-basiclite-linux32-11.2.0.2.0.zip instantclient-sqlplus-linux32-11.2.0.2.0.zip instantclient-sdk-linux32-11.2.0.2.0.zip instantclient-precomp-linux32-11.2.0.2.0.zip All these archives are unzipped in /usr/local/lib/instantclient, and I have added this path to the library path of the host. I can run sqlplus and proc; they do not complain about missing symbols. Then I install ROracle:

install.packages("ROracle")

The compilation step is OK, but when the test step tries to load the ROracle.so library, it fails:

** testing if installed package can be loaded
Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load shared library '/opt/R-2.11.1/lib/R/library/ROracle/libs/ROracle.so': /opt/R-2.11.1/lib/R/library/ROracle/libs/ROracle.so: undefined symbol: sqlprc

Here is my list of libs in the instantclient directory:

$ find . -name '*.*o' -o -name '*.a'
./libsqlplusic.so
./sdk/demo/procobdemo.pco
./cobsqlintf.o
./libociicus.so
./libnnz11.so
./libocijdbc11.so
./libsqlplus.so

Do I need some more libs? From which Oracle tarball? Thanks for the help.
Re: [R] [R-downunder] Converting Fortran or C++ etc to R
Hi Murray, at first I thought you meant compiling existing Fortran or C++ for use in R with .Fortran() and so on, but do you mean literal conversion from Fortran to pure R code? I'm assuming pure R code for the rest of this. I've tried it with some fairly simple C++ and C code, and that's been fairly easy - there are a lot of details you can ignore while you just try to figure out the algorithm. It's nice if you have running software so you can compare outputs, but I did once eventually figure out some Pascal code from an old textbook - it had enough actual example data printed in the book to allow me eventually to figure it out. There were people around me who had once compiled Pascal, but it didn't sound like it was going to be much fun. Sometimes C and C++ chunks can be copied over directly and used with very few changes, but it will just depend. Good luck; I would just jump in at the deep end and send in questions if you get stuck. Cheers, Mike. On Wed, Jan 5, 2011 at 11:02 AM, Murray Jorgensen m...@stats.waikato.ac.nz wrote: I'm going to try my hand at converting some Fortran programs to R. Does anyone know of any good articles giving hints on such tasks? I will post a selective summary of my gleanings. Cheers, Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: m...@waikato.ac.nz Fax 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350 -- r-downun...@stat.auckland.ac.nz http://www.stat.auckland.ac.nz/r-downunder To unsubscribe send an email to r-downunder-unsubscr...@stat.auckland.ac.nz -- Michael Sumner Institute for Marine and Antarctic Studies, University of Tasmania Hobart, Australia e-mail: mdsum...@gmail.com
[R] Odp: bwplot
Hi, r-help-boun...@r-project.org wrote on 05.01.2011 09:20:35: I'm trying to use the function bwplot, but I receive a message that the function is not found. I charged the lattice, sm, and Hmrsc, package but Can you please explain how one can **charge** packages? I never did it. Besides, did you start with library(lattice) before trying to issue bwplot(anything...)? Regards, Petr without success. What I am trying to do is a single box-plot with two levels, Season and Area, on the x-axis and abundance on the y-axis.
Re: [R] Adding lines in ggplot2
Hi Bert: On Tue, Jan 4, 2011 at 8:39 PM, Bert Gunter gunter.ber...@gene.com wrote: Dennis: Can't speak to ggplot2, but your comments regarding lattice are not quite correct. Many if not all of lattice's basic plot functions are generic, which means that one has essentially complete latitude to define plotting methods for arbitrary data structures. For example, there is an xyplot.ts method for time series -- class ts -- data. Of course, for most lattice methods, the data do naturally come in a data frame, and a standard lattice argument is to give a frame from which to pull the data. But this is not required. I'm aware of that, but thank you for clarifying matters. I didn't state explicitly whether lattice required data frame input or not (my lattice example indicated no, and indeed it does not), but the message was evidently muddled further down the post. Your comments speak to some of the differences in the design and philosophy of lattice and ggplot2, and I have no disagreement with your remarks about lattice. The point I was trying to make was that by using data frames and the several packages/base functions that support their manipulation, one can simplify the coding of graphics within both ggplot2 and lattice. There are many things one can do with data frames that one cannot with vectors, as you well know - e.g., extensions with new data (rbind) or new variables (cbind/transform, etc.), or reshaping, among others. These features can be used to advantage in both ggplot2 and lattice. The OP's example is a simple one - had he used

df <- data.frame(x = sqrt(1:10), y = log(1:10))  # oops, forgot 1:10...
qplot(as.numeric(rownames(df)), x, data = df, geom = 'line', colour = I('darkgreen'))  # ...but it's OK
# or
xyplot(x ~ as.numeric(rownames(df)), data = df, type = 'l', col.line = 'darkgreen')

there would have been no problem. A little inconvenient for a new user, maybe, but hardly 'very restrictive'.
As for other types of R data objects that are not data frames, offhand I can't think of too many that are incapable of being converted to data frames somehow for the purposes of graphics, although I wouldn't be remotely surprised if some existed. [For example, one can extract fitted values, residuals and perhaps a model matrix from a model object and place the results in a data frame.] ggplot2 has a fortify() method to allow one to transform data objects for use in the package. There is some discussion in Chapter 9 of Hadley's book, but I'm not in a position to add insight as I haven't used it personally. I do think this is a fair statement, though, and it's been said before: if one wants *complete* control and flexibility of inputs and outputs, use base graphics. Both lattice and ggplot2, by virtue of being structured graphics systems, impose certain constraints (e.g., default actions) on the user which are system-dependent. Prof. Vardeman's quote still applies :) Dennis -- Bert Please explain to me how

df <- data.frame(x, y, index = 1:10)
qplot(index, x, geom = 'line', ...)

is 'very restrictive'. Lattice and ggplot2 are *structured* graphics systems - to get the gains that they provide, there are some costs. I don't perceive organization of data into a data frame as being restrictive - in fact, if you learn how to construct data for input into ggplot2 to simplify the code for labeling variables and legends, the data frame requirement is actually a benefit rather than a restriction. Moreover, one can use the plyr and reshape(2) packages to reshape or condense data frames to provide even more flexibility and freedom to produce ggplot2 and lattice graphics. In addition, the documentation for ggplot2 is quite explicit about requiring data frames for input, so it is behaving as documented. The complexity (and interaction) of the graphics code probably has something to do with that. Since Josh left you a quote, I'll supply another, from Prof.
Steve Vardeman in a class I took with him a long time ago: There is no free lunch in statistics: in order to get something, you've got to give something up. In this case, if you want the nice infrastructure provided by ggplot2, you have to create a data frame for input. Dennis Thanks in advance, and best regards! Eduardo Horta
Re: [R] how to subset unique factor combinations from a data frame.
Hi, You probably did not notice the xtabs I mentioned before: as.data.frame(xtabs(~ x + xx)).

u <- as.data.frame(table(x, xx))
head(u)
  x xx Freq
1 A  a   18
2 B  a   27
3 C  a   30
4 D  a   30
5 E  a   27
6 F  a   18
v <- as.data.frame(xtabs(~ x + xx))
head(v)
  x xx Freq
1 A  a   18
2 B  a   27
3 C  a   30
4 D  a   30
5 E  a   27
6 F  a   18

Regards, Petr r-help-boun...@r-project.org wrote on 05.01.2011 08:46:21: Hi Dennis, It worked! This is what I am looking for. Many thanks. Rgds, SNVK _ From: Dennis Murphy [mailto:djmu...@gmail.com] Sent: Tuesday, January 04, 2011 9:07 PM To: SNV Krishna Cc: r-help@r-project.org Subject: Re: [R] how to subset unique factor combinations from a data frame. Hi: Did you try something like summdf <- as.data.frame(with(df, table(Commodity, Attribute, Unit)))? The rows of the table should represent the unique combinations of the three variables. Here's a simple toy example to illustrate:

x <- sample(LETTERS[1:6], 1000, replace = TRUE)
xx <- sample(letters[1:6], 1000, replace = TRUE)
u <- as.data.frame(table(x, xx))
dim(u)
[1] 36 3
head(u)
  x xx Freq
1 A  a   26
2 B  a   29
3 C  a   25
4 D  a   25
5 E  a   27
6 F  a   29

HTH, Dennis On Tue, Jan 4, 2011 at 2:19 AM, SNV Krishna kris...@primps.com.sg wrote: Hi, Sorry that my example is not clear. I will give an example of what each variable holds. I hope this clearly explains the case. Names of the dataframe (df) and description: Year is the calendar year, from 1980 to 2010. Country is the country name; the total number (levels) of countries is ~190. Commodity is crude oil, sugar, rubber, coffee, etc.; the number (levels) of commodities is 20. Attribute is production, consumption, stock, import, export...; levels ~20. Unit is actually not a factor; it describes the unit of Attribute. Say the unit for Coffee (commodity) - Production (attribute) is 60 kgs.
While the unit for Crude oil - Production is 1000 barrels. Value is the value. tail(df, n = 10) // example data //

Year Country        Commodity    Attribute      Unit      Value
1991 United Kingdom Wheat, Durum Total Supply   (1000 MT)    70
1991 United Kingdom Wheat, Durum TY Exports     (1000 MT)     0
1991 United Kingdom Wheat, Durum TY Imp. from U (1000 MT)     0
1991 United Kingdom Wheat, Durum TY Imports     (1000 MT)    60
1991 United Kingdom Wheat, Durum Yield          (MT/HA)       5

Wish this is clear. Any suggestions? Regards, SNVK -----Original Message----- From: Petr PIKAL [mailto:petr.pi...@precheza.cz] Sent: Tuesday, January 04, 2011 4:06 PM To: SNV Krishna Cc: r-help@r-project.org Subject: Odp: [R] how to subset unique factor combinations from a data frame. Hi, r-help-boun...@r-project.org wrote on 04.01.2011 05:21:25: Hi All, I have these questions and request members' expert views on them. a) I have a dataframe (df) with five factors (identity variables) and value (measured value). The id variables are Year, Country, Commodity, Attribute, Unit. Value is a value for each combination of these. I would like to get just the unique combinations of Commodity, Attribute and Unit into a dataframe or a table. I know aggregate and subset but don't know how to use them in this context. aggregate(Value, list(Commodity, Attribute, Unit), function) b) Is it possible to include non-aggregate columns with the aggregate function, say in the above case aggregate(Value ~ Commodity + Attribute, data = df, FUN = count)? The use of count(Value) is just a roundabout way to return the combinations of Commodity and Attribute; I would like to include the 'Unit' column in the returned data frame. Hm. Maybe xtabs? But without any example it is only a guess. c) Is it possible to subset based on unique combinations, something like subset(df, unique(Commodity), select = c(Commodity, Attribute, Unit))? I know this is not correct, as it returns the error 'subset needs a logical evaluation'. Trying various ways to accomplish the task.
Probably the sqldf package has tools for doing it, but I do not use it, so you will have to try yourself. df[df$Commodity == "something", c("Commodity", "Attribute", "Unit")] can be another way. Anyway, your explanation is ambiguous. Let's say you have three rows with the same Commodity. Which row do you want to select? Regards, Petr I will be grateful for any ideas and help. Regards, SNVK
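If only the distinct combinations are wanted, rather than their counts, unique() on the three columns is a direct alternative to the table()/xtabs() route discussed above. A small sketch with made-up toy data:

```r
# Toy data frame with a duplicated Commodity/Attribute/Unit combination.
df <- data.frame(Commodity = c("Coffee", "Coffee", "Crude oil"),
                 Attribute = c("Production", "Production", "Production"),
                 Unit      = c("60 kg", "60 kg", "1000 barrels"),
                 Value     = c(10, 20, 30))
distinct <- unique(df[, c("Commodity", "Attribute", "Unit")])
distinct   # 2 rows: the duplicated combination appears only once
```

Unlike as.data.frame(table(...)), this returns only combinations that actually occur in the data, with no Freq column and no zero-count rows.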
Re: [R] Cost-benefit/value for money analysis
Hi: Are you perhaps thinking of conjoint analysis? Dennis On Wed, Jan 5, 2011 at 1:30 AM, Graham Smith myotis...@gmail.com wrote: Ben Perhaps you can specify your question more precisely, or differently. The way I interpret it, if there are no interactions in price (e.g. you get a discount for buying more than one book at a time) or in value (e.g. you learn more from one book having read another), then you get the best value/price ratio by taking only the book with the highest value/price. (If you take no books at all, your value/price ratio is undefined.) The algebra below shows that combining a lower value/price book with a higher one always lowers your overall value/price ratio. Thanks for the pointers on R functions. My question was as superficial as it sounded. I have a commercial programme that does this (one of several that are available), and wondered if there was an R package that provided the same tools. It's a common tool, and I had hoped to have explained enough to allow an appropriate package to be identified, so I could have a quick look at what it does. But having started this, I now feel obliged to clarify the question. I only chose books as an easy example; you could substitute alternative marketing strategies, monitoring programmes, choice of ornaments for a new house, or holidays etc. So there could be only a few potential combinations or hundreds. But to stick with the books, and only three options, A, B and C: Book A costs $100 and I have given it a subjective value of 50. Book B costs $36 and I have given it a subjective value of 60. Book C costs $50 and I have given it a subjective value of 80. So book A is costing me $2.00 per value unit, book B $0.60 per value unit and book C $0.63 per value unit.
Buying books A+B gives $1.24 per value unit. Buying books A+C gives $1.15 per value unit. Buying books B+C gives $0.61 per value unit. Buying books A+B+C gives $0.98 per value unit. So in terms of value for money, there are three contenders: book B on its own, book C on its own, or buying both books B and C. Book B at $36.00 and value 60. Book C at $50.00 and value 80. Books B+C at $86.00 and value 140. Depending on how you are using this tool, you can either use it to decide how to spend an existing budget, or use it to set a budget. It seems hardly worth the bother for three books, but if you are looking at 20 books or 30 different monitoring options etc., it gives a useful insight into how best to spend or set a budget. The commercial software graphs these costs vs. values, so you usually end up with some sort of asymptotic graph where you can see that spending below a certain budget gives a very poor return. Graham
Re: [R] Cost-benefit/value for money analysis
David, I think a similar argument at the margins would show that even if the task were specified as maximising value within a budget, simply ordering by value/price and buying until the cumsum of the prices exceeded the budget would solve the alternate statement of the problem. I suppose there might be situations where buying two books whose value/price was less than marginally maximal would be better, because the two marginally maximal choices would break the budget. This sounds like a homework problem and I don't see any student effort yet. Search terms include: decision analysis, cost-benefit analysis, or utility theory. Hopefully, my response to Ben will clarify my question, and why I am asking it. At the moment (and that may change) I'm not specifically interested in how you do it in R, just in whether there is a package aimed at this kind of cost-benefit analysis. Graham
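The greedy rule described above can be sketched in a few lines of R (the $90 budget is made up for illustration; as noted, greedy selection is not always optimal for this knapsack-type problem):

```r
# Greedy budget allocation: best value per dollar first, buy until broke
price  <- c(A = 100, B = 36, C = 50)
value  <- c(A = 50,  B = 60, C = 80)
budget <- 90                                    # illustrative budget

ord <- order(value / price, decreasing = TRUE)  # rank by value/price ratio
buy <- names(price)[ord][cumsum(price[ord]) <= budget]
buy  # with this $90 budget: "B" "C"
```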
Re: [R] Converting Fortran or C++ etc to R
On Wed, Jan 5, 2011 at 7:33 AM, lcn lcn...@gmail.com wrote: As for your actual requirement to do the conversion, I guess there aren't any quick ways. You have to be familiar with both R and the other language to make the rewrite work. To make the rewrite work _well_ is the bigger problem! The easiest route to big performance wins is going to be spotting vectorisation possibilities in the Fortran code. Any time you see a DO K=1,N loop, look to see if it's just a single vector operation in R. Another route to big wins is to write test code, so you can check whether your R code gives the same results as the Fortran (C/C++) code at every stage of the rewrite. Don't just write it all in one go and then hope it works! Small steps. Barry
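Both suggestions combine naturally: translate each loop into a vector operation, then assert that the two versions agree. A generic sketch (not code from the thread):

```r
set.seed(1)
x <- runif(1000)
y <- runif(1000)

# Literal translation of a Fortran 'DO K=1,N' loop
z_loop <- numeric(length(x))
for (k in seq_along(x)) z_loop[k] <- x[k] * y[k] + 1

# Vectorised rewrite: one R expression, no loop
z_vec <- x * y + 1

# Test at every stage of the rewrite, as suggested above
stopifnot(all.equal(z_loop, z_vec))
```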
Re: [R] Cost-benefit/value for money analysis
Dennis, Are you perhaps thinking of conjoint analysis? Thanks, but as far as I can make out, having just looked at conjoint analysis, it looks like some form of discriminant analysis, which is not what I am looking for. I only have two variables, cost and value. I am ignoring how you establish the value; I just need to be able to assess every possible combination of costs and values. It's a common technique in the decision analysis literature (and in specialist decision analysis software), but I have never seen it given a specific name. Of course it may have several names, and be used across different disciplines for different purposes. It's such a common tool that I was hoping someone would instantly recognise what I was describing and be able to say that it was available in a particular package. But I had never looked at conjoint analysis before, so it's nice to know it exists. Graham
[R] How to save graphs out of ACF ?
Hi, I want to save the autocorrelation plots resulting from acf (acf(ts)), not just by using the "Save as" command in the R GUI but using some sort of code which allows me to choose the format and the path. Thank you, Mihai
Re: [R] update.views(Spatial) does not seem to be able to find RPyGeo package
On Tue, 4 Jan 2011, Linder, Eric wrote: I have this problem with loading the RPyGeo package when using update.views. How can I fix this? Only by changing the operating system. You are using Linux, but the RPyGeo package requires Windows; see http://CRAN.R-project.org/package=RPyGeo update.views() (or actually the underlying call to install.packages()) just informs you about this through a warning. hth, Z I have tried to use other CRAN mirrors with the same result. Below is a copy of my session. -session--- R version 2.12.1 (2010-12-16) Copyright (C) 2010 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i486-pc-linux-gnu (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [Previously saved workspace restored] library(ctv) update.views('Spatial') --- Please select a CRAN mirror for use in this session --- Loading Tcl/Tk interface ... done Warning message: In update.views("Spatial") : The following packages are not available: RPyGeo -session--- The information contained in this communication may be C...{{dropped:11}}
[R] Odp: List to a summary table
Hi, r-help-boun...@r-project.org wrote on 05.01.2011 01:26:42: Hi Suppose you have the code below. The result I get from the cat function is from the avgs object. Now, I have 30 different objects like this and I wish I could stick to lists and not make 30 objects. e.g. avg.width from the avgs object can be extracted by

as.numeric(unlist(sapply(avgs, function(x) x[4])))[-1]

Regards Petr to make a summary table, something like: Avgs1 Avgs2 Avgs3 i= 2 average= 0.515983 i= 2 average= 0.746983 i= 2 average= 0.2665983 i= 3 average= 0.5135953 i= 3 average= 0.7345953 i= 3 average= 0.23455953 i= 4 average= 0.4998128 i= 4 average= 0.7233128 i= 4 average= 0.21398128

library(cluster)
d <- hclust(dist(iris[, -5]))
avgs <- sapply(1:20, function(x) summary(silhouette(cutree(d, x), dist(iris[, -5]))))
# str(avgs)
# print out the average widths
for (i in 2:length(avgs)) { # ignore first item
  cat('i=', i, 'average=', avgs[[i]]$avg.width, '\n')
}

i= 2 average= 0.515983 i= 3 average= 0.5135953 i= 4 average= 0.4998128 i= 5 average= 0.346174 i= 6 average= 0.3382031 i= 7 average= 0.3297649 i= 8 average= 0.324025 i= 9 average= 0.3191681 i= 10 average= 0.3028503 i= 11 average= 0.3072648 i= 12 average= 0.2834498 i= 13 average= 0.2776717 i= 14 average= 0.2855396 i= 15 average= 0.2745142 i= 16 average= 0.2578903 i= 17 average= 0.2531909 i= 18 average= 0.2473504 i= 19 average= 0.2484205 i= 20 average= 0.2545357 thanks A.Dias -- View this message in context: http://r.789695.n4.nabble.com/List-to-a-summary-table-tp3174698p3174698.html Sent from the R help mailing list archive at Nabble.com.
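One way to assemble Petr's per-object extraction into the kind of three-column table sketched above is to wrap the computation in a helper and cbind the results. A sketch using three hclust linkage methods as stand-ins for the poster's 30 objects:

```r
library(cluster)

dmat <- dist(iris[, -5])

# avg.width for k = 2..4 clusters under a given linkage method
avg_widths <- function(method, ks = 2:4) {
  d <- hclust(dmat, method = method)
  sapply(ks, function(k) summary(silhouette(cutree(d, k), dmat))$avg.width)
}

# Three stand-in objects; the poster would loop over the real 30
tab <- data.frame(i        = 2:4,
                  complete = avg_widths("complete"),
                  average  = avg_widths("average"),
                  single   = avg_widths("single"))
tab
```

The `complete` column reproduces the 0.515983, 0.5135953, 0.4998128 values printed in the thread, since hclust defaults to complete linkage.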
Re: [R] How to use S-Plus functions in R
On 11-01-05 3:17 AM, Hein wrote: Hi I am very new to R. I used to work in S-Plus a lot, but that was years ago. I wrote a large number of functions that I now want to view and edit in R. I know I have to tell R where the functions are, but I have no idea how. The functions are stored on my laptop's C: drive. I tried everything I could find, e.g. library("myfilepath"), source("myfilepath"), etc., but nothing seems to work. Hein Save their source as text, and source that. R can't read the binary S-Plus objects from recent S-Plus versions. Since R and S-Plus are not identical, you may need some modifications to the functions to get them to work in R: so test carefully. Duncan Murdoch
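Duncan's advice in miniature: functions written out as plain-text source (for example with dump(), which exists in both S-Plus and R; the paths here are illustrative) can be read back into R with source():

```r
# Round-trip sketch: write a function definition as readable text,
# then recreate it with source() -- the same route works for functions
# dumped as text from S-Plus
sq <- function(x) x^2
f <- tempfile(fileext = ".R")
dump("sq", file = f)   # writes the definition as plain-text source
rm(sq)
source(f)              # recreates sq in the workspace
sq(4)                  # -> 16
```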
Re: [R] Print plot to pdf, jpg or any other format when using scatter3d error
On 11-01-04 5:36 PM, Jurica Seva wrote: Thank you, Duncan, it works now with rgl.snapshot (I did have to upgrade to 2.12.1). Is there any way to manipulate the size of the created image? The created plots are a bit small (256*256). Sure, they're the size of the window: it's just a snapshot. Just make it bigger (by mouse, or using par3d(windowRect = ...)) before taking the snapshot. Duncan Murdoch Thank you for your help once again :) Best, Jurica On Tue, Jan 4, 2011 at 8:31 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 03/01/2011 8:17 PM, Jurica Seva wrote: Hi, I have been trying to output my graphs to a file (jpeg, pdf, ps, it doesn't matter) but I can't seem to be able to get it to output. As Uwe said, you are using rgl graphics, not base graphics. So none of the standard devices work; you need to use the tools built into rgl. Attach that package, and then read ?rgl.postscript (for graphics in various vector formats, not just Postscript) and ?rgl.snapshot (for bitmapped graphics). Some notes: - For a while rgl.snapshot wasn't working in the Windows builds with R 2.12.1; that is now fixed, so you should update rgl before getting frustrated. - rgl.snapshot just takes a copy of the graphics buffer that is showing on screen, so it is limited to the size you can display. - rgl.postscript does a better job for the parts of an image that it can handle, but it is not a perfect OpenGL emulator, so it doesn't always include all components of a graph properly. Duncan Murdoch I tried a few things but none of them worked and am at a loss as to what to do now. I am using the scatter3d function, and it prints out the graphs on to the screen without any problems, but when it comes to writing them to a file I can't make it work. Is there any other way of producing 3-dimensional graphs (they don't have to be rotatable/interactive after the print out)?
The code is fairly simple and is listed below:

# libraries
library(RMySQL)
library(rgl)
library(scatterplot3d)
library(Rcmdr)

# database connection
mycon <- dbConnect(MySQL(), user = 'root', dbname = 'test', host = 'localhost', password = '')

# distinct sessions
rsSessionsU01 <- dbSendQuery(mycon, "select distinct sessionID from actiontimes where userID = 'ID01'")
sessionU01 <- fetch(rsSessionsU01)
sessionU01[2, ]

# user01 data
mycon <- dbConnect(MySQL(), user = 'root', dbname = 'test', host = 'localhost', password = '')
rsUser01 <- dbSendQuery(mycon, "select a.userID, a.sessionID, a.actionTaken, a.timelineMSEC, a.durationMSEC, b.X, b.Y, b.Rel__dist_, b.Total_dist_ from `actiontimes` as a, `ulogdata` as b where a.originalRECNO = b.RECNO and a.userID = 'ID01'")
user01 <- fetch(rsUser01, n = -1)
user01[1, 1]

# plot loop
for (i in 1:10) {
  userSubset <- subset(user01, sessionID == sessionU01[i, ], select = c(timelineMSEC, X, Y))
  x <- as.numeric(userSubset$X)
  y <- as.numeric(userSubset$Y)
  scatter3d(x, y, userSubset$timeline, xlim = c(0, 1280), ylim = c(0, 1024), zlim = c(0, 180),
            type = "h", main = sessionU01[i, ], sub = sessionU01[i, ])
  tmp6 <- ".ps"
  tmp7 <- paste(sessionU01[i, ], tmp6, sep = "")
  rgl.postscript(tmp7, "ps", drawText = FALSE)
  # pdf(file = tmp7)
  # dev.print(file = tmp7, device = pdf, width = 600)
  # dev.off(2)
}
Re: [R] Adding lines in ggplot2
Dear Eduardo, This is a solution along the lines you seem to want:

n <- 1:10
x <- sqrt(n)
y <- log(n)
qplot(n, x, geom = "line", colour = I("darkgreen")) +
  geom_line(data = data.frame(n, x = y), colour = "red")

But please compare it with the solution (code + result) below. Formatting the data frame might be a bit more work, but formatting your graph is much easier.

n <- 1:10
dataset <- rbind(
  data.frame(Number = n, Function = "sqrt", Result = sqrt(n)),
  data.frame(Number = n, Function = "log", Result = log(n))
)
# Using the default colours
ggplot(dataset, aes(x = Number, y = Result, colour = Function)) + geom_line()
# Using user-specified colours
ggplot(dataset, aes(x = Number, y = Result, colour = Function)) + geom_line() +
  scale_colour_manual(values = c(sqrt = "darkgreen", log = "red"))

Think about the gain when you want to display many more than 2 lines...

dataset <- expand.grid(Number = n, Power = seq(0, 2, length = 21))
dataset$Result <- dataset$Number ^ dataset$Power
ggplot(dataset, aes(x = Number, y = Result, colour = factor(Power))) + geom_line()

HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey ----- Original message ----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Eduardo de Oliveira Horta Sent: Wednesday, 5 January 2011 3:56 To: r-help Subject: [R] Adding lines in ggplot2 Hello, this is probably a recurrent question, but I couldn't find any answers that didn't involve the expression "data frame"... so perhaps I'm looking for something new here. I wanted to find a code equivalent to

x <- sqrt(1:10)
y <- log(1:10)
plot(1:10, x, type = "l", col = "darkgreen")
lines(1:10, y, col = "red")

to use with ggplot2. I've tried

x <- sqrt(1:10)
y <- log(1:10)
qplot(1:10, x, geom = "line", colour = I("darkgreen")) + geom_line(1:10, y, colour = "red")
Error: ggplot2 doesn't know how to deal with data of class numeric

but it seems that the data frame restriction is really very restrictive here. Any solutions that don't imply using as.data.frame on my data? Thanks in advance, and best regards! Eduardo Horta
Re: [R] Adding lines in ggplot2
It was gently suggested to me in a private message that to achieve *complete* control over the inputs and outputs in R graphics one should be using grid graphics. I concur with that suggestion and wish to amend my previous statement accordingly. With kindest thanks, Dennis On Wed, Jan 5, 2011 at 3:02 AM, Dennis Murphy djmu...@gmail.com wrote: Hi Bert: On Tue, Jan 4, 2011 at 8:39 PM, Bert Gunter gunter.ber...@gene.com wrote: Dennis: Can't speak to ggplot2, but your comments regarding lattice are not quite correct. Many if not all of lattice's basic plot functions are generic, which means that one has essentially complete latitude to define plotting methods for arbitrary data structures. For example, there is an xyplot.ts method for time series -- class ts -- data. Of course, for most lattice methods, the data do naturally come in a data frame, and a standard lattice argument is to give a frame from which to pull the data. But this is not required. I'm aware of that, but thank you for clarifying matters. I didn't state explicitly whether lattice required data frame input or not (my lattice example indicated no and indeed it does not), but the message was evidently muddled further down the post. Your comments speak to some of the differences in the design and philosophy of lattice and ggplot2, and I have no disagreement with your remarks about lattice. The point I was trying to make was that by using data frames and the several packages/base functions that support their manipulation, one can simplify the coding of graphics within both ggplot2 and lattice. There are many things one can do with data frames that one cannot with vectors, as you well know - e.g., extensions with new data (rbind) or new variables (cbind/transform, etc.), or reshaping, among others. These features can be used to advantage in both ggplot2 and lattice. The OP's example is a simple one - had he used df <- data.frame(x = sqrt(1:10), y = log(1:10)) # oops, forgot 1:10...
qplot(as.numeric(rownames(df)), x, data = df, geom = 'line', colour = I('darkgreen')) # ...but it's OK # or xyplot(x ~ as.numeric(rownames(df)), data = df, type = 'l', col.line = 'darkgreen') there would have been no problem. A little inconvenient for a new user, maybe, but hardly 'very restrictive'. As for other types of R data objects that are not data frames, offhand I can't think of too many that are incapable of being converted to data frames somehow for the purposes of graphics, although I wouldn't be remotely surprised if some existed. [For example, one can extract fitted values, residuals and perhaps a model matrix from a model object and place the results in a data frame.] ggplot2 has a fortify() method to allow one to transform data objects for use in the package. There is some discussion in Chapter 9 of Hadley's book, but I'm not in a position to add insight as I haven't used it personally. I do think this is a fair statement, though, and it's been said before: if one wants *complete* control and flexibility of inputs and outputs, use base graphics. Both lattice and ggplot2, by virtue of being structured graphics systems, impose certain constraints (e.g., default actions) on the user which are system-dependent. Prof. Vardeman's quote still applies :) Dennis -- Bert Please explain to me how df <- data.frame(x, y, index = 1:10) qplot(index, x, geom = 'line', ...) is 'very restrictive'. Lattice and ggplot2 are *structured* graphics systems - to get the gains that they provide, there are some costs. I don't perceive organization of data into a data frame as being restrictive - in fact, if you learn how to construct data for input into ggplot2 to simplify the code for labeling variables and legends, the data frame requirement is actually a benefit rather than a restriction. Moreover, one can use the plyr and reshape(2) packages to reshape or condense data frames to provide even more flexibility and freedom to produce ggplot2 and lattice graphics.
In addition, the documentation for ggplot2 is quite explicit about requiring data frames for input, so it is behaving as documented. The complexity (and interaction) of the graphics code probably has something to do with that. Since Josh left you a quote, I'll supply another, from Prof. Steve Vardeman in a class I took with him a long time ago: There is no free lunch in statistics: in order to get something, you've got to give something up. In this case, if you want the nice infrastructure provided by ggplot2, you have to create a data frame for input. Dennis Thanks in advance, and best regards! Eduardo Horta
Re: [R] dotchart for matrix data
Readers, The following commands were applied to create a dot chart with black dots and blue squares for the data:

library(lattice)
testdot
  category values
1        b     44
2        c     51
3        d     65
4        a     10
5        b     64
6        c     71
7        d     49
8        a     27
dotplot(category ~ values,
        col = c("black", "black", "black", "black", "blue", "blue", "blue", "blue"),
        bg  = c("black", "black", "black", "black", "blue", "blue", "blue", "blue"),
        pch = c(21, 21, 21, 21, 22, 22, 22, 22),
        xlab = NULL, data = testdot)

The resultant graph shows correctly coloured points, but they are not filled; only the border is coloured. The documentation for 'pch' (?pch) indicates that the commands shown above should produce appropriately coloured solid symbols. What is causing this error, please?
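For what it's worth, in lattice the interior of symbols 21-25 is controlled by the `fill` argument (passed through to panel.xyplot) rather than by base graphics' `bg`, which may explain the hollow symbols. A hedged sketch with made-up data shaped like the post's (untested against the poster's exact setup):

```r
library(lattice)

# Made-up data resembling the post's testdot
testdot <- data.frame(category = rep(c("b", "c", "d", "a"), 2),
                      values   = c(44, 51, 65, 10, 64, 71, 49, 27))

# fill (not bg) sets the interior colour of pch 21-25 in lattice
p <- dotplot(category ~ values, data = testdot,
             pch = 22, col = "blue", fill = "blue")
inherits(p, "trellis")  # the object draws when printed
```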
Re: [R] unique limited to 536870912
Could it be that you are running on a 32-bit version of R? 536870912 * 4 = 2 GB if those were integers, which would use up all of memory. You never did show what your error message was or what system you were using. On Wed, Jan 5, 2011 at 12:08 AM, Indrajeet Singh sin...@cs.ucr.edu wrote: Hi I am using R with igraph to analyze an edgelist that is larger than the said amount. Does anyone know a way around this? Thanks Inder -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
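Jim's memory arithmetic, spelled out: the reported limit is exactly 2^29 elements, and even stored as 4-byte integers that alone fills the 2 GB user address space of a typical 32-bit process:

```r
n <- 536870912
stopifnot(n == 2^29)  # the reported limit is exactly 2^29 elements
n * 4 / 2^30          # GB needed just for 2^29 4-byte integers: 2
```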
Re: [R] How to save graphs out of ACF ?
Mihai.Mirauta at bafin.de writes: Hi, I want to save the autocorrelation plots resulting from acf (acf(ts)), not just by using the "Save as" command in the R GUI but using some sort of code which allows me to choose the format and the path. Thank you, Mihai For example:

a <- acf(runif(10))
pdf("acf.pdf")
plot(a)
dev.off()
Re: [R] What are the necessary Oracle software to install and run ROracle ?
On Jan 5, 2011, at 2:55 AM, thomas.car...@bnpparibas.com wrote: Hello, I am running Linux. I have downloaded instantclient-basiclite-linux32-11.2.0.2.0.zip instantclient-sqlplus-linux32-11.2.0.2.0.zip instantclient-sdk-linux32-11.2.0.2.0.zip instantclient-precomp-linux32-11.2.0.2.0.zip All these tarballs are unzipped in /usr/local/lib/instantclient, and I have added this path to the library path of the host. I can run sqlplus and proc; they do not complain about missing symbols. Then I install ROracle: install.packages("ROracle") The compilation step is OK, but when the test step tries to load the ROracle.so library, it fails: ** testing if installed package can be loaded Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load shared library '/opt/R-2.11.1/lib/R/library/ROracle/libs/ROracle.so': /opt/R-2.11.1/lib/R/library/ROracle/libs/ROracle.so: undefined symbol: sqlprc Here is my list of libs in the instantclient directory: $ find -name *.*o -o -name *.a ./libsqlplusic.so ./sdk/demo/procobdemo.pco ./cobsqlintf.o ./libociicus.so ./libnnz11.so ./libocijdbc11.so ./libsqlplus.so Do I need some more libs? From which Oracle tarball? Thanks for the help. If you have not, read through the INSTALL file for the package: http://cran.r-project.org/web/packages/ROracle/INSTALL Past postings with similar issues regarding the inability to load shared libs would suggest that compiling and installing the package outside of R from the CLI using 'R CMD INSTALL ...' rather than from within R using install.packages("ROracle"), may resolve the issue. Also, be sure you are running all of this as root, since installation to default locations will require root privileges. Two more things to consider: 1. R 2.12.1 is the current version of R. If you can, I would recommend updating from 2.11.1. 2. Be sure that you don't have a conflict between 32 and 64 bit versions of R and the Oracle tool chain. All components need to be one or the other.
You seem to be using 32 bit versions of the Oracle components above. Check: .Machine$sizeof.pointer in R to see if you are running 32 or 64 bit R. If the former, the above will return 4, if the latter, 8. Another alternative would be to consider using Prof. Ripley's RODBC package and connecting to Oracle via ODBC. If you need further assistance, I would suggest subscribing and posting to r-sig-db or contacting the package author directly. More info on the list is here: https://stat.ethz.ch/mailman/listinfo/r-sig-db HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R not recognized in command line
Hi Aaditya, I assume you are running some variant of Windows and by the prompt in DOS you mean cmd.exe. Perhaps you are already, but from your examples it looks like either A) you are not in the same directory as R or B) you are not adding the path to R in the command. For example, on Windows I always install R under C:\R\ so for me inside cmd.exe: C:\directory> C:\R\R-devel\bin\x64\R [[[R starts here]]] Alternately, you could switch directories over and then just type R at the console: C:\directory> cd C:\R\R-devel\bin\x64\ C:\R\R-devel\bin\x64> R [[[R starts here]]] or, since you have set the environment variables: C:\directory> %R_HOME%\bin\x64\R [[[R starts here]]] Alternately, edit the PATH environment variable in Windows and add the path to R (i.e., R_HOME\bin\i386\ or whatever it is for you), and you should be able to just enter R at the command prompt and have it start. Cheers, Josh On Tue, Jan 4, 2011 at 9:39 PM, Aaditya Nanduri aaditya.nand...@gmail.com wrote: Hello all, I recently installed rpy2 so that I could use R through Python. However, R was not recognized in the command line, so I decided to add it to the PATH variables. But it just doesn't work. And what I mean by "it doesn't work" is: no matter what I type at the prompt in DOS -- be it R, Rcmd, R CMD, Rscript -- it is not recognized as a command. Path variables used: 1. %R_HOME% -- C:\Program Files\R\R 2.12.1\ 2. %R_HOME%\bin 3. %R_HOME%\bin\i386 4. Some batch scripts I found online that recognize the R.exe in \bin\i386, but only if I run the batch file; it's not natively recognized (if I were to type 'R' at the prompt in DOS, it's not recognized). I would appreciate any help in this matter. Or should I do something else so that I can try rpy2?
Python version 2.6.6 R 2.12.1 rpy2 2.0.8 -- Aaditya Nanduri aaditya.nand...@gmail.com -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Re: [R] packagename:::functionname vs. importFrom
Thanks very much Luke for clarifying. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/packagename-functionname-vs-importFrom-tp3172684p3175567.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Stop and call objects
Dear R-users, Let's consider the following snippet: f - function(x) tryCatch(sum(x),error=function(e) stop(e)) f('a') As expected, the last call returns an error message: Error in sum(x) : invalid 'type' (character) of argument My questions are the following: 1- can I easily ask the stop function to reference the f function in addition to sum(x) in the error message? 2- If not, I guess I would have to extract the call and message objects from e, coerce the call as a character object, build a custom string, and pass it to the stop function using call.=F. How can I coerce a call object to a character and maintain the aspect of the printed call (i.e. sum(x) instead of the character vector sum x returned by as.character(e$call))? Thank you Sebastien __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
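Both questions can be handled with conditionCall() and conditionMessage(): deparse() renders the stored call exactly as it prints (so "sum(x)", not the c("sum", "x") that as.character() gives). A sketch, not necessarily the only approach:

```r
# Reference the enclosing function and the failing call in one message
f <- function(x) {
  tryCatch(sum(x), error = function(e) {
    # deparse(conditionCall(e)) yields "sum(x)" as printed, not c("sum", "x")
    stop(sprintf("in f(): error in %s: %s",
                 deparse(conditionCall(e)), conditionMessage(e)),
         call. = FALSE)
  })
}

res <- tryCatch(f("a"), error = conditionMessage)
res
# -> "in f(): error in sum(x): invalid 'type' (character) of argument"
```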
Re: [R] Adding lines in ggplot2
I thank you all for the insightful answers. I'm quite a rookie in R and have built a code that didn't take data frames into account. But I suppose I'm now convinced that they're actually a practical structure for organizing the data... so I'll adhere to the Data Frame Club soon enough. Best regards, Eduardo Horta On Wed, Jan 5, 2011 at 11:01 AM, Dennis Murphy djmu...@gmail.com wrote: It was gently suggested to me in a private message that to achieve *complete* control over the inputs and outputs in R graphics one should be using grid graphics. I concur with that suggestion and wish to amend my previous statement accordingly. With kindest thanks, Dennis On Wed, Jan 5, 2011 at 3:02 AM, Dennis Murphy djmu...@gmail.com wrote: Hi Bert: On Tue, Jan 4, 2011 at 8:39 PM, Bert Gunter gunter.ber...@gene.com wrote: Dennis: Can't speak to ggplot2, but your comments regarding lattice are not quite correct. Many if not all of lattice's basic plot functions are generic, which means that one has essentially complete latitude to define plotting methods for arbitrary data structures. For example, there is an xyplot.ts method for time series -- class ts -- data. Of course, for most lattice methods, the data do naturally come in a data frame, and a standard lattice argument is to give a frame from which to pull the data. But this is not required. I'm aware of that, but thank you for clarifying matters. I didn't state explicitly whether lattice required data frame input or not (my lattice example indicated no and indeed it does not), but the message was evidently muddled further down the post. Your comments speak to some of the differences in the design and philosophy of lattice and ggplot2, and I have no disagreement with your remarks about lattice. The point I was trying to make was that by using data frames and the several packages/base functions that support their manipulation, one can simplify the coding of graphics within both ggplot2 and lattice. 
There are many things one can do with data frames that one cannot with vectors, as you well know - e.g., extensions with new data (rbind) or new variables (cbind/transform, etc.), or reshaping, among others. These features can be used to advantage in both ggplot2 and lattice. The OP's example is a simple one - had he used df - data.frame(x = sqrt(1:10), y = log(1:10)) # oops, forgot 1:10... qplot(as.numeric(rownames(df)), x, data = df, geom = 'line', colour = I('darkgreen')) # ...but it's OK # or xyplot(x ~ as.numeric(rownames(x)), data = df, type = 'l', col.line = 'darkgreen') there would have been no problem. A little inconvenient for a new user, maybe, but hardly 'very restrictive'. As for other types of R data objects that are not data frames, offhand I can't think of too many that are incapable of being converted to data frames somehow for the purposes of graphics, although I wouldn't be remotely surprised if some existed. [For example, one can extract fitted values, residuals and perhaps a model matrix from a model object and place the results in a data frame.] ggplot2 has a fortify() method to allow one to transform data objects for use in the package. There is some discussion in Chapter 9 of Hadley's book, but I'm not in a position to add insight as I haven't used it personally. I do think this is a fair statement, though, and it's been said before: if one wants *complete* control and flexibility of inputs and outputs, use base graphics. Both lattice and ggplot2, by virtue of being structured graphics systems, impose certain constraints (e.g., default actions) on the user which are system-dependent. Prof. Vardeman's quote still applies :) Dennis -- Bert Please explain to me how df - data.frame(x, y, index = 1:10) qplot(index, x, geom = 'line', ...) is 'very restrictive'. Lattice and ggplot2 are *structured* graphics systems - to get the gains that they provide, there are some costs. 
I don't perceive organization of data into a data frame as being restrictive - in fact, if you learn how to construct data for input into ggplot2 to simplify the code for labeling variables and legends, the data frame requirement is actually a benefit rather than a restriction. Moreover, one can use the plyr and reshape(2) packages to reshape or condense data frames to provide even more flexibility and freedom to produce ggplot2 and lattice graphics. In addition, the documentation for ggplot2 is quite explicit about requiring data frames for input, so it is behaving as documented. The complexity (and interaction) of the graphics code probably has something to do with that. Since Josh left you a quote, I'll supply another, from Prof. Steve Vardeman in a class I took with him a long time ago: There is no free lunch in statistics: in order to get
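The reshape/condense workflow Dennis mentions can be illustrated with a small sketch (made-up data; assumes the reshape2 package is installed):

```r
library(reshape2)
library(lattice)

# wide data: one column per series
df <- data.frame(index = 1:10, sqrt = sqrt(1:10), log = log(1:10))

# melt to long form so one grouping variable drives colours and the key
long <- melt(df, id.vars = "index",
             variable.name = "fn", value.name = "y")

xyplot(y ~ index, groups = fn, data = long, type = "l",
       auto.key = list(lines = TRUE, points = FALSE))
```

With the data in long form, adding a third series is a new set of rows rather than new plotting code, which is the benefit being argued for in the thread.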
Re: [R] R command execution from shell
Thank you for this alternative. Both seem to work on my systems. Sebastien Prof Brian Ripley wrote: On Tue, 4 Jan 2011, Duncan Murdoch wrote: On 04/01/2011 3:21 PM, Sebastien Bihorel wrote: Dear R-users, Is there a way I can ask R to execute the write("hello world", file="hello.txt") command directly from the UNIX shell, instead of having to save this command to a .R file and execute this file with R CMD BATCH? Yes. Some versions of R support the -e option on the command line to execute a particular command. It's not always easy to work out the escapes so your shell passes all the quotes through... An alternative is to echo the command into the shell, e.g. echo 'cat("hello")' | R --slave (where the outer ' ' are just for bash). It is marginally preferable to use Rscript in place of 'R --slave'. I think in all known shells Rscript -e "write('hello world', file = 'hello.txt')" will work. (If not, shQuote() will not work for that shell, but this does work in sh+clones, csh+clones, zsh and Windows' cmd.exe.) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dotchart for matrix data
On Jan 5, 2011, at 8:11 AM, e-letter wrote: Readers, The following commands were applied, to create a dot chart with black dots and blue squares for data: library(lattice) testdot category values 1 b 44 2 c 51 3 d 65 4 a 10 5 b 64 6 c 71 7 d 49 8 a 27 dotplot(category ~ values, col = c("black", "black", "black", "black", "blue", "blue", "blue", "blue"), bg = c("black", "black", "black", "black", "blue", "blue", "blue", "blue"), pch = c(21, 21, 21, 21, 22, 22, 22, 22), xlab = NULL, data = testdot) The resultant graph shows correctly coloured points, but not filled, only the border is coloured. The documentation for the command 'pch' (?pch) indicates that the commands shown above should show appropriately coloured solid symbols. What is causing this error please? There is no pch command. It is a graphical parameter. If you are looking at the points help page then you are not looking at documentation that necessarily applies to a lattice function like dotplot. After first looking at ?dotplot, then ?panel.dotplot, and then because it says the points are done with panel.xyplot, my guess is that you need to add a fill =TRUE or a fill= color-vector option. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
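David's guess can be written out as a sketch (untested, as he says; it assumes panel.dotplot forwards a fill vector to panel.xyplot, and reconstructs the data from the printout above):

```r
library(lattice)

testdot <- data.frame(category = c("b", "c", "d", "a", "b", "c", "d", "a"),
                      values   = c(44, 51, 65, 10, 64, 71, 49, 27))

# pch 21-25 are the filled symbols: 'col' sets the border, 'fill' the interior
# (base graphics use 'bg' for the interior, which is the source of the confusion)
dotplot(category ~ values, data = testdot,
        pch  = c(21, 21, 21, 21, 22, 22, 22, 22),
        col  = rep(c("black", "blue"), each = 4),
        fill = rep(c("black", "blue"), each = 4),
        xlab = NULL)
```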
Re: [R] randomForest speed improvements
Note that that isn't exactly what I recommended. If you look at the example in the help page for combine(), you'll see that it is combining RF objects trained on the same data; i.e., instead of having one RF with 500 trees, you can combine five RFs trained on the same data with 100 trees each into one 500-tree RF. The way you are using combine() is basically using sample size to limit tree size, which you can do by playing with the nodesize argument in randomForest() as I suggested previously. Either way is fine as long as you don't see prediction performance degrading. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of apresley Sent: Tuesday, January 04, 2011 6:30 PM To: r-help@r-project.org Subject: Re: [R] randomForest speed improvements Andy, Thanks for the reply. I had no idea I could combine them back ... that actually will work pretty well. We can have several worker threads load up the RF's on different machines and/or cores, and then re-assemble them. RMPI might be an option down the road, but would be a bit of overhead for us now. Using the method of combine() ... I was able to drastically reduce the amount of time to build randomForest objects. IE, using about 25,000 rows (6 columns), it takes maybe 5 minutes on my laptop. Using 5 randomForest objects (each with 5k rows), and then combining them, takes 1 minute. -- Anthony -- View this message in context: http://r.789695.n4.nabble.com/randomForest-speed-improvements- tp3172523p3174621.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
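Andy's two alternatives can be sketched on toy data (the combine() usage follows the pattern he points to on its help page; sizes are arbitrary):

```r
library(randomForest)
set.seed(1)
d <- data.frame(y  = factor(rbinom(1000, 1, 0.5)),
                x1 = rnorm(1000), x2 = rnorm(1000))

# Andy's recommendation: five 100-tree forests on the SAME data,
# merged into the equivalent of one 500-tree forest
rfs <- lapply(1:5, function(i) randomForest(y ~ ., data = d, ntree = 100))
rf  <- do.call(combine, rfs)   # rf$ntree is now 500

# the alternative he mentions: limit tree size directly via nodesize
# instead of shrinking the training sample
rf2 <- randomForest(y ~ ., data = d, ntree = 500, nodesize = 50)
```

The lapply() step is what Anthony's worker threads would do in parallel on separate machines or cores before the results are reassembled with combine().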
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] get() within a command, specifically lmer
Hello all. Why doesn't this work? d=data.frame(y=rpois(10,1),x=rnorm(10),z=rnorm(10),grp=rep(c('a','b'),each=5)) library(lme4) model=lmer(y~x+z+(1|grp),family=poisson,data=d) update(model,~.-z)###works, removes z var='z' update(model,~.-get(var))##doesn't remove z update(model,~. -get(var,pos=d))###doesn't remove z I am trying to remove z from the model in the update, but I can't do it using get(), which is what I would like to do for a more complicated program. There's something about environments and get() that I don't understand. Any suggestions? Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R not recognized in command line
On 11-01-05 8:51 AM, Joshua Wiley wrote: Hi Aaditya, I assume you are running some variant of Windows and by the prompt in DOS you are using cmd.exe. Perhaps you are already, but from your examples it looks like either A) you are not in the same directory as R or B) are not adding the path to R in the command. For example, on Windows I always install R under C:\R\ so for me inside cmd.exe: C:\directory C:\R\R-devel\bin\x64\R [[[R starts here]]] alternately you could switch directories over and then just type R at the console: C:\directory cd C:\R\R-devel\bin\x64\ C:\R\R-devel\bin\x64 R [[[R starts here]]] or since you have set the environment variables: C:\directory %R_HOME%\bin\x64\R [[[R starts here]]] Alternately, edit the PATH environment variable in Windows and add the path to R (i.e., R_HOME\bin\i386\ or whatever it is for you), and you should be able to just enter R at the command prompt and have it start. Editing the PATH is probably the best approach, but a lot of people get it wrong because of misunderstanding how it works: - If you change PATH in one process the changes won't propagate anywhere else, and will be lost as soon as you close that process. That could be a cmd window, or an R session, or just about any other process that lets you change environment variables. - If you want to make global changes to the PATH, you need to do it in the control panel System|Advanced|Environment variables entries. - Often it is good enough to use a more Unix-like approach, and only make the change at startup of the cmd processor. You use the /k option when starting cmd if you want to run something on startup. Duncan Murdoch Cheers, Josh On Tue, Jan 4, 2011 at 9:39 PM, Aaditya Nanduri aaditya.nand...@gmail.com wrote: Hello all, I recently installed rpy2 so that I could use R through Python. However, R was not recognized in the command line. So I decided to add it to the PATH variables. 
But it just doesn't work. And what I mean by "it doesn't work" is: no matter what I type at the prompt in DOS - be it R, Rcmd, R CMD, Rscript - it is not recognized as a command. Path variables used : 1. %R_HOME% -- C:\Program Files\R\R 2.12.1\ 2. %R_HOME%\bin 3. %R_HOME%\bin\i386 4. Some batch scripts I found online that recognize the R.exe in \bin\i386, but only if I run the batch file... it's not natively recognized (if I were to type 'R' at the prompt in DOS, it's not recognized) I would appreciate any help in this matter. Or should I do something else so that I can try rpy2? Python version 2.6.6 R 2.12.1 rpy2 2.0.8 -- Aaditya Nanduri aaditya.nand...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
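For reference, a minimal sketch of the persistent-PATH route Duncan describes, done from cmd.exe rather than the control panel (the install directory below is an assumption based on the poster's version; setx is available on Windows 7):

```
:: append R's binary directory to the user-level PATH; setx writes it to the
:: registry, so it takes effect in NEW cmd.exe windows, not the current one
setx PATH "%PATH%;C:\Program Files\R\R-2.12.1\bin\i386"

:: then, from a freshly opened prompt:
R --version
```

One caveat with this sketch: %PATH% expands to the combined machine+user value, so setx stores that whole string under the user scope; the control-panel route Duncan recommends avoids that duplication.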
[R] Simulation - Natrual Selection
Hi, I've been modelling some data over the past few days from my work, repeatedly challenging microbes to a certain concentration of cleaner until the required concentration to inhibit or kill them increases, at which point they are challenged to a slightly higher concentration each day. I'm doing this for two different cleaners, and I'm collecting the required concentration to kill them as a percentage, the challenge number, the cleaner as a two-level variable, and the lineage they're in, because I have several different lineages. I'm expecting the values to rise for one cleaner but not the other, as they acquire resistance to one but not the other. Which has happened, but I have wide variation because one lineage acquired a very dramatic change which has made it immune to 50%, whereas the others have exhibited a much more gradual increase, and so I have very weak p-values for the cleaner variable, because it is secondary to the challenge vector, which has the most explanatory power, because without time and these challenges the selection would not happen. I was using two bacterial species, but one was keen on giving highly erratic results and insisted on becoming cross-contaminated, BUT if I include its data, it shoves cleaner over the p = 0.05 threshold, so I may just be having a problem with lack of data. So I've been asking about bootstrapping, which I plan to do to my cases, and then fit a model to see what the confidence is like then. I assume if I bootstrap then it will re-select whole cases, and not jumble everything up; otherwise a microbe (to take the most extreme value as an example) with 50% concentration tolerance at the beginning would make no sense at all. I'm also planning on doing models lineage by lineage, rather than putting them into one whole, just to have a look at what happens. 
But what I really wanted to know from this email was if there's a package or function for natural selection simulation I could make use of, to see if I can simulate the experiment. I want to start with a distribution of concentration tolerance values, taken from the inhibitory concentration values from my first lot of microbes, back when term began. Draw 3000 from this. Then values in that draw that fall below the exposure concentration I used in my experiment are removed, or have a high chance of being removed. Then, from what is left, a draw is made again - or perhaps a copy operation (rather than a random draw) - until I have 3000 again. Rather than have all exactly the same concentration, a value can then be added to some of them that increases their concentration tolerance slightly, but not by a great deal, except in a few individuals, where it may be increased dramatically (some sort of exponential distribution perhaps). Then when the distribution of this simulated population of microbes has reached the next concentration (possibly the mean or mode of the distribution) (I have a series of 1 in 2 dilutions, so 100%, 50%, 25% and so on), then they move on to the next concentration. I know it's probably quite a heavy thing, it was just a thought that came to me; if anybody has any experience in this area of R or knows of something that allows this to be done, please let me know. Thanks, Ben. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
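No canned package is identified in this thread, but the scheme described above - draw 3000, kill below the dose, regrow from survivors, mutate, escalate the dose - is easy to prototype directly. A rough sketch; the starting tolerances, mutation rate, and jump sizes are invented placeholders standing in for the measured values:

```r
set.seed(42)
# starting tolerances (% concentration) -- placeholders for the inhibitory
# concentrations measured from the first batch of microbes
init <- c(3.125, 6.25, 6.25, 12.5, 12.5, 25, 25, 50)
pop  <- init[sample.int(length(init), 3000, replace = TRUE)]

one_challenge <- function(pop, dose, n = 3000,
                          small_sd = 0.5, p_big = 0.01, big_mult = 2) {
  survivors <- pop[pop >= dose]          # killed if tolerance is below the dose
  if (length(survivors) == 0) return(NULL)
  # regrow to n by resampling survivors (sample.int avoids sample()'s
  # scalar trap when only one survivor remains)
  pop <- survivors[sample.int(length(survivors), n, replace = TRUE)]
  pop <- pop + abs(rnorm(n, 0, small_sd))  # small tolerance drift for everyone
  big <- runif(n) < p_big                  # rare large jumps
  pop[big] <- pop[big] * big_mult
  pop
}

dose <- 3.125
for (challenge in 1:20) {
  pop <- one_challenge(pop, dose)
  if (is.null(pop)) break                  # culture wiped out
  if (median(pop) >= 2 * dose) dose <- 2 * dose  # escalate up the 1-in-2 series
}
```

Tracking median(pop) and dose per challenge would give simulated trajectories to compare against the observed concentration-versus-challenge curves.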
[R] rShowMessage Fatal error: unable to open the base package
Hi All, As you may know I am trying connect R with java by RJava, now I run the examples, I got this error rShowMessage Fatal error: unable to open the base package I am using 64bits windows 7 and eclipse. Any suggestions? Many thanks Ying [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R not recognized in command line
On Wed, Jan 5, 2011 at 10:41 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 11-01-05 8:51 AM, Joshua Wiley wrote: Hi Aaditya, I assume you are running some variant of Windows and by the prompt in DOS you are using cmd.exe. Perhaps you are already, but from your examples it looks like either A) you are not in the same directory as R or B) are not adding the path to R in the command. For example, on Windows I always install R under C:\R\ so for me inside cmd.exe: C:\directory C:\R\R-devel\bin\x64\R [[[R starts here]]] alternately you could switch directories over and then just type R at the console: C:\directory cd C:\R\R-devel\bin\x64\ C:\R\R-devel\bin\x64 R [[[R starts here]]] or since you have set the environment variables: C:\directory %R_HOME%\bin\x64\R [[[R starts here]]] Alternately, edit the PATH environment variable in Windows and add the path to R (i.e., R_HOME\bin\i386\ or whatever it is for you), and you should be able to just enter R at the command prompt and have it start. Editing the PATH is probably the best approach, but a lot of people get it wrong because of misunderstanding how it works: - If you change PATH in one process the changes won't propagate anywhere else, and will be lost as soon as you close that process. That could be a cmd window, or an R session, or just about any other process that lets you change environment variables. - If you want to make global changes to the PATH, you need to do it in the control panel System|Advanced|Environment variables entries. - Often it is good enough to use a more Unix-like approach, and only make the change at startup of the cmd processor. You use the /k option when starting cmd if you want to run something on startup. You can also use Rcmd.bat, R.bat, Rgui.bat, etc. found at http://batchfiles.googlecode.com Just put any you wish to use anywhere on your path and it will work on all cmd instances and will also work when you install a new version of R since it looks up R's location in the registry. 
-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R not recognized in command line
On Wed, Jan 5, 2011 at 7:41 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: Editing the PATH is probably the best approach, but a lot of people get it wrong because of misunderstanding how it works: - If you change PATH in one process the changes won't propagate anywhere else, and will be lost as soon as you close that process. That could be a cmd window, or an R session, or just about any other process that lets you change environment variables. - If you want to make global changes to the PATH, you need to do it in the control panel System|Advanced|Environment variables entries. Note it is also possible to make global changes using PowerShell by setting the target scope to Machine. [Environment]::SetEnvironmentVariable("TestVariable", "Test value.", "Machine") Josh - Often it is good enough to use a more Unix-like approach, and only make the change at startup of the cmd processor. You use the /k option when starting cmd if you want to run something on startup. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] OT: Reprinting of Bertin's Semiology of Graphics
Aficionados of graphics may be interested to know that the English translation (1984) of Jacques Bertin's Semiology of Graphics has been reprinted by ESRI. http://www.amazon.com/Semiology-Graphics-Diagrams-Networks-Maps/dp/0299090604 new edition: http://www.amazon.com/Semiology-Graphics-Diagrams-Networks-Maps/dp/1589482611/ref=tmm_hrd_title_0 The long out-of-print 1984 edition sells for $380, but the new printing is a bargain at ~$49. It is all the more remarkable in that most of the diagrams and graphs were drawn by hand, yet show a palette of graphical techniques richer than our graphical software provides even today. best, -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele StreetWeb: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R(D) Com under R1070
Can you please quote what you are referring to? The subject seems to refer to an R version R-1.7.0 which is for almost a decade outdated. Uwe Ligges On 05.01.2011 08:31, Henri Leblond wrote: I get the same trouble Please finally did you succeed fixing this trouble ? Henri __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simulation - Natrual Selection
Date: Wed, 5 Jan 2011 15:48:46 + From: benjamin.w...@bathspa.org To: r-help@r-project.org Subject: [R] Simulation - Natrual Selection Hi, I've been modelling some data over the past few days, of my work, repeatedly challenging microbes to a certain concentration of cleaner, until the required concentration to inhibit or kill them increaces, at which point they are challenged to a slightly higher concentration each day. I'm doing ths for two different cleaners and I'm collecting the required concentration to kill them as a percentage, the challenge number, the cleaner as a two level variable, and the lineage theyre in, because I have several different lineages. I'm expecting the values to rise for one cleaner but not the other as they aqquire resistance for one but not the other. Which has happened, but I have wide variation because one linage aqquired a very dramatic change which has made it immune to 50%, whereas the others, have exhibited a much more gradual increace, and so I have very weak p values for the cleaner variable, because it is secondary to the challenge vector, which has the most explanatory power, because without time and these challenges, the selection would no happen. I was using two bacterium species, but one was keen on giving hight erratic results, and insisted on becoming cross contaminated, BUT if I include it's data, It shoves cleaner over the p0.05 threshold, so i may just be having a problem with lack of data. So I've been asking about bootstrapping, which I plan to do to my cases, and thenfit a model to see what the confidence is like then. I assume if I bootstrap then it will re-select whole cases, and not jumble everything up, otherwise a microbe (totake the most extreme value as an example) with 50% concentration tolerance at the beginning, would make no sense at all. I'm also planning on doing models lineage by lineage, rather than putting them into one whole, just to have a look at what happens. 
You can't really have a p-value without a specific hypothesis to test; if you have that, then all your other questions are probably easy to answer. Generally you want to sample from things that are iid or maybe you want to test the identical i. Generally you want to have done a lit search ahead of time and had some idea of likely evolution dynamics of your system given your design and things like your forcing functions etc. Most statisticians would not take seriously a posteriori designs and indeed it can be hard to avoid rationalization and selection bias ( problems that always and only affect people who disagree with me LOL) as being anything other than exploratory or hypothesis generating- you are looking for predictive value. While it is not always worthwhile doing blind tests, it may be something worth considering ( do you know which group gets what thing?) But what I really wanted to know from this email, was if there's a package or function for natrual selection simulation I could make use of, to see if I can simulate the experiment. I want to start with a http://www.google.com/#sclient=psy&hl=en&q=%22R+package%22+natural+selection but as implied above, R has lots of analysis stuff and maybe you would find something more useful that is not linked to the keywords you suggest. You may find, for whatever reason, you could write a differential equation to express your results but that isn't often used with natural selection. distribution of concentration tolerance values, taken from the inhibitory concentration values from my first lot of microbes, back when term began. Draw 3000 from this. Then values in that draw that fall below the exposure concentration I did in my experiment, are removed, or have a high chance of being removed. 
Then, from what is left, a draw is made again - or perhaps a copy operation (rather than a random draw) until I have 3000 again, rather than have all exactly the same concentration, then a value can be added to some of them, that increaces their concentration tolerance slightly, but not by a great deal, except in a few individuals, where it may be increaced dramatically(some sort of exponential dstribution perhaps). Then when the distribution of this simulated population of microbes has reached the next concentration (possibly the mean or mode of the distribution) (I have a series of 1 in 2 dilutions, so 100% 50%, 25% and so on), then they move on to the next concentration. I know it's probably quite a heavy thing, it was just a thought that came to me, if anybody has any experience in this area of R or knows of something that allows this to be done, please let me know. Thanks, Ben. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cost-benefit/value for money analysis
On Wed, Jan 5, 2011 at 12:29 PM, Graham Smith myotis...@gmail.com wrote: maximal choices would break the budget. This sounds like a homework problem and I don't see any student effort yet. Search terms include: decision analysis , cost-benefit analysis, or utility theory. Hopefully, my response to Ben will clarify my question, and why I am asking it. At the moment (and that may change) I'm not specifically interested in how you do it R, just as to whether there is a package aimed at this kind of Cost Benefit analysis. Try this: require(sos) findFn('cost benefit') found 12 matches Regards Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simulation - Natrual Selection
On 05/01/2011 16:37, Mike Marchywka wrote: Date: Wed, 5 Jan 2011 15:48:46 + From: benjamin.w...@bathspa.org To: r-help@r-project.org Subject: [R] Simulation - Natrual Selection Hi, I've been modelling some data over the past few days, of my work, repeatedly challenging microbes to a certain concentration of cleaner, until the required concentration to inhibit or kill them increaces, at which point they are challenged to a slightly higher concentration each day. I'm doing ths for two different cleaners and I'm collecting the required concentration to kill them as a percentage, the challenge number, the cleaner as a two level variable, and the lineage theyre in, because I have several different lineages. I'm expecting the values to rise for one cleaner but not the other as they aqquire resistance for one but not the other. Which has happened, but I have wide variation because one linage aqquired a very dramatic change which has made it immune to 50%, whereas the others, have exhibited a much more gradual increace, and so I have very weak p values for the cleaner variable, because it is secondary to the challenge vector, which has the most explanatory power, because without time and these challenges, the selection would no happen. I was using two bacterium species, but one was keen on giving hight erratic results, and insisted on becoming cross contaminated, BUT if I include it's data, It shoves cleaner over the p0.05 threshold, so i may just be having a problem with lack of data. So I've been asking about bootstrapping, which I plan to do to my cases, and thenfit a model to see what the confidence is like then. I assume if I bootstrap then it will re-select whole cases, and not jumble everything up, otherwise a microbe (totake the most extreme value as an example) with 50% concentration tolerance at the beginning, would make no sense at all. 
I'm also planning on doing models lineage by lineage, rather than putting them into one whole, just to have a look at what happens. You can't really have a p-value without a specific hypothesis to test, if you have that then all your other questions are probably easy to answer. Generally you want to sample from things that are iid or maybe you want to test the identical i. My hypothesis is that Cleaner A (I don't really want to go into names or brands) will exhibit a rise in concentration tolerance values - or rather, the microbial culture I keep exposed to it will - reflecting acquisition of antimicrobial resistance. And this has largely happened. And that in cleaner B, this will not happen, or if it does, it will not be as dramatic and will take longer. So I'm expecting, in my model, the cleaner variable to have a p below 0.05, quite high explanatory power, and a satisfying coefficient. The notion behind the hypothesis being that one might have a more difficult complex chemical structure, requiring more mutations to develop some resistance. I can't really do anything with genes or chemical structure at my current institution and at my level because of no equipment for that sort of thing, and that they felt it would be too far for a 3rd year project. So I'm using the concentration required to kill them - or stop them from growing - as an indication. Generally you want to have done a lit search ahead of time and had some idea of likely evolution dynamics of your system given your design and things like your forcing functions etc. Most statisticians would not take seriously a posteriori designs and indeed it can be hard to avoid rationalization and selection bias ( problems that always and only effect people who disagree with me LOL) as being anything other than exploratory or hypothesis generating- you are looking for predictive value. While it is not always worthwhile doing blind tests, it may be something worth considering ( do you know which group gets what thing?) 
But what I really wanted to know from this email, was if there's a package or function for natrual selection simulation I could make use of, to see if I can simulate the experiment. I want to start with a http://www.google.com/#sclient=psyhl=enq=%22R+package%22+natural+selection but as implied above, R has lots of analysis stuff and maybe you would find something more useful that is not linked to the keywords you suggest. You may find, for whatever reason, you could write a differential equation to express your results but that isn't often used with natural selection. distribution of concentration tolerance values, taken from th e inhibitory concentration values from my first lot of microbes, back when term began. Draw 3000 from this. Then values in that draw that fall below the exposure concentration I did in my experiment, are removed, or have a high chance of being removed. Then, from what is left, a draw is made again - or perhaps a copy operation (rather than a random draw) until I have 3000 again, rather than have all exactly the same concentration, then a value
[R] integration Sweave and TexMakerX
Hi, Does anyone know how to integrate texmakerx and sweave on Windows? I mean, to run .rnw files directly from texmakerx and get a pdf or dvi file. Thank you in advance, -- Sebastián Daza sebastian.d...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Stop and call objects
Try this: f <- function(x) tryCatch(sum(x), error = function(e) sprintf("Error in %s: %s", deparse(sys.call(1)), e$message)) f('a') On Wed, Jan 5, 2011 at 12:23 PM, Sebastien Bihorel sebastien.biho...@cognigencorp.com wrote: Dear R-users, Let's consider the following snippet: f <- function(x) tryCatch(sum(x), error = function(e) stop(e)) f('a') As expected, the last call returns an error message: Error in sum(x) : invalid 'type' (character) of argument My questions are the following: 1- can I easily ask the stop function to reference the f function in addition to sum(x) in the error message? 2- If not, I guess I would have to extract the call and message objects from e, coerce the call as a character object, build a custom string, and pass it to the stop function using call.=F. How can I coerce a call object to a character and maintain the aspect of the printed call (i.e. sum(x) instead of the character vector c("sum", "x") returned by as.character(e$call))? Thank you Sebastien __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] get() within a command, specifically lmer
Formula syntax is different from regular syntax: a formula is quoted and is not evaluated in the same way as regular commands (otherwise operations like '+' and '-' would do very different things). For what you are trying to do, I would suggest creating the formula as a string using paste or sprintf, then using as.formula on that string. You can also use the substitute function, but that tends to be a bit more complicated. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Patrick McKann Sent: Wednesday, January 05, 2011 8:26 AM To: r-help@r-project.org Subject: [R] get() within a command, specifically lmer Hello all. Why doesn't this work?

d = data.frame(y = rpois(10, 1), x = rnorm(10), z = rnorm(10), grp = rep(c('a', 'b'), each = 5))
library(lme4)
model = lmer(y ~ x + z + (1|grp), family = poisson, data = d)
update(model, ~ . - z)                  ### works, removes z
var = 'z'
update(model, ~ . - get(var))           ## doesn't remove z
update(model, ~ . - get(var, pos = d))  ### doesn't remove z

I am trying to remove z from the model in the update, but I can't do it using get(), which is what I would like to do for a more complicated program. There's something about environments and get() that I don't understand. Any suggestions? Thanks. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
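A minimal sketch of the paste()/as.formula() route Greg suggests; the variable names follow the poster's example, and the lmer model is assumed to be the one fitted above:

```r
var <- "z"
# Build the update formula as text, then convert it; 'var' is evaluated
# here, before the formula is quoted, which is why this works.
f <- as.formula(paste(". ~ . -", var))   # i.e. . ~ . - z
# update(model, f)                       # drops z from the fitted model
```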
[R] Fwd: Re: Simulation - Natural Selection
Original Message Subject: Re: [R] Simulation - Natural Selection Date: Wed, 05 Jan 2011 17:24:05 + From: Ben Ward benjamin.w...@bathspa.org To: Bert Gunter gunter.ber...@gene.com CC: Mike Marchywka marchy...@hotmail.com On 05/01/2011 17:08, Bert Gunter wrote: Couple of brief comments inline below. -- Bert On Wed, Jan 5, 2011 at 8:56 AM, Ben Ward benjamin.w...@bathspa.org wrote: On 05/01/2011 16:37, Mike Marchywka wrote: Date: Wed, 5 Jan 2011 15:48:46 + From: benjamin.w...@bathspa.org To: r-help@r-project.org Subject: [R] Simulation - Natural Selection Hi, I've been modelling some data over the past few days of my work, repeatedly challenging microbes with a certain concentration of cleaner until the concentration required to inhibit or kill them increases, at which point they are challenged with a slightly higher concentration each day. I'm doing this for two different cleaners, and I'm collecting the concentration required to kill them as a percentage, the challenge number, the cleaner as a two-level variable, and the lineage they're in, because I have several different lineages. I'm expecting the values to rise for one cleaner but not the other, as they acquire resistance to one but not the other. Which has happened, but I have wide variation, because one lineage acquired a very dramatic change which has made it immune to 50%, whereas the others have exhibited a much more gradual increase, and so I have very weak p-values for the cleaner variable, because it is secondary to the challenge vector, which has the most explanatory power, because without time and these challenges the selection would not happen. I was using two bacterium species, but one was keen on giving highly erratic results, and insisted on becoming cross-contaminated, BUT if I include its data, it shoves cleaner over the p = 0.05 threshold, so I may just be having a problem with lack of data. 
So I've been asking about bootstrapping, which I plan to apply to my cases, and then fit a model to see what the confidence is like. I assume if I bootstrap then it will re-select whole cases, and not jumble everything up; otherwise a microbe (to take the most extreme value as an example) with 50% concentration tolerance at the beginning would make no sense at all. I'm also planning on fitting models lineage by lineage, rather than putting them into one whole, just to have a look at what happens. You can't really have a p-value without a specific hypothesis to test, -- More precisely: A p-value loses its meaning unless the tested hypotheses are PRESPECIFIED -- i.e. determined BEFORE looking at the data. My hypothesis was specified before I did my experiment. Whilst far from perfect, I've tried to do the best I can to assess the rise in resistance without going into genetics, as it's not possible. (Although it may be at the next institution I've applied to for an MSc.) With my hypothesis (I mentioned it below), I was of the frame of mind that a nonsignificant p-value on the cleaner variable (for now - the experiment is far from over) indicated a lack of evidence for rejecting the null. And so at the minute it looks like the type of cleaner makes no difference. if you have that then all your other questions are probably easy to answer. Generally you want to sample from things that are iid or maybe you want to test the identical i. -- This is false. iid is not required. Example: weighted least squares. It is true that figuring out the sampling distribution under non-iid sampling can be (much) more difficult. For example, pivots may not exist; approximations must typically be used. -- Bert My hypothesis is that Cleaner A (I don't really want to go into names or brands), or rather the microbial culture I keep exposed to it, will exhibit a rise in concentration tolerance values, reflecting acquisition of antimicrobial resistance. And this has largely happened. 
And that in cleaner B this will not happen, or if it does, it will not be as dramatic and will take longer. So I'm expecting, in my model, the cleaner variable to have a p below 0.05, quite high explanatory power, and a satisfying coefficient. The notion behind the hypothesis is that one cleaner might have a more difficult, complex chemical structure, requiring more mutations to develop some resistance. I can't really do anything with genes or chemical structure at my current institution and at my level, because there is no equipment for that sort of thing, and they felt it would be too far for a 3rd-year project. So I'm using the concentration required to kill them - or stop them from growing - as an indication. Generally you want to have done a lit search ahead of time and had some idea of likely evolution dynamics of your system given your design and things like your forcing functions etc. Most statisticians would not take seriously
Re: [R] Navigating web pages using R
Hmm, RCurl may be able to help you. Not sure - I have not played with its query abilities. On Tue, Jan 4, 2011 at 10:54 AM, Erik Gregory egregory2...@yahoo.com wrote: R-Help, I'm trying to obtain some data from a webpage which masks the URL from the user, so an explicit URL will not work. For example, when one navigates to the web page the URL looks something like: http://137.113.141.205/rpt34s.php?flags=1 (changed for privacy, but I'm not sure you could access it anyway since it's internal to the agency I work for). The site has three drop-down menus for Site, Month, and Year. When a combination of these is selected, the resulting URL is always http://137.113.141.205/rpt34s (nothing changes, except that flags=1 is dropped), so what I need to be able to do is write something that will navigate to the original URL, then select some combination of Site, Month, and Year, and then submit the query to the site to navigate to the page with the data. Is this a capability that R has as a language? Unfortunately, I'm unfamiliar with html or php programming, so if this question belongs in a forum on that I apologize. I'm trying to centralize all of my code for my analysis in R! Thank you, -Erik Gregory Student Assistant, California EPA CSU Sacramento, Mathematics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
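A hedged sketch of what the form submission could look like with RCurl's postForm(); the field names Site, Month and Year are guesses based on the drop-down labels, and the real names would have to be read from the page's HTML form source:

```r
library(RCurl)
# postForm() sends an HTTP POST with the given form fields
# (field names and values here are assumptions for illustration).
result <- postForm("http://137.113.141.205/rpt34s.php",
                   Site = "SomeSite", Month = "01", Year = "2011")
```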
Re: [R] randomForest speed improvements
From: Liaw, Andy Note that that isn't exactly what I recommended. If you look at the example in the help page for combine(), you'll see that it is combining RF objects trained on the same data; i.e., instead of having one RF with 500 trees, you can combine five RFs trained on the same data with 100 trees each into one 500-tree RF. The way you are using combine() is basically using sample size to limit tree size, which you can do by playing with the nodesize argument in randomForest() as I suggested previously. Either way is fine as long as you don't see prediction performance degrading. I should also mention that another way you can do something similar is by making use of the sampsize argument in randomForest(). For example, if you call randomForest() with sampsize=500, it will randomly draw 500 data points to grow each tree. This way you don't even need to run the RFs separately and combine them. Andy -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of apresley Sent: Tuesday, January 04, 2011 6:30 PM To: r-help@r-project.org Subject: Re: [R] randomForest speed improvements Andy, Thanks for the reply. I had no idea I could combine them back ... that actually will work pretty well. We can have several worker threads load up the RFs on different machines and/or cores, and then re-assemble them. Rmpi might be an option down the road, but would be a bit of overhead for us now. Using the combine() method, I was able to drastically reduce the amount of time to build randomForest objects. I.e., using about 25,000 rows (6 columns), it takes maybe 5 minutes on my laptop. Using 5 randomForest objects (each with 5k rows), and then combining them, takes 1 minute. -- Anthony -- View this message in context: http://r.789695.n4.nabble.com/randomForest-speed-improvements- tp3172523p3174621.html Sent from the R help mailing list archive at Nabble.com. 
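The two approaches Andy describes can be sketched as follows; the built-in iris data is used purely for illustration:

```r
library(randomForest)
data(iris)
# Approach 1: train smaller forests (possibly on separate cores or
# machines) and merge them into one forest with combine().
rf1 <- randomForest(Species ~ ., data = iris, ntree = 100)
rf2 <- randomForest(Species ~ ., data = iris, ntree = 100)
rf  <- combine(rf1, rf2)   # a single 200-tree forest
# Approach 2: subsample rows per tree instead, in a single call.
rf3 <- randomForest(Species ~ ., data = iris, ntree = 200, sampsize = 50)
```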
[R] Forecasting with STL
Dear list, We have been using STL for seasonal decomposition, and would like to use the trend and seasonal components to forecast n steps ahead. There is no function called predict.stl, and inside an stl object there is no loess model to be predicted either. Our solution is to apply loess or lm using the trend of stl as auto-regressors, after which we use predict.lm or predict.loess, and then apply seasonal modifiers to the predicted data. Is there a more straightforward way to do this? Some function we are missing? Thank you in advance. Felipe Araujo Researcher in finance and economics Rio de Janeiro, Brasil [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
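A sketch of the approach the poster describes (extend the trend with lm(), then add back the seasonal component); AirPassengers is only stand-in data, and an additive decomposition on the log scale is assumed:

```r
fit   <- stl(log(AirPassengers), s.window = "periodic")
trend <- fit$time.series[, "trend"]
seas  <- fit$time.series[, "seasonal"]
t     <- seq_along(trend)
h     <- 12   # forecast horizon, one year ahead
# Extrapolate the trend with a simple linear fit on time
trend_fc <- predict(lm(trend ~ t), newdata = data.frame(t = max(t) + 1:h))
# Re-apply the last full seasonal cycle to the extrapolated trend
seas_fc <- tail(as.numeric(seas), h)
fc <- exp(trend_fc + seas_fc)   # back-transform from the log scale
```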
Re: [R] t-test or ANOVA...who wins? Help please!
Dear Tal Galili, thanks a lot for your answer! I agree with you: the t-test is comparing 2 conditions at one level of stimulus, while the ANOVA table is testing the significance of the interaction between condition and stimulus... the two tests are testing two different things. But still I don't understand which is the right way to perform the analysis in order to solve my problem. Let's consider now only the table I posted before. The same stimuli in the table have been presented to subjects in two conditions: A and AH, where AH is the condition A plus something else (let's call it H). I want to know if AT A GLOBAL LEVEL adding H leads to better results in the participants' evaluations of the stimuli than the stimulus presented only in condition A. Data in the response column are evaluations of the realism of the stimulus on a 7-point scale. If I calculate the mean for each stimulus in each condition, the results show that for each stimulus the AH condition is always greater than the A condition. Anyway, doing a t-test to compare the stimuli in pairs (e.g. flat_550_W_realism in condition A vs. flat_550_W_realism in condition AH) I get that only sometimes the differences are statistically significant. I ask you if there is a way to say that condition AH is better than condition A at a global level. In attachment you find the table in .txt and also in .csv format. Is it possible for you to make an example in R, including also the R results, in order to tell me what to look at in the console to see if my problem is solved or not? For example, I was checking the stimulus:condition line in the anova results... but I don't know if my anova analysis was correct or not. I am not an expert of R, nor of statistics ;-( Anyway I am doing my best to study and understand. Please enlighten me. Thanks in advance Best regards From: Tal Galili tal.gal...@gmail.com To: Frodo Jedi frodo.j...@yahoo.com Cc: r-help@r-project.org Sent: Wed, January 5, 2011 10:15:41 AM Subject: Re: [R] t-test or ANOVA...who wins? 
Help please! Hello Frodo, It is not clear to me from your questions some of the basics of your analysis. If you only have two levels of a factor, and one response - why in the anova do you use more factors (and their interactions)? In that sense, it is obvious that your results would differ from the t-test. In either case, I am not sure if any of these methods are valid, since your data doesn't seem to be normal. Here is an example code of how to get the same results from aov and t.test. And also a nonparametric option (that might be more fitting):

flat_550_W_realism    <- c(3,3,5,3,3,3,3,5,3,3,5,7,5,2,3)
flat_550_W_realism_AH <- c(7,4,5,3,6,5,3,5,5,7,2,7,5,5)
x <- c(rep(1, length(flat_550_W_realism)), rep(2, length(flat_550_W_realism_AH)))
y <- c(flat_550_W_realism, flat_550_W_realism_AH)
# equal results between t test and anova
t.test(y ~ x, var.equal = TRUE)
summary(aov(y ~ x))
# plotting the data:
boxplot(y ~ x)     # group 1 is not at all symmetrical...
wilcox.test(y ~ x) # a more fitting test

Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Wed, Jan 5, 2011 at 12:37 AM, Frodo Jedi frodo.j...@yahoo.com wrote: I kindly ask you for help because I really don't know how to solve this problem.

number stimulus           condition response
1      flat_550_W_realism A         3
2      flat_550_W_realism A         3
3      flat_550_W_realism A         5
4      flat_550_W_realism A         3
5      flat_550_W_realism A         3
6      flat_550_W_realism A         3
7      flat_550_W_realism A         3
8      flat_550_W_realism A         5
9      flat_550_W_realism A         3
10     flat_550_W_realism A         3
11     flat_550_W_realism A         5
12     flat_550_W_realism A         7
13     flat_550_W_realism A         5
14     flat_550_W_realism A         2
15     flat_550_W_realism A         3
16     flat_550_W_realism AH        7
17     flat_550_W_realism AH        4
18     flat_550_W_realism AH        5
19     flat_550_W_realism AH        3
20     flat_550_W_realism AH        6
21     flat_550_W_realism AH        5
22     flat_550_W_realism AH        3
23     flat_550_W_realism AH        5
24     flat_550_W_realism AH        5
25     flat_550_W_realism AH        7
26     flat_550_W_realism
[R] Assumptions for ANOVA: the right way to check the normality
Dear all, I would like to know which is the right way to check the normality assumption for performing ANOVA. How do you check normality for the following example? I did an experiment where people had to evaluate, on a 7-point scale, the degree of realism of some stimuli presented in 2 conditions. The problem is that if I check normality with the Shapiro test I get that the data are not normally distributed. Someone suggested to me that I don't have to check the normality of the data, but the normality of the residuals I get after fitting the linear model. I really ask you to help me understand this point, as I can't find enough material online to resolve it. If the data are not normally distributed I have to use the Kruskal-Wallis test and not ANOVA... so please help me understand. I'll make a numerical example: could you please tell me if the data in this table are normally distributed or not? Help!

number stimulus              condition response
1      flat_550_W_realism    A         3
2      flat_550_W_realism    A         3
3      flat_550_W_realism    A         5
4      flat_550_W_realism    A         3
5      flat_550_W_realism    A         3
6      flat_550_W_realism    A         3
7      flat_550_W_realism    A         3
8      flat_550_W_realism    A         5
9      flat_550_W_realism    A         3
10     flat_550_W_realism    A         3
11     flat_550_W_realism    A         5
12     flat_550_W_realism    A         7
13     flat_550_W_realism    A         5
14     flat_550_W_realism    A         2
15     flat_550_W_realism    A         3
16     flat_550_W_realism    AH        7
17     flat_550_W_realism    AH        4
18     flat_550_W_realism    AH        5
19     flat_550_W_realism    AH        3
20     flat_550_W_realism    AH        6
21     flat_550_W_realism    AH        5
22     flat_550_W_realism    AH        3
23     flat_550_W_realism    AH        5
24     flat_550_W_realism    AH        5
25     flat_550_W_realism    AH        7
26     flat_550_W_realism    AH        2
27     flat_550_W_realism    AH        7
28     flat_550_W_realism    AH        5
29     flat_550_W_realism    AH        5
30     bump_2_step_W_realism A         1
31     bump_2_step_W_realism A         3
32     bump_2_step_W_realism A         5
33     bump_2_step_W_realism A         1
34     bump_2_step_W_realism A         3
35     bump_2_step_W_realism A         2
36     bump_2_step_W_realism A         5
37     bump_2_step_W_realism A         4
38     bump_2_step_W_realism A         4
39     bump_2_step_W_realism A         4
40     bump_2_step_W_realism A         4
41     bump_2_step_W_realism AH        3
42     bump_2_step_W_realism AH        5
43     bump_2_step_W_realism AH        1
44     bump_2_step_W_realism AH        5
45     bump_2_step_W_realism AH        4
46     bump_2_step_W_realism AH        4
47     bump_2_step_W_realism AH        5
48     bump_2_step_W_realism AH        4
49     bump_2_step_W_realism AH        3
50     bump_2_step_W_realism AH        4
51     bump_2_step_W_realism AH        5
52     bump_2_step_W_realism AH        4
53     hole_2_step_W_realism A         3
54     hole_2_step_W_realism A         3
55     hole_2_step_W_realism A         4
56     hole_2_step_W_realism A         1
57     hole_2_step_W_realism A         4
58     hole_2_step_W_realism A         3
59     hole_2_step_W_realism A         5
60     hole_2_step_W_realism A         4
61     hole_2_step_W_realism A         3
62     hole_2_step_W_realism A         4
63     hole_2_step_W_realism A         7
64     hole_2_step_W_realism A         5
65     hole_2_step_W_realism A         1
66     hole_2_step_W_realism A         4
67     hole_2_step_W_realism AH        7
68     hole_2_step_W_realism AH        5
69     hole_2_step_W_realism AH        5
70     hole_2_step_W_realism AH        1
71     hole_2_step_W_realism AH        5
72     hole_2_step_W_realism AH        5
73     hole_2_step_W_realism AH        5
74     hole_2_step_W_realism AH        2
75     hole_2_step_W_realism AH        6
76     hole_2_step_W_realism AH        5
77     hole_2_step_W_realism AH
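The suggestion the poster mentions - testing the residuals rather than the raw responses - can be sketched like this, assuming the table above has been read into a data frame called scrd:

```r
# Fit the model first, then check normality of its residuals
fit <- lm(response ~ stimulus * condition, data = scrd)
shapiro.test(residuals(fit))                    # formal test on residuals
qqnorm(residuals(fit)); qqline(residuals(fit))  # visual check
```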
[R] R Commander - how to disable the alphabetical sorting of variable names?
I am trying to disable the alphabetical sorting of the variable names, but I am failing: R Commander does not store any changes made in the Commander Options menu/window. I tried to insert options(sort.names = FALSE) in the Rprofile.site and .Rprofile config files, but without success. Does anyone know the solution? -- View this message in context: http://r.789695.n4.nabble.com/R-Commander-how-to-disable-the-alphabetical-sorting-of-variable-names-tp3175426p3175426.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] update.views(Spatial) does not seem to be able to find RPyGeo package
The package is stated only to run under Windows (see the SystemRequirements field on its CRAN page), and you are on Linux - does this explain anything? Maybe ask the package maintainer? Roger Linder, Eric wrote: I have this problem with loading RPyGeo package when using update.views. How can I fix this. I have tried to use other CRAN mirrors with the same result. Below is a copy of my session. -session--- R version 2.12.1 (2010-12-16) Copyright (C) 2010 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i486-pc-linux-gnu (32-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [Previously saved workspace restored] library(ctv) update.views('Spatial') --- Please select a CRAN mirror for use in this session --- Loading Tcl/Tk interface ... done Warning message: In update.views(Spatial) : The following packages are not available: RPyGeo -session--- The information contained in this communication may be C...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Roger Bivand Economic Geography Section Department of Economics Norwegian School of Economics and Business Administration Helleveien 30 N-5045 Bergen, Norway -- View this message in context: http://r.789695.n4.nabble.com/update-views-Spatial-does-not-seem-to-be-able-to-find-RPyGeo-package-tp3174870p3175299.html Sent from the R help mailing list archive at Nabble.com. 
[R] loop variable names as function arguments
Dear all, is there a way to loop the rp.doublebutton function in the rpanel package? The difficulty I'm having lies with the variable name argument.

library(rpanel)
if (interactive()) {
  draw <- function(panel) {
    plot(unlist(panel$V), ylim = 0:1)
    panel
  }
  panel <- rp.control(V = as.list(rep(.5, 3)))
  rp.doublebutton(panel, var = V[[1]], step = 0.05, action = draw, range = c(0, 1))
  rp.doublebutton(panel, var = V[[2]], step = 0.05, action = draw, range = c(0, 1))
  rp.doublebutton(panel, var = V[[3]], step = 0.05, action = draw, range = c(0, 1))
}

Regards, Philip __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] looking for the RMySQL package for R 2.12.0 under XP
Hello David, As I had no time to try to compile the RMySQL package, I finally followed your advice and moved to RODBC. The decision to modify my scripts was taken after I discovered the function odbcDriverConnect, which allows one to connect directly to a database (as RMySQL does) without declaring the data source through the ODBC window. With the following command,

ch <- odbcDriverConnect(connection = "SERVER=localhost;DRIVER=MySQL ODBC 5.1 Driver;DATABASE=my_db;UID=my_user;PWD=my_pwd", case = "tolower")

the connection runs nicely. Thanks again for the link. Happy New Year to all the R-users, Ptit Bleu. -- View this message in context: http://r.789695.n4.nabble.com/looking-for-the-RMySQL-package-for-R-2-12-0-under-XP-tp3057537p3175513.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] speed up in R apply
Hi, I am doing some simulations and found a bottleneck in my R script. I made an example:

a = matrix(rnorm(500), 100, 5)
tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
[1] -1291.026
Time difference of 0.2354031 secs
tt = Sys.time(); sum(apply(a, 1, prod)); Sys.time() - tt
[1] -1291.026
Time difference of 20.23150 secs

Is there a faster way of calculating sums of products (of columns, or of rows)? And is this expected behavior? Thanks for your advice in advance, Young [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
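One vectorized alternative that avoids the row-wise apply(): treat the matrix as a list of columns and multiply the columns elementwise, which stays in fast vector code:

```r
a <- matrix(rnorm(500), 100, 5)
slow <- sum(apply(a, 1, prod))
# Reduce(`*`, ...) over the columns gives the per-row products
fast <- sum(Reduce(`*`, as.data.frame(a)))
all.equal(slow, fast)
```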
[R] How to 'explode' a matrix
Hi everyone, I'm looking for a way to 'explode' a matrix like this:

matrix(1:4,2,2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

into a matrix like this:

matrix(c(1,1,2,2,1,1,2,2,3,3,4,4,3,3,4,4),4,4)
     [,1] [,2] [,3] [,4]
[1,]    1    1    3    3
[2,]    1    1    3    3
[3,]    2    2    4    4
[4,]    2    2    4    4

My current kludge is this:

v1 = rep(1:4, each = 2, times = 2)
v2 = v1[order(rep(1:2, each = 4, times = 2))]
matrix(v2, 4, 4)

But I'm hoping there's a more efficient solution that I'm not aware of. Many thanks, Kevin __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unique limited to 536870912
Hi, I am using the 64-bit version. To check that, I went into the bin folder and executed `file R`. It gave the following output: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped. The error I got when reading in my edgelist was: length 927365385 is too large for hashing. This number was the number of entries in the edgelist, btw. I hope this helps. Thanks On Wed, Jan 5, 2011 at 5:15 AM, jim holtman jholt...@gmail.com wrote: Could it be that you are running on a 32-bit version of R? 536870912 * 4 = 2GB if those were integers, which would use up all of memory. You never did show what your error message was or what system you were using. On Wed, Jan 5, 2011 at 12:08 AM, Indrajeet Singh sin...@cs.ucr.edu wrote: Hi I am using R with igraph to analyze an edgelist that is greater than the said amount. Does anyone know a way around this? Thanks Inder __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] vector of character with unequal width
Dear R users, The best in this new year 2011. I am dealing with a character vector (xx) whose nchar values are not all the same. Ex.

nchar(xx)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 4 4 4 4 4 4 4 4
[75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ... 9

I need xx to be nchar = 9. My best guess was to paste 0's. Then I need substring(xx, 6, 9). I came up with:

xx[1:61]   <- paste("00000000", xx[1:61], sep = "")
xx[62:66]  <- paste("000000", xx[62:66], sep = "")
xx[67:100] <- paste("00000", xx[67:100], sep = "")
..
nchar(xx)
 [1] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[38] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
[75] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
xx <- substring(xx, 6, 9)

This is a solution for one data set and would be sufficient, but not if I will continuously deal with this same issue. Furthermore, I am trying to automate the process, but I have not been able to come up with an adequate solution. I was thinking of creating a character vector of 0's of length 9-nchar(xx), then pasting it to xx:

9 - nchar(xx)
 [1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[38] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 6 6 6 5 5 5 5 5 5 5 5
[75] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ..1

Nevertheless, I have not been able to create this vector, nor do I know if this is the best option. Another way I thought of was to create an if statement, but this would be long and not efficient (I think). Any suggestion will be appreciated. Jose [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
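For an automated version of the padding, sprintf() can compute the zero-padding in one step regardless of the input width; this sketch assumes the strings hold integer values:

```r
xx  <- c("1", "123", "1234")           # example values of varying width
xx9 <- sprintf("%09d", as.integer(xx)) # left-pad with zeros to width 9
nchar(xx9)                             # all 9 now
substring(xx9, 6, 9)                   # the trailing 4 characters
```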
[R] real time R
Hi, We're using R in an application where asking for a probability of an event takes about 130ms. What could we do to take that down to 30ms-40ms? The query code uses randomforest, knn. -- M. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cost-benefit/value for money analysis
Liviu Try this:

require(sos)
findFn('cost benefit')
found 12 matches

Thanks, I wasn't aware of sos; however, following up the hits hasn't moved me any further forward, except to demonstrate that the function I want doesn't exist. But I will try some other search options. Graham [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Comparing fitting models
Dear all, I have 3 models (from simple to complex) and I want to compare them in order to see if they fit equally well or not. From the R prompt I am not able to see where I can get this information. Let's do an example:

fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data = scrd) # equivalent to lm(response ~ stimulus*condition, data=scrd)
fit2 <- lm(response ~ stimulus + condition, data = scrd)
fit3 <- lm(response ~ condition, data = scrd)

anova(fit2, fit1) # compare models
Analysis of Variance Table
Model 1: response ~ stimulus + condition
Model 2: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    165 364.13
2    159 362.67  6     1.465 0.1071 0.9955

anova(fit3, fit2, fit1) # compare models
Analysis of Variance Table
Model 1: response ~ condition
Model 2: response ~ stimulus + condition
Model 3: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    171 382.78
2    165 364.13  6    18.650 1.3628 0.2328
3    159 362.67  6     1.465 0.1071 0.9955

How can I tell whether the simple model fits as well as the complex model (the one with the interaction)? Thanks in advance All the best [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
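As a complement to anova() for comparing nested fits, information criteria give a single number per model; a sketch reusing the same (assumed) scrd data frame:

```r
fit1 <- lm(response ~ stimulus * condition, data = scrd)
fit2 <- lm(response ~ stimulus + condition, data = scrd)
fit3 <- lm(response ~ condition, data = scrd)
# Lower AIC favors the better fit-vs-complexity trade-off
AIC(fit1, fit2, fit3)
```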
Re: [R] Stop and call objects
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Henrique Dallazuanna Sent: Wednesday, January 05, 2011 9:26 AM To: Sebastien Bihorel Cc: R-help Subject: Re: [R] Stop and call objects Try this: f <- function(x) tryCatch(sum(x), error = function(e) sprintf("Error in %s: %s", deparse(sys.call(1)), e$message)) f('a') The argument e to the error handler contains a call component, so you don't have to rely on the unreliable sys.call(1) to get the offending call. E.g.,

f2 <- function(x) {
  tryCatch(sum(x),
    error = function(e) {
      sprintf("Error in %s: %s", deparse(e$call)[1], e$message)
    })
}
f2('char')
[1] "Error in sum(x): invalid 'type' (character) of argument"

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com On Wed, Jan 5, 2011 at 12:23 PM, Sebastien Bihorel sebastien.biho...@cognigencorp.com wrote: Dear R-users, Let's consider the following snippet: f <- function(x) tryCatch(sum(x), error = function(e) stop(e)) f('a') As expected, the last call returns an error message: Error in sum(x) : invalid 'type' (character) of argument My questions are the following: 1- can I easily ask the stop function to reference the f function in addition to sum(x) in the error message? 2- If not, I guess I would have to extract the call and message objects from e, coerce the call as a character object, build a custom string, and pass it to the stop function using call. = FALSE. How can I coerce a call object to a character and maintain the aspect of the printed call (i.e. sum(x) instead of the character vector c("sum", "x") returned by as.character(e$call))? Thank you Sebastien __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
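As a footnote to Bill's point: `conditionCall()` and `conditionMessage()` are the documented accessors for the pieces of a condition object, so a variant of `f2` that avoids poking at `e$call` directly might look like this (a sketch, not from the thread):

```r
# conditionCall(e) returns the call stored in the condition,
# conditionMessage(e) returns its message component.
f3 <- function(x) {
  tryCatch(sum(x),
           error = function(e)
             sprintf("Error in %s: %s",
                     deparse(conditionCall(e)), conditionMessage(e)))
}
f3("a")
# [1] "Error in sum(x): invalid 'type' (character) of argument"
```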
Re: [R] How to 'explode' a matrix
Try this:

apply(apply(m, 2, rep, each = 2), 1, rep, each = 2)

or

m[rep(seq(nrow(m)), each = 2), rep(seq(ncol(m)), each = 2)]

On Wed, Jan 5, 2011 at 10:03 AM, Kevin Ummel kevinum...@gmail.com wrote: Hi everyone, I'm looking for a way to 'explode' a matrix like this:

matrix(1:4,2,2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

into a matrix like this:

matrix(c(1,1,2,2,1,1,2,2,3,3,4,4,3,3,4,4),4,4)
     [,1] [,2] [,3] [,4]
[1,]    1    1    3    3
[2,]    1    1    3    3
[3,]    2    2    4    4
[4,]    2    2    4    4

My current kludge is this:

v1 = rep(1:4, each=2, times=2)
v2 = v1[order(rep(1:2, each=4, times=2))]
matrix(v2, 4, 4)

But I'm hoping there's a more efficient solution that I'm not aware of. Many thanks, Kevin
Re: [R] real time R
On 05.01.2011 17:10, Marcelo Barbudas wrote: Hi, We're using R in an application where asking for a probability of an event takes about 130ms. What could we do to take that down to 30ms-40ms? The query code uses randomforest, knn. Use a machine that is 4 times faster? Otherwise: Use another method or a more efficient implementation. Don't use R at all if you want _guaranteed_ real time processing. Uwe Ligges
Re: [R] How to 'explode' a matrix
Kevin Ummel kevinummel at gmail.com writes: I'm looking for a way to 'explode' a matrix like this:

matrix(1:4,2,2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

This is the Kronecker product of your matrix with the matrix (1 1 ; 1 1):

m <- matrix(1:4, 2, 2)
kronecker(m, matrix(1, 2, 2))

cheers Ben Bolker
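The answers in this thread agree with each other; a quick check (not in the original posts) that the indexing solution and the Kronecker product produce the same 'exploded' matrix:

```r
m  <- matrix(1:4, 2, 2)
# index each row and column twice
r1 <- m[rep(seq(nrow(m)), each = 2), rep(seq(ncol(m)), each = 2)]
# Kronecker product with a 2x2 block of ones (result is numeric)
r2 <- kronecker(m, matrix(1, 2, 2))
all(r1 == r2)  # TRUE
```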
Re: [R] multipanel plots
On 05.01.2011 06:16, smriti Sebastian wrote: hi, i have attached a doc file. Maybe, but it cannot make it through the list. Can this graph be plotted using R? Please help. We do not know. Make it available on some webserver and refer to it with a URL. Uwe Ligges regards, smriti
Re: [R] real time R
On Wed, Jan 5, 2011 at 4:10 PM, Marcelo Barbudas nos...@gmail.com wrote: Hi, We're using R in an application where asking for a probability of an event takes about 130ms. What could we do to take that down to 30ms-40ms? The query code uses randomforest, knn. That's a fairly vague question, so some vague answers: Firstly, profile your query to identify bottlenecks and then concentrate your effort on removing them. Anything else is a waste of time. Secondly, get a faster computer - whether that means faster CPU, faster hard disks, or faster RAM depends on where the bottleneck is in your process. Or go parallel and use multiple CPUs. Or rewrite in C. Or machine code. Or do it on a GPU. Thirdly, give us something more specific! Like examples, perhaps? Barry
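Barry's first suggestion (profile before optimising) takes only a few lines. The workload below is a stand-in, since the original query code was not posted:

```r
# Profile a stand-in workload; in practice, wrap your actual query call.
Rprof("prof.out")
for (i in 1:20) p <- apply(matrix(rnorm(1e4), 100), 1, prod)
Rprof(NULL)
# by.self lists functions by time spent in their own code,
# worst offenders first - those are the bottlenecks to attack.
head(summaryRprof("prof.out")$by.self)
```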
Re: [R] How to 'explode' a matrix
Hi Kevin, Take a look at ?kronecker HTH, Jorge

On Wed, Jan 5, 2011 at 7:03 AM, Kevin Ummel wrote: Hi everyone, I'm looking for a way to 'explode' a matrix like this:

matrix(1:4,2,2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

into a matrix like this:

matrix(c(1,1,2,2,1,1,2,2,3,3,4,4,3,3,4,4),4,4)
     [,1] [,2] [,3] [,4]
[1,]    1    1    3    3
[2,]    1    1    3    3
[3,]    2    2    4    4
[4,]    2    2    4    4

My current kludge is this:

v1 = rep(1:4, each=2, times=2)
v2 = v1[order(rep(1:2, each=4, times=2))]
matrix(v2, 4, 4)

But I'm hoping there's a more efficient solution that I'm not aware of. Many thanks, Kevin
Re: [R] Simulation - Natural Selection
On 05/01/2011 17:40, Bert Gunter wrote: My hypothesis was specified before I did my experiment. Whilst far from perfect, I've tried to do the best I can to assess rise in resistance, without going into genetics as it's not possible (although maybe at the next institution I've applied to for an MSc). With my hypothesis (I mentioned it below), I was of the frame of mind that a nonsignificant p-value on the cleaner variable (for now - the experiment is far from over) indicated a lack of evidence for rejecting the null. And so at the minute, it looks like the type of cleaner makes no difference. I have no fundamental objection, but be careful. I would simply qualify your last sentence by saying that it means that the experimental noise is too great to precisely determine the size of the cleaner effect. Scientific reality tells us that it is never exactly 0; what your results show is that your uncertainty about the value of the difference encompasses both positive and negative values. This does NOT mean that the difference might not be scientifically large enough to be of interest -- a confidence interval for the difference (MUCH better than a P value) would help you determine that. If the interval is narrow enough that the difference, positive or negative, is too small to be of scientific interest, then you're done. If the interval is large, then it tells you that you need more data, a better (less noisy) experiment, etc. -- Bert At the moment I wouldn't call the confidence interval small; it's definitely wide, and at the minute the confidence interval covers zero. My R-squared at the minute is also 0.5; this is mostly due to the few extreme cases of adaptation as I mentioned before, but I'm hesitant to remove them, as papers in my literature study which also evolve bacteria show that there is often (sometimes wide) variation in the paths populations take.
So whilst mathematically a bit undesirable, and it makes me and the model uncertain, it does fall into place with what is known, or has been previously shown, of the reality of selection. Again, if I include the data from the bacteria dropped from the study, all that improves and uncertainty is reduced. It may also be worth mentioning that I am also taking a more traditional approach (by that I mean a more Statistics 101 approach; indeed that is all the stats tuition covered in my course as a taught element), in case what I've described above did not work or was not ideal, because we (my supervisor and I) did foresee that a model report may contain a lot of uncertainty. Indeed we did expect some populations to adapt and some not to, etc. So I've also been collecting data on the width of the zones of inhibition shown by putting disks of cleaner on plates of growth, and measuring the dead zone that results. I can get lots of data from this with only a few plates, and doing this at the start of the study, a few times in the middle, and at the end will allow me to do more traditional analysis, for example a t.test on the dead zone widths at the end of the study between cleaners A and B, or a non-parametric equivalent, maybe even a permutation test. The modelling stuff is already beyond what my supervisor expects of me, but I felt it would add value and a lot more insight to the study, allowing more variables to be accounted for than a more short-sighted traditional test.
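Bert's recommendation of a confidence interval over a P value is one line in R; illustrated here on simulated data, since the poster's resistance measurements are not in the thread:

```r
# Invented data: resistance measured under two cleaners, A and B
set.seed(42)
d <- data.frame(cleaner = factor(rep(c("A", "B"), each = 20)))
d$resistance <- 10 + 0.3 * (d$cleaner == "B") + rnorm(40)
fit <- lm(resistance ~ cleaner, data = d)
# If this interval is narrow and contains only scientifically
# uninteresting effect sizes, the question is settled either way;
# if it is wide, more (or cleaner) data are needed.
confint(fit, "cleanerB")
```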
Re: [R] vector of character with unequal width
Try this:

formatC(c(1, 11, 111), flag = "0", width = 9)

Or:

sprintf("%09d", c(1, 11, 111))

On Wed, Jan 5, 2011 at 1:50 PM, jose Bartolomei surfpr...@hotmail.com wrote: Dear R users, The best in this new year 2011. I am dealing with a character vector (xx) whose nchar are not the same. Ex.

nchar(xx)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 4 4 4 4 4 4 4 4 [75] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 ... 9

I need xx to be nchar = 9. My best guess was to paste 0's. Then I need substring(xx, 6, 9). I came up with:

xx[1:61] <- paste("00000000", xx[1:61], sep="")
xx[62:66] <- paste("000000", xx[62:66], sep="")
xx[67:100] <- paste("00000", xx[67:100], sep="")
..
nchar(xx)
[1] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 [38] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 [75] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
xx <- substring(xx, 6, 9)

This is a solution: for one data set it would be sufficient, but not if I will continuously deal with this same issue. Furthermore, I am trying to automate the process but I have not been able to come up with an adequate solution. I was thinking to create a character vector of 0's of length 9-nchar(xx), then paste it to xx.

9-nchar(xx)
[1] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 [38] 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 6 6 6 5 5 5 5 5 5 5 5 [75] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ..1

Nevertheless, I have not been able to create this vector, nor do I know if this is the best option. Another way I thought of was to create an if statement, but this would be long and not efficient (I think). Any suggestion will be appreciated.
Jose
Re: [R] vector of character with unequal width
On Wed, Jan 05, 2011 at 03:50:13PM +0000, jose Bartolomei wrote: [...] I was thinking to create a character vector of 0's of length 9-nchar(xx), then paste it to xx. [...] Nevertheless, I have not been able to create this vector, nor do I know if this is the best option. Did you consider something like the following?

xx <- c("abc", "abcd", "abcde")
z1 <- rep("000000000", times = length(xx))
z2 <- substr(z1, 1, 9 - nchar(xx))
yy <- paste(z2, xx, sep = "")
cbind(yy)
#      yy
# [1,] "000000abc"
# [2,] "00000abcd"
# [3,] "0000abcde"

Petr Savicky.
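Pulling the thread's suggestions together, either one-liner pads to a fixed width without the manual bookkeeping in the original post (numeric input assumed, as in Henrique's reply):

```r
x <- c(1, 11, 111)
formatC(x, width = 9, flag = "0")  # zero-padded to width 9
sprintf("%09d", x)                 # same result via C-style formatting
```

Both generalise to any vector length, so no per-group indexing like `xx[62:66]` is needed.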
[R] Plotting colour-coded points
Hi, I have a file of the following type:

id    a    b
1   0.5    5
2   0.7   15
3   1.6    7
4   0.5   25

I would like to plot the data in column a on the y-axis and the corresponding data in column id on the x-axis, so plot(a~id). However I would like to colour these points according to the data in column b. Column b data may be colour coded into the following bins: 0-9; 10-19; 20-29. Any idea on how to accomplish this? TIA, Anjan

-- === anjan purkayastha, phd. research associate fas center for systems biology, harvard university 52 oxford street cambridge ma 02138 phone-703.740.6939 ===
Re: [R] Plotting colour-coded points
Hi Anjan, Try something along the lines of

d$bb <- with(d, cut(b, c(0, 9, 19, 29)))
with(d, plot(a, id, col = bb, pch = 16, las = 1))
legend('topright', as.character(levels(d$bb)),
       col = 1:length(levels(d$bb)), ncol = 3, pch = 16)

where 'd' is your original data.frame. HTH, Jorge

On Wed, Jan 5, 2011 at 2:00 PM, ANJAN PURKAYASTHA wrote: Hi, I have a file of the following type:

id    a    b
1   0.5    5
2   0.7   15
3   1.6    7
4   0.5   25

I would like to plot the data in column a on the y-axis and the corresponding data in column id on the x-axis, so plot(a~id). However I would like to colour these points according to the data in column b. Column b data may be colour coded into the following bins: 0-9; 10-19; 20-29. Any idea on how to accomplish this? TIA, Anjan
Re: [R] speed up in R apply
On Jan 5, 2011, at 10:03 AM, Young Cho wrote: Hi, I am doing some simulations and found a bottleneck in my R script. I made an example:

a = matrix(rnorm(500), 100, 5)
tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
[1] -1291.026
Time difference of 0.2354031 secs
tt = Sys.time(); sum(apply(a, 1, prod)); Sys.time() - tt
[1] -1291.026
Time difference of 20.23150 secs

Is there a faster way of calculating sums of products (of columns, or of rows)? You should look at crossprod and tcrossprod. And is this expected behavior? Yes. For loops and *apply strategies are slower than the proper use of vectorized functions. Thanks for your advice in advance. -- David Winsemius, MD West Hartford, CT
Re: [R] OT: Reprinting of Bertin's Semiology of Graphics
This is a major publishing event for statistical graphics. I have long possessed Bertin's shorter book Graphics and Graphic Information Processing, but Semiology is the one I've been waiting for. Thanks for the good news Michael! Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] Plotting colour-coded points
On Jan 5, 2011, at 2:00 PM, ANJAN PURKAYASTHA wrote: Hi, I have a file of the following type:

id    a    b
1   0.5    5
2   0.7   15
3   1.6    7
4   0.5   25

I would like to plot the data in column a on the y-axis and the corresponding data in column id on the x-axis, so plot(a~id). However I would like to colour these points according to the data in column b. Column b data may be colour coded into the following bins: 0-9; 10-19; 20-29. Any idea on how to accomplish this? Something along the lines of this code:

plot(a ~ id, data = dfrm,
     col = c("red", "green", "blue")[findInterval(dfrm$b, c(0, 10, 20, 30))])

-- David. TIA, Anjan

David Winsemius, MD West Hartford, CT
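Combining the two replies into one runnable sketch, using the question's four rows of data:

```r
d <- data.frame(id = 1:4,
                a  = c(0.5, 0.7, 1.6, 0.5),
                b  = c(5, 15, 7, 25))
pal <- c("red", "green3", "blue")
# findInterval maps each b value to its bin: here 1, 2, 1, 3
bin <- findInterval(d$b, c(0, 10, 20))
plot(a ~ id, data = d, col = pal[bin], pch = 16)
legend("topright", c("0-9", "10-19", "20-29"), col = pal, pch = 16)
```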
Re: [R] R Commander - how to disable the alphabetical sorting of variable names?
Dear Iurie Malai, How Rcmdr options are set is described in ?Commander, which is also accessible via the R Commander menus, Help -> Commander help. You need

options(Rcmdr = list(sort.names = FALSE))

which you can put in Rprofile.site. Best, John

John Fox Senator William McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Iurie Malai Sent: January-05-11 7:39 AM To: r-help@r-project.org Subject: [R] R Commander - how to disable the alphabetical sorting of variable names? I try to disable alphabetical sorting of the variable names but I fail: R Commander does not store any changes made in the Commander Options menu/window. I tried to insert options(sort.names = FALSE) in the Rprofile.site and .Rprofile config files, but without success. Does anyone know the solution?
Re: [R] speed up in R apply
On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius dwinsem...@comcast.net wrote: On Jan 5, 2011, at 10:03 AM, Young Cho wrote: Hi, I am doing some simulations and found a bottleneck in my R script. I made an example:

a = matrix(rnorm(500), 100, 5)
tt = Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
[1] -1291.026
Time difference of 0.2354031 secs
tt = Sys.time(); sum(apply(a, 1, prod)); Sys.time() - tt
[1] -1291.026
Time difference of 20.23150 secs

Is there a faster way of calculating sums of products (of columns, or of rows)? You should look at crossprod and tcrossprod. Hmm. Not sure that would help, David. You could use a matrix multiplication a %*% rep(1, ncol(a)) if you wanted the row sums, but of course you could also use rowSums to get those. And is this expected behavior? Yes. For loops and *apply strategies are slower than the proper use of vectorized functions. To expand a bit on David's point, the apply function isn't magic. It essentially loops over the rows, in this case. By multiplying columns together you are performing the loop over the rows in compiled code, which is much, much faster. If you want to do this kind of operation effectively in R for a general matrix (i.e. not knowing in advance that it has exactly 5 columns) you could use Reduce:

a <- matrix(rnorm(500), 100, 5)
system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
   user  system elapsed
   0.15    0.09    0.37
system.time(pr2 <- apply(a, 1, prod))
   user  system elapsed
 22.090   0.140  22.902
all.equal(pr1, pr2)
[1] TRUE
system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))
   user  system elapsed
  0.410   0.010   0.575
all.equal(pr3, pr2)
[1] TRUE
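A compact check (not in the thread) that the vectorized column product really matches `apply(a, 1, prod)` for an arbitrary number of columns:

```r
a  <- matrix(rnorm(500), 100, 5)
p1 <- apply(a, 1, prod)              # slow: loops over rows at the R level
p2 <- Reduce(`*`, as.data.frame(a))  # fast: 5 vectorized multiplications
all.equal(p1, p2)  # TRUE
```

`as.data.frame(a)` turns the matrix into a list of column vectors, so `Reduce` multiplies whole columns at a time regardless of how many there are.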
Re: [R] Assumptions for ANOVA: the right way to check the normality
Someone suggested to me that I don't have to check the normality of the data, but the normality of the residuals I get after fitting the linear model. I really ask you to help me understand this point, as I can't find enough material online to resolve it. Try the following:

# using your scrd data and your proposed models
fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)
fit2 <- lm(response ~ stimulus + condition, data=scrd)
fit3 <- lm(response ~ condition, data=scrd)

# Set up for 6 plots on 1 panel
op <- par(mfrow=c(2,3))

# The residuals() function extracts residuals.
# Visual inspection is a good start for checking normality:
# you get a much better feel than from some magic number statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots, which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)

If the data are not normally distributed I have to use the Kruskal-Wallis test and not the ANOVA... so please help me understand. Indeed - Kruskal-Wallis is a good test to use for one-factor data that is ordinal, so it is a good alternative to your fit3. Your response seems to be a discrete variable rather than a continuous variable. You must decide if it is reasonable to approximate it with a normal distribution, which is by definition continuous. I make a numerical example; could you please tell me if the data in this table are normally distributed or not? Help!
number stimulus              condition response
1      flat_550_W_realism    A         3
2      flat_550_W_realism    A         3
3      flat_550_W_realism    A         5
4      flat_550_W_realism    A         3
5      flat_550_W_realism    A         3
6      flat_550_W_realism    A         3
7      flat_550_W_realism    A         3
8      flat_550_W_realism    A         5
9      flat_550_W_realism    A         3
10     flat_550_W_realism    A         3
11     flat_550_W_realism    A         5
12     flat_550_W_realism    A         7
13     flat_550_W_realism    A         5
14     flat_550_W_realism    A         2
15     flat_550_W_realism    A         3
16     flat_550_W_realism    AH        7
17     flat_550_W_realism    AH        4
18     flat_550_W_realism    AH        5
19     flat_550_W_realism    AH        3
20     flat_550_W_realism    AH        6
21     flat_550_W_realism    AH        5
22     flat_550_W_realism    AH        3
23     flat_550_W_realism    AH        5
24     flat_550_W_realism    AH        5
25     flat_550_W_realism    AH        7
26     flat_550_W_realism    AH        2
27     flat_550_W_realism    AH        7
28     flat_550_W_realism    AH        5
29     flat_550_W_realism    AH        5
30     bump_2_step_W_realism A         1
31     bump_2_step_W_realism A         3
32     bump_2_step_W_realism A         5
33     bump_2_step_W_realism A         1
34     bump_2_step_W_realism A         3
35     bump_2_step_W_realism A         2
36     bump_2_step_W_realism A         5
37     bump_2_step_W_realism A         4
38     bump_2_step_W_realism A         4
39     bump_2_step_W_realism A         4
40     bump_2_step_W_realism A         4
41     bump_2_step_W_realism AH        3
42     bump_2_step_W_realism AH        5
43     bump_2_step_W_realism AH        1
44     bump_2_step_W_realism AH        5
45     bump_2_step_W_realism AH        4
46     bump_2_step_W_realism AH        4
47     bump_2_step_W_realism AH        5
48     bump_2_step_W_realism AH        4
49     bump_2_step_W_realism AH        3
50     bump_2_step_W_realism AH        4
51     bump_2_step_W_realism AH        5
52     bump_2_step_W_realism AH        4
53     hole_2_step_W_realism A         3
54     hole_2_step_W_realism A         3
55     hole_2_step_W_realism A         4
56     hole_2_step_W_realism A         1
57     hole_2_step_W_realism A         4
58     hole_2_step_W_realism A         3
59     hole_2_step_W_realism A         5
60     hole_2_step_W_realism A         4
61     hole_2_step_W_realism A         3
62     hole_2_step_W_realism A         4
63     hole_2_step_W_realism A         7
64     hole_2_step_W_realism A         5
65     hole_2_step_W_realism A         1
66
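The visual checks recommended above can be complemented by a formal test on the residuals; a sketch on a simulated fit, since the `scrd` data frame is only partially reproduced here:

```r
# Invented two-group data in the same shape as the poster's table
set.seed(1)
y <- c(rnorm(30, mean = 4), rnorm(30, mean = 5))
g <- gl(2, 30, labels = c("A", "AH"))
fit <- lm(y ~ g)
# Shapiro-Wilk applied to the residuals, not to the raw response
shapiro.test(residuals(fit))
```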
[R] Nnet and AIC: selection of a parsimonious parameterisation
Hi All, I am trying to use a neural network for my work, but I am not sure about my approach to selecting a parsimonious model. In R with nnet, the AIC has not been defined for a feed-forward neural network with a single hidden layer. Is this because it does not make sense mathematically in this case? For example, is this pseudo code sensible? Thanks in advance for your help. I am sorry if this has been answered before, but I haven't found an answer for this in the archive. Below, I have added an implementation of this idea based on the MASS (Modern Applied Statistics with S) code of chapter 8. Cheers, Ben

Pseudo code: Define RSS as: RSS = (1-alpha)*RSS(identification set) + alpha*RSS(validation set) and AIC as: AIC = 2*np + N*log(RSS) where np corresponds to the non-null parameters of the neural network and N is the sample size (based on http://en.wikipedia.org/wiki/Akaike_information_criterion). Assuming a feed-forward neural network with a single hidden layer and a maximum number of neurons (maxSize): For size = 1 to maxSize: optimise the decay; select the neural network with the smallest AIC for a given size and decay using random starting parameterisation and random identification set. For the lowest to the largest diagonal element of the Hessian: equate the corresponding parameter to 0; if AIC(i) > AIC(i-1), break. The neural network selected is the one with the smallest AIC.
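The AIC in the pseudo code is easy to compute by hand for a fitted `nnet` object; a minimal sketch with invented data:

```r
library(nnet)
set.seed(1)
x <- matrix(runif(200), 100, 2)
y <- x[, 1] + x[, 2]^2 + rnorm(100, sd = 0.1)
nn  <- nnet(x, y, size = 3, linout = TRUE, trace = FALSE)
RSS <- sum((y - predict(nn, x))^2)   # residual sum of squares on the fit
np  <- sum(nn$wts != 0)              # count of non-null parameters
aic <- 2 * np + length(y) * log(RSS) # the poster's AIC definition
```

Note this uses the training RSS; the pseudo code's blended identification/validation RSS needs the extra `alpha` weighting shown above.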
an example based on cpus data in Chapter 8 of MASS

library(nnet)
library(MASS)
# From Chapter 6, for comparisons
set.seed(123)
cpus.samp <- c(3, 5, 6, 7, 8, 10, 11, 16, 20, 21, 22, 23, 24, 25, 29, 33,
  39, 41, 44, 45, 46, 49, 57, 58, 62, 63, 65, 66, 68, 69, 73, 74, 75, 76,
  78, 83, 86, 88, 98, 99, 100, 103, 107, 110, 112, 113, 115, 118, 119, 120,
  122, 124, 125, 126, 127, 132, 136, 141, 144, 146, 147, 148, 149, 150, 151,
  152, 154, 156, 157, 158, 159, 160, 161, 163, 166, 167, 169, 170, 173, 174,
  175, 176, 177, 183, 184, 187, 188, 189, 194, 195, 196, 197, 198, 199, 202,
  204, 205, 206, 208, 209)
cpus2 <- cpus[, 2:8]  # excludes names, authors' predictions
attach(cpus2)
cpus3 <- data.frame(syct = syct-2, mmin = mmin-3, mmax = mmax-4,
                    cach = cach/256, chmin = chmin/100, chmax = chmax/100, perf)
detach()

CVnn.cpus <- function(formula, data = cpus3[cpus.samp, ],
                      maxSize = 10, decayRange = c(0, 0.2),
                      nreps = 5, nifold = 10, alpha = 9/10,
                      linout = TRUE, skip = TRUE, maxit = 1000, ...) {
  # nreps = number of attempts to fit a nnet model with randomly chosen
  # initial parameters; the one with the smallest RSS on the training
  # data is then chosen
  nnWtsPrunning <- function(nn, data, alpha, i) {
    truth <- log10(data$perf)
    RSS <- (1-alpha)*sum((truth[ri != i] - predict(nn, data[ri != i,]))^2) +
           alpha*sum((truth[ri == i] - predict(nn, data[ri == i,]))^2)
    AIC <- 2*sum(nn$wts != 0) + length(data$perf)*log(RSS)
    nn.tmp <- nn
    for (j in (1:length(nn$wts))) {
      nn.tmp$wts[order(diag(nn.tmp$Hessian))[j]] <- 0
      RSS.tmp <- (1-alpha)*sum((truth[ri != i] - predict(nn.tmp, data[ri != i,]))^2) +
                 alpha*sum((truth[ri == i] - predict(nn.tmp, data[ri == i,]))^2)
      AIC.tmp <- 2*sum(nn.tmp$wts != 0) + length(data$perf)*log(RSS.tmp)
      if (is.nan(AIC.tmp) || AIC.tmp > AIC) {
        cat('\n j', j, 'AIC', AIC.tmp, 'AIC_1', AIC, '\n')
        break
      } else {
        nn <- nn.tmp; AIC <- AIC.tmp; RSS <- RSS.tmp
      }
    }
    list(choice = sqrt(RSS/100), nparam = sum(nn$wts != 0), AIC = AIC, nn = nn)
  }
  # Modified function for optimisation
  CVnn1 <- function(decay, formula, data, nreps = 1, ri, size, linout, skip, maxit,
                    optimFlag = FALSE, alpha) {
    truth <- log10(data$perf)
    nn <- nnet(formula, data[ri != 1,], trace = FALSE, size = size,
               linout = linout, skip = skip, maxit = maxit, Hess = TRUE)
    RSS <- (1-alpha)*sum((truth[ri != 1] - predict(nn, data[ri != 1,]))^2) +
           alpha*sum((truth[ri == 1] - predict(nn, data[ri == 1,]))^2)
    ii <- 1
    for (i in sort(unique(ri))) {
      for (rep in 1:nreps) {
        nn.tmp <- nnet(formula, data[ri != i,], trace = FALSE, size = size,
                       linout = linout, skip = skip, maxit = maxit, Hess = TRUE)
        RSS.tmp <- (1-alpha)*sum((truth[ri != i] - predict(nn.tmp, data[ri != i,]))^2) +
                   alpha*sum((truth[ri == i] - predict(nn.tmp, data[ri == i,]))^2)
        if (RSS.tmp < RSS) { RSS <- RSS.tmp; nn <- nn.tmp; ii <- i }
      }
    }
    if (optimFlag) {
      return(RSS)
    } else {
      prn <- nnWtsPrunning(nn, data, alpha, ii)
      list(choice = prn$choice, nparam = prn$nparam,
           nparaminit = length(nn$wts), AIC = prn$AIC, nn1 = prn$nn)
    }
  }
  maxSize <- maxSize + 1; j <- 1
  choice <- numeric(maxSize); nparam <- numeric(maxSize)
  lambdaj <- numeric(maxSize)
  AIC <- numeric(maxSize); nparamInit <-
[R] plot(aModel) vs. influence.measures()
A while back I asked about getting a list of points that R considers influential after fitting a linear model, and very quickly got a helpful pointer to influence.measures(). But it has happened again. The trouble I am having is that points marked on plots are not flagged in the output from influence.measures(), and I can't read them on the plots. I tried some successive deletion, but then other points (naturally) start to look troublesome. Is there a good way to get a list of suspicious entries at the beginning? In this case, I am trying to help identify possible data entry errors, and I am interested in knowing what R bothered to mark up front. Perhaps the defaults should be telling me that what I want to do is silly, but it sure _seems_ like it would be helpful. Is there a way to control the threshold used by influence.measures() to get it to flag more items at one time? I am learning the hard way, so feel free to tell me that I should be trying to do this some other way. Bill
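To get the up-front list Bill asks for, the `is.inf` logical matrix inside the result can be reduced to row numbers; shown on the built-in `cars` data as a stand-in for his unposted model:

```r
fit <- lm(dist ~ speed, data = cars)
im  <- influence.measures(fit)
which(apply(im$is.inf, 1, any))  # rows flagged by at least one measure
summary(im)                      # compact printout of only those rows
```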
[R] OT: Reducing pdf file size
Greetings, Does anyone have any suggestions for reducing pdf file size, particularly pdfs containing photos, without sacrificing quality? Thanks for any tips in advance. Cheers Kurt *** Kurt Lewis Helf, Ph.D. Ecologist EEO Counselor National Park Service Cumberland Piedmont Network P.O. Box 8 Mammoth Cave, KY 42259 Ph: 270-758-2163 Lab: 270-758-2151 Fax: 270-758-2609 Science, in constantly seeking real explanations, reveals the true majesty of our world in all its complexity. -Richard Dawkins The scientific tradition is distinguished from the pre-scientific tradition in having two layers. Like the latter it passes on its theories but it also passes on a critical attitude towards them. The theories are passed on not as dogmas but rather with the challenge to discuss them and improve upon them. -Karl Popper ...consider yourself a guest in the home of other creatures as significant as yourself. -Wayside at Wilderness Threshold in McKittrick Canyon, Guadalupe Mountains National Park, TX Cumberland Piedmont Network (CUPN) Homepage: http://tiny.cc/e7cdx CUPN Forest Pest Monitoring Website: http://bit.ly/9rhUZQ CUPN Cave Cricket Monitoring Website: http://tiny.cc/ntcql CUPN Cave Aquatic Biota Monitoring Website: http://tiny.cc/n2z1o
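One R-side option worth knowing about (it postdates this thread slightly, and it needs qpdf and/or Ghostscript installed): `tools::compactPDF()`. The directory name below is hypothetical.

```r
# Lossless re-compression of PDF streams via qpdf; supplying gs_quality
# additionally routes the files through Ghostscript, which downsamples
# embedded photos ("ebook" roughly targets 150 dpi).
tools::compactPDF("figures", gs_quality = "ebook")
```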
Re: [R] integration Sweave and TexMakerX
On 05/01/2011 12:04 PM, Sebastián Daza wrote:

Hi, does anyone know how to integrate TexMakerX and Sweave on Windows? I mean, to run .Rnw files directly from TexMakerX and get a PDF or DVI file.

I don't know TexMakerX, but the patchDVI package (on R-Forge, see https://r-forge.r-project.org/R/?group_id=233) contains some functions for hooking up Sweave with other LaTeX editors. If it's not flexible enough to handle yours, I'd like to hear what's missing, and I'd probably add it.

Duncan Murdoch
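If TexMakerX can be pointed at an arbitrary build command, one common setup (a sketch; it assumes R and pdflatex are on the PATH, and report.Rnw is a placeholder name) is simply to chain the two steps:

```shell
R CMD Sweave report.Rnw   # weaves the .Rnw source into report.tex
pdflatex report.tex       # typesets report.tex into report.pdf
```

Most LaTeX editors let you register this pair as a custom "quick build" command so the .Rnw file compiles with one keystroke.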
[R] Prediction error for Ordinary Kriging
Hi all,

Can you please help me on how to determine the prediction error for ordinary kriging? Below are all the commands I used to generate the OK plot:

rsa2 <- readShapeSpatial("residentialsa", CRS("+proj=tmerc +lat_0=39.66 +lon_0=-8.1319062 +k=1 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs"))
x2 <- readShapeSpatial("ptna2", CRS("+proj=tmerc +lat_0=39.66 +lon_0=-8.1319062 +k=1 +x_0=0 +y_0=0 +ellps=intl +units=m +no_defs"))
bb <- bbox(rsa2)
cs <- c(1, 1)
cc <- bb[, 1] + (cs/2)
cd <- ceiling(diff(t(bb))/cs)
rsa2_grd <- GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
getClass("SpatialGrid")
p4s <- CRS(proj4string(rsa2))
x2_SG <- SpatialGrid(rsa2_grd, proj4string = p4s)
x2_SP <- SpatialPoints(cbind(x2$X, x2$Y))
v <- variogram(log1p(tsport_ace) ~ 1, x2, cutoff = 100, width = 9)
ve.fit <- fit.variogram(v, vgm(0.0437, "Exp", 26, 0))
y <- krige(tsport_ace ~ 1, x2, x2_SG, model = ve.fit)
spplot(y, 1, col.regions = bpy.colors(100), sp.layout = list("sp.lines", as(rsa2, "SpatialLines"), no.clip = TRUE))

I'm looking forward to your response. Thanks.

Best regards,
Pearl dela Cruz
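The question has a direct answer in gstat: krige() returns the kriging variance alongside the predictions, and its square root is the prediction standard error. A self-contained sketch using gstat's bundled meuse demo data rather than Pearl's layers:

```r
library(sp)
library(gstat)

# Standard gstat demo data, standing in for the poster's shapefiles
data(meuse);      coordinates(meuse) <- ~x+y
data(meuse.grid); coordinates(meuse.grid) <- ~x+y
gridded(meuse.grid) <- TRUE

v  <- variogram(log(zinc) ~ 1, meuse)
vf <- fit.variogram(v, vgm(1, "Sph", 900, 1))
ok <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = vf)

# var1.pred holds the OK predictions; var1.var holds the kriging variance,
# whose square root is the prediction (kriging) standard error
ok$se <- sqrt(ok$var1.var)
spplot(ok, "se", col.regions = bpy.colors(100))
```

So for the code above, y$var1.var (and sqrt of it) is the prediction error surface, and it can be mapped with spplot just like the predictions.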
Re: [R] speed up in R apply
On Jan 5, 2011, at 2:40 PM, Douglas Bates wrote:

On Wed, Jan 5, 2011 at 1:22 PM, David Winsemius dwinsem...@comcast.net wrote:

On Jan 5, 2011, at 10:03 AM, Young Cho wrote:

Hi, I am doing some simulations and found a bottleneck in my R script. I made an example:

a <- matrix(rnorm(500), 100, 5)
tt <- Sys.time(); sum(a[,1]*a[,2]*a[,3]*a[,4]*a[,5]); Sys.time() - tt
[1] -1291.026
Time difference of 0.2354031 secs
tt <- Sys.time(); sum(apply(a, 1, prod)); Sys.time() - tt
[1] -1291.026
Time difference of 20.23150 secs

Is there a faster way of calculating the sum of products (of columns, or of rows)?

You should look at crossprod and tcrossprod.

Hmm. Not sure that would help, David. You could use a matrix multiplication, a %*% rep(1, ncol(a)), if you wanted the row sums, but of course you could also use rowSums to get those.

Thanks for pointing that out. I misread the OP's code.

And is this expected behavior?

Yes. For loops and *apply strategies are slower than the proper use of vectorized functions. To expand a bit on David's point, the apply function isn't magic. It essentially loops over the rows, in this case. By multiplying columns together you are performing the looping over the rows in compiled code, which is much, much faster. If you want to do this kind of operation efficiently in R for a general matrix (i.e. not knowing in advance that it has exactly 5 columns) you could use Reduce:

a <- matrix(rnorm(500), 100, 5)
system.time(pr1 <- a[,1]*a[,2]*a[,3]*a[,4]*a[,5])
   user  system elapsed
   0.15    0.09    0.37
system.time(pr2 <- apply(a, 1, prod))
   user  system elapsed
 22.090   0.140  22.902
all.equal(pr1, pr2)
[1] TRUE
system.time(pr3 <- Reduce(get("*"), as.data.frame(a), rep(1, nrow(a))))

Slightly faster would be:

system.time(pr3 <- Reduce("*", as.data.frame(a)))

And thanks for the nice example. Using a data.frame to feed Reduce materially enhances its value to me.
   user  system elapsed
  0.410   0.010   0.575
all.equal(pr3, pr2)
[1] TRUE

--
David Winsemius, MD
West Hartford, CT
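The equivalence the thread relies on is easy to check in base R alone; this is a correctness sketch, not a benchmark:

```r
set.seed(1)
a <- matrix(rnorm(500), 100, 5)

pr_cols   <- a[,1] * a[,2] * a[,3] * a[,4] * a[,5]  # vectorised column product
pr_apply  <- apply(a, 1, prod)                      # row-by-row looping
pr_reduce <- Reduce(`*`, as.data.frame(a))          # general: works for any ncol(a)

stopifnot(all.equal(pr_cols, pr_apply), all.equal(pr_cols, pr_reduce))
```

Reduce folds the list of data-frame columns left to right with `*`, so it performs the same multiplications as the explicit five-column expression while staying in vectorised code.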
[R] Advice on obscuring unique IDs in R
Dear colleagues,

This may be a question with a really obvious answer, but I can't find it. I have access to a large file with real medical record identifiers (mixed strings of characters and numbers) in it. These represent medical events for many thousands of people. It's important to be able to link events for the same people. It's much more important that the real record numbers are strongly obscured. I'm interested in some kind of strong one-way hash function to which I can feed the real numbers and get back unique codes for each record identifier fed in. I can do this on the health service system, and I have to do this before making further use of the data!

There is the digest() function, in the digest package, but it seems to hash the whole vector of IDs as a single object, producing, in my case, a vector with 60,000 identical entries:

H.Out$P_ID <- digest(H.In$MRNr, serialize = FALSE, algo = 'md5')

I could do this in Perl, but I'd have to do quite a bit of work to get it installed. Any quick suggestions?

Anthony Staines
--
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9, Ireland.
Tel: +353 1 700 7807. Mobile: +353 86 606 9713
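A sketch of the per-record hashing being asked for, assuming the digest package. The salt value and IDs here are invented; a secret salt matters because unsalted hashes of record numbers can be reversed by simply hashing every plausible number:

```r
library(digest)

salt <- "keep-this-secret"               # hypothetical salt; store it securely
ids  <- c("MRN001", "MRN002", "MRN001")  # made-up identifiers

# digest() hashes one object at a time, so map it over the vector
hashes <- vapply(paste0(salt, ids), digest, character(1),
                 algo = "sha256", serialize = FALSE)

# identical inputs map to identical hashes, so record linkage is preserved
```

With H.In$MRNr in place of ids, this yields one obscured code per row instead of one hash of the whole column.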
Re: [R] Comparing fitting models
Just do

anova(fit3, fit1)

This compares those two models directly.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frodo Jedi
Sent: Wednesday, January 05, 2011 10:10 AM
To: r-help@r-project.org
Subject: [R] Comparing fitting models

Dear all,

I have 3 models (from simple to complex) and I want to compare them in order to see whether they fit equally well or not. From the R prompt I am not able to see where I can get this information. Let's do an example:

fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)  # equivalent to lm(response ~ stimulus*condition, data=scrd)
fit2 <- lm(response ~ stimulus + condition, data=scrd)
fit3 <- lm(response ~ condition, data=scrd)

anova(fit2, fit1)  # compare models

Analysis of Variance Table

Model 1: response ~ stimulus + condition
Model 2: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    165 364.13
2    159 362.67  6     1.465 0.1071 0.9955

anova(fit3, fit2, fit1)  # compare models

Analysis of Variance Table

Model 1: response ~ condition
Model 2: response ~ stimulus + condition
Model 3: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    171 382.78
2    165 364.13  6    18.650 1.3628 0.2328
3    159 362.67  6     1.465 0.1071 0.9955

How can I tell whether the simple model fits as well as the complex model (the one with the interaction)?

Thanks in advance
All the best
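Greg's point can be seen end to end on simulated data (a sketch; the variable names are invented and only the shape of the comparison matters):

```r
set.seed(42)
d <- data.frame(x = rnorm(100), g = gl(2, 50))
d$y <- 1 + 2 * d$x + rnorm(100)      # data generated WITHOUT an x:g interaction

fit_small <- lm(y ~ x + g, data = d)
fit_big   <- lm(y ~ x * g, data = d)

# A large Pr(>F) here says the extra interaction term explains essentially
# nothing beyond what the simpler model already captures
anova(fit_small, fit_big)

AIC(fit_small, fit_big)              # information criteria give a second view
```

Note that anova() requires the models to be nested and fitted to the same data; AIC does not require nesting, which makes it a useful complement.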
[R] Reading large SAS dataset in R
Hi all,

I have a large (approx. 1 GB) SAS dataset (test.sas7bdat) located on the server (R:/ directory). I have SAS 9.1 installed on my PC and I can read the SAS dataset in SAS, under a Windows environment, after assigning a libname to the R:\ directory. Now I am trying to read the SAS dataset in R (R 2.12.0) using the read.ssd function of the foreign package, but I get an error message "SAS failed". I believe I have specified the paths correctly (after reading some previous posts I made sure that I did it right). Below is the small code:

sashome <- "C:/Program Files/SAS/SAS 9.1"
read.ssd(libname = "R:/", sectionnames = "test", sascmd = file.path(sashome, "sas.exe"))

Please let me know where I am making the mistake. Is it because of the size of the file, or the location of the file (on the server instead of the local hard drive)?

Thanks in advance,
Santanu
--
Santanu Pramanik
Survey Statistician
NORC at the University of Chicago
Bethesda, MD
Re: [R] How to 'explode' a matrix
Thanks, Henrique. The second option you suggested is about twice as fast as my original application. Much appreciated, Kevin

On Jan 5, 2011, at 6:30 PM, Henrique Dallazuanna wrote:

Try this:

apply(apply(m, 2, rep, each = 2), 1, rep, each = 2)

or

m[rep(seq(nrow(m)), each = 2), rep(seq(ncol(m)), each = 2)]

On Wed, Jan 5, 2011 at 10:03 AM, Kevin Ummel kevinum...@gmail.com wrote:

Hi everyone, I'm looking for a way to 'explode' a matrix like this:

matrix(1:4, 2, 2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

into a matrix like this:

matrix(c(1,1,2,2,1,1,2,2,3,3,4,4,3,3,4,4), 4, 4)
     [,1] [,2] [,3] [,4]
[1,]    1    1    3    3
[2,]    1    1    3    3
[3,]    2    2    4    4
[4,]    2    2    4    4

My current kludge is this:

v1 = rep(1:4, each = 2, times = 2)
v2 = v1[order(rep(1:2, each = 4, times = 2))]
matrix(v2, 4, 4)

But I'm hoping there's a more efficient solution that I'm not aware of.

Many thanks,
Kevin

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
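For what it's worth, the Kronecker product gives the same "explosion" in a single base-R call (a sketch, not from the thread):

```r
m <- matrix(1:4, 2, 2)

# %x% is kronecker(): each element of m multiplies a 2x2 block of ones,
# so every entry is replicated into a 2x2 block
exploded <- m %x% matrix(1, 2, 2)
exploded
```

Changing matrix(1, 2, 2) to matrix(1, k, k) generalises the block size, with no index bookkeeping.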
[R] Heat map in R
Hello,

I am trying to make a heatmap in R and am having some trouble. I am very new to the world of R, but have been told that what I am trying to do should be possible. I want to make a heat map that looks like a gene expression heatmap (see http://en.wikipedia.org/wiki/Heat_map). I have 43 samples and 900 genes (yes, I know this will be a huge map). I also have copy numbers associated with each gene/sample and need these to be represented as the colour intensities on the heat map. There are multiple genes per sample, with different copy numbers. I think my trouble may be how I am setting up my data frame. My data frame was created in Excel as a tab-delimited text file:

Gene  Copy Number  Sample ID
A     1935         01
B     2057         01
C     2184         02
D     1498         03
E     2294         03
F     2485         03
G     1560         04
H     3759         04
I     2792         05
J     7081         05
K     1922         06
...   ...          ...
ZZZ   1354         43

My code in R is something like this:

data <- read.table("/Users/jsmt/desktop/test.txt", header = TRUE)
data_matrix <- data.matrix(data)
data_heatmap <- heatmap(data_matrix, Rowv = NA, Colv = NA, col = cm.colors(256), scale = "column", margins = c(5, 10))

I end up getting a heat map split into 3 columns (sample, depth, gene) and the colours are just big blocks that don't mean anything. Can anyone help me with my data frame or my R code? Again, I am fairly new to R, so if you can help, please give me very detailed help :) Thanks in advance!
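The missing step is reshaping the long three-column table into a genes-by-samples matrix before calling heatmap(). A sketch with a tiny invented version of the data (column names guessed from the post):

```r
d <- data.frame(Gene   = c("A", "B", "C", "D"),
                Copy   = c(1935, 2057, 2184, 1498),
                Sample = c("01", "01", "02", "02"))

# xtabs() pivots long -> wide: one row per gene, one column per sample;
# gene/sample pairs absent from the data come out as 0
mat <- xtabs(Copy ~ Gene + Sample, data = d)

heatmap(as.matrix(mat), Rowv = NA, Colv = NA,
        col = cm.colors(256), scale = "column", margins = c(5, 10))
```

Passing the three raw columns straight to data.matrix() is what produces the meaningless three-column image; heatmap() wants one cell per gene/sample pair.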
[R] Match numeric vector against rows in a matrix?
Two posts in one day is not a good day... and this question seems like it should have an obvious answer: I have a matrix where rows are unique combinations of 1's and 0's:

combs = as.matrix(expand.grid(c(0,1), c(0,1)))
combs
     Var1 Var2
[1,]    0    0
[2,]    1    0
[3,]    0    1
[4,]    1    1

I want a single function that will give the row index containing an exact match with vector x:

x = c(0,1)

The solution needs to be applied many times, so I need something quick; I was hoping a base function would do it, but I'm drawing a blank. Thanks! Kevin
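One quick base-R approach (a sketch): transpose so that `==` recycles x against every row at once, then count complete matches:

```r
combs <- as.matrix(expand.grid(c(0, 1), c(0, 1)))
x <- c(0, 1)

# t(combs) is 2 x 4, so `== x` compares x element-wise with each row of combs;
# a column of the comparison summing to length(x) is an exact row match
hit <- which(colSums(t(combs) == x) == length(x))
unname(hit)
```

This stays fully vectorised, so it should be fast enough to call many times; an apply()-per-row alternative exists but loops in R.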
Re: [R] Advice on obscuring unique IDs in R
Dr. Anthony wrote on 01/05/2011 01:19:49 PM:

This may be a question with a really obvious answer, but I can't find it. I have access to a large file with real medical record identifiers (mixed strings of characters and numbers) in it. ...

It's not that trivial of a question, or more organizations would have gotten it right. I bet a method (or two) for obscuring PII is recommended by your university or department. When that method has been determined, the requisite R package will probably be easy to find, and down the road you'll dodge the bullet of "I thought it would work" by not guessing at a method.

cur
--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638
Re: [R] Converting Fortran or C++ etc to R
Thanks Barry, and thanks to others who replied off-list. I can see that I should have given more details about my motives for wanting to replace a Fortran program with an R one. At this stage I want to get something working in pure R, because it is easier to fool around with and tweak than Fortran, and I have a few things that I want to try out that will involve perturbing the original code; I think I'd rather be doing that in R than in a 3GL.

Now that I have publicly asked the question, the answer occurs to me: the program that I want to port to R is an ML estimation by the EM algorithm. The iterative steps are fairly simple, except that they need to be repeated a large number of times. What I have noticed is that I can (maybe) replace the within-step loops by matrix multiplications. This means that, by using %*%, I will effectively be handing a lot of the work to external Fortran (or similar) routines without calling .Fortran().

OK, I know that you can see through me, and I accept that I am just rationalising my reluctance to get into package-writing. I will bite the bullet on that in due course, but for the meantime I'm just going to fool around with straight R. Barry came closest to answering my real question, so I will formulate a follow-up question as follows: does anyone know of a helpful set of examples of the vectorization of code?

Cheers, Murray

On 6/01/2011 12:32 a.m., Barry Rowlingson wrote:

On Wed, Jan 5, 2011 at 7:33 AM, lcn lcn...@gmail.com wrote:

As for your actual requirement to do the conversion, I guess there are no quick ways. You have to be familiar with both R and the other language to make the rewrite work.

To make the rewrite work _well_ is the bigger problem! The easiest way to big performance wins is going to be spotting vectorisation possibilities in the Fortran code. Any time you see a DO K=1,N loop, look to see if it's just a single vector operation in R.
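In the spirit of Murray's follow-up question, a toy example (invented, not his EM code) of a Fortran-style double loop collapsing into a single matrix product:

```r
set.seed(7)
A <- matrix(rnorm(30), 5, 6)
w <- rnorm(6)

# transliterated DO-loop style: accumulate each row's weighted sum by hand
out_loop <- numeric(nrow(A))
for (i in seq_len(nrow(A))) {
  s <- 0
  for (k in seq_len(ncol(A))) s <- s + A[i, k] * w[k]
  out_loop[i] <- s
}

# vectorised: both loops become one %*%, which runs in compiled BLAS code
out_vec <- drop(A %*% w)

stopifnot(all.equal(out_loop, out_vec))
```

The same pattern scales up: many EM-style inner loops are, once written out, nothing but matrix-vector or matrix-matrix products.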
Another way to big wins is to write test code, so you can check whether your R code gives the same results as the Fortran (C/C++) code at every stage of the rewrite. Don't just write it all in one go and then hope it works! Small steps.

Barry

--
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz  Fax +64 7 838 4155
Phone +64 7 838 4773 wk  Home +64 7 825 0441  Mobile 021 0200 8350