Re: [R] read.xport
How about avoiding SAS XPORT altogether and exporting everything in a simple, clean, non-proprietary, extremely reliable, platform-independent text format (CSV, tab-delimited, etc.)?

-----Original Message-----
From: Nelson, Gary (FWE) [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 14, 2005 10:31 AM
To: r-help@stat.math.ethz.ch
Subject: [R] read.xport

I am trying to import data from a SAS XPORT file that contains 24 SAS files. When I use the read.xport procedure only about 16 data frames (components) are created. Any suggestions?

Gary A. Nelson, Ph.D.
Massachusetts Division of Marine Fisheries
30 Emerson Avenue
Gloucester, MA 01930
Phone: (978) 282-0308 x114
Fax: (617) 727-3337
Email: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Is it possible to create highly customized report in *.xls format by using R/S+?
So your conclusion is that the only choice is to make mistakes and get in trouble. (That's what Excel excels at.) Two options I haven't seen mentioned are:

1. Create your deliverables in HTML format, and change the extension from .htm to .xls; Excel will import them automatically. The way the file looks in Excel is determined by the CSS settings (I've seen this happen) and, I presume, the HTML tags.
2. For the real spreadsheet thing, switch to OpenOffice.org. Their format is XML compressed with ZIP, which you can easily work with since the format specifications are not proprietary. See http://xml.openoffice.org/ for details.

-----Original Message-----
From: Wensui Liu [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 20, 2005 10:56 AM
To: Greg Snow
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Is it possible to create highly customized report in *.xls format by using R/S+?

I appreciate your reply and understand your point completely. But at times we can't change the rule; the only choice is to follow the rule. Most deliverables in my work are in Excel format.

On 7/20/05, Greg Snow [EMAIL PROTECTED] wrote:

See:
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html
and
http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
[EMAIL PROTECTED]
(801) 408-8111

Wensui Liu [EMAIL PROTECTED] 07/19/05 03:22PM

I remember in one slide of Prof. Ripley's presentation overhead, he said the most popular data analysis software is Excel. So is there any resource or tutorial on this topic? Thank you so much!

--
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center
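A minimal sketch of option 1 from the reply above, in base R. The helper name write_html_table, the inline CSS and the sample data are all made up for illustration; how a given Excel version renders (or warns about) an HTML file with an .xls extension is an assumption to check on your own installation.

```r
# Write a data frame as a minimal HTML table. The helper name and the
# inline CSS are illustrative only, not part of any package.
write_html_table <- function(df, file) {
  header <- paste0("<th>", names(df), "</th>", collapse = "")
  rows <- apply(df, 1, function(r) {
    paste0("<tr>", paste0("<td>", r, "</td>", collapse = ""), "</tr>")
  })
  html <- c("<html><head><style>th { background: #ccc; }</style></head>",
            "<body><table border=\"1\">",
            paste0("<tr>", header, "</tr>"),
            rows,
            "</table></body></html>")
  writeLines(html, file)
  invisible(file)
}

# Made-up report data; the file is HTML but carries an .xls extension.
report <- data.frame(site = c("A", "B"), mean_los = c(3.2, 4.7))
out_file <- write_html_table(report, tempfile(fileext = ".xls"))
```

The styling lives entirely in the style element, so changing the look of the report means editing CSS rather than touching the R code that fills the table.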
Re: [R] Rprof fails in combination with RMySQL
I think you're barking up the wrong tree. Optimize the MySQL code separately from optimizing the R code. A very nice reference about the former is http://highperformancemysql.com/. Also, if possible, do everything in MySQL.

hth,
b.

-----Original Message-----
From: Thieme, Lutz [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 21, 2005 10:11 AM
To: Rhelp (E-mail)
Subject: [R] Rprof fails in combination with RMySQL

Dear R community,

I tried to optimize my R code by using Rprof. In my R code I'm using MySQL database connections intensively. After a bunch of queries R fails with the following error message:

Error in .Call("RS_MySQL_newConnection", drvId, con.params, groups, PACKAGE = .MySQLPkgName) :
  RS-DBI driver: (could not connect [EMAIL PROTECTED] on dbname "myDB"

Without the R profiler this code has been running very stably for weeks. Do you have any ideas or suggestions? I tried the following R versions:

platform i386-pc-solaris2.8
arch     i386
os       solaris2.8
system   i386, solaris2.8
status
major    1
minor    9.1
year     2004
month    06
day      21
language R

platform sparc-sun-solaris2.8
arch     sparc
os       solaris2.8
system   sparc, solaris2.8
status
major    2
minor    1.1
year     2005
month    06
day      20
language R

platform sparc-sun-solaris2.8
arch     sparc
os       solaris2.8
system   sparc, solaris2.8
status
major    1
minor    9.1
year     2004
month    06
day      21
language R

Thank you in advance and kind regards,

Lutz Thieme
AMD Saxony / Product Engineering
AMD Saxony Limited Liability Company & Co. KG
phone: +49-351-277-4269
M/S E22-PE, Wilschdorfer Landstr. 101
fax: +49-351-277-9-4269
D-01109 Dresden, Germany
Re: [R] Rprof fails in combination with RMySQL
I think the opposite is applicable too: optimize R outside of MySQL. Exclude the MySQL queries completely and use the same data frames (prepared beforehand) with Rprof instead. Then, if you really want to run the full code with Rprof, wrap the queries in try():

data <- try(fetch(dbSendQuery(connection, query), n = -1))
if (inherits(data, "try-error")) {
  for (i in 1:100) {
    data <- try(fetch(dbSendQuery(connection, query), n = -1))
    if (!inherits(data, "try-error")) break
  }
}

Also, why do you close the connection after each query? Open one connection and use it for the whole R session. (I never close the connection after a query.)

hth,
b.

-----Original Message-----
From: Thieme, Lutz [mailto:[EMAIL PROTECTED]
Sent: Friday, July 22, 2005 2:04 AM
To: bogdan romocea
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Rprof fails in combination with RMySQL

Hello Bogdan,

thanks for your reply. My MySQL is always optimized outside of R (but many thanks for the interesting link!). I'm very sure that I have to optimize the R code which uses the data from my queries for calculations. To get more information about which R function is the main speed limiter, I tried Rprof.

Because I'm always opening and closing the connection for every query, I have never opened more than one connection. And again: the same R code has been running stably without Rprof for weeks, multiple times a day. I can exclude with 99% certainty that the error comes from the database. Maybe it comes from the large number of opening/closing cycles?

Regards,
Lutz

-----Original Message-----
From: bogdan romocea [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 21, 2005 5:05 PM
To: Thieme, Lutz
Cc: R-help@stat.math.ethz.ch
Subject: RE: [R] Rprof fails in combination with RMySQL

I think you're barking up the wrong tree. Optimize the MySQL code separately from optimizing the R code. A very nice reference about the former is http://highperformancemysql.com/. Also, if possible, do everything in MySQL.

hth,
b.
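The retry idea from this thread can be written as a small reusable wrapper. This is only a sketch: with_retry and flaky_query are hypothetical names, not part of DBI or RMySQL, and the stand-in function simulates a connection that fails twice before succeeding.

```r
# Retry a function call up to max_tries times, pausing between attempts.
# `with_retry` is a hypothetical helper, not a DBI/RMySQL function.
with_retry <- function(f, max_tries = 100, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- try(f(), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
    Sys.sleep(wait)
  }
  stop("all ", max_tries, " attempts failed")
}

# Stand-in for fetch(dbSendQuery(connection, query), n = -1):
# fails on the first two calls, then returns a data frame.
calls <- 0
flaky_query <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("could not connect")
  data.frame(x = 1:3)
}

data <- with_retry(flaky_query)
```

In real use the argument would be something like function() fetch(dbSendQuery(connection, query), n = -1), ideally with a single connection opened once for the session as suggested above.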
Re: [R] choose between dates and times
If happenat is not a datetime value, convert it with strptime(). Then, one solution is to transform it in the following way:

num.time <- as.numeric(format(happenat, "%Y%m%d%H%M%S"))

This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset your data frame with

dfr[which(num.time >= 20050725153000 & num.time <= 20050726123000), ]

hth,
b.

-----Original Message-----
From: Kerry Bush [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 26, 2005 3:54 PM
To: r-help@stat.math.ethz.ch
Subject: [R] choose between dates and times

Dear R-helpers,

I have the following data:

   y            happenat  x
5185 (07/22/05 00:05:14) 14
5186 (07/22/05 00:15:14) 14
5187 (07/22/05 00:25:14) 14
5188 (07/22/05 00:35:14) 14
..

I want to choose between 07/25/05 15:30:00 and 07/26/05 12:30:00. Has anybody had experience in handling this kind of data? Is there a simple way to subset by the variable 'happenat'?

Thanks.
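As an alternative to the numeric encoding in this thread, date-time values can also be compared directly once they are POSIXct. The sketch below uses made-up rows in the same mm/dd/yy format; the column names follow the question, and the UTC time zone is an assumption chosen only to keep the example reproducible.

```r
# Made-up data in the same timestamp format as the question.
dfr <- data.frame(
  happenat = c("07/22/05 00:05:14", "07/25/05 16:00:00",
               "07/26/05 09:15:00", "07/26/05 13:00:00"),
  x = c(14, 15, 16, 17),
  stringsAsFactors = FALSE
)

# Parse the strings, then compare date-times directly.
when <- as.POSIXct(strptime(dfr$happenat, "%m/%d/%y %H:%M:%S", tz = "UTC"))
from <- as.POSIXct("2005-07-25 15:30:00", tz = "UTC")
to   <- as.POSIXct("2005-07-26 12:30:00", tz = "UTC")

selected <- dfr[when >= from & when <= to, ]
selected
```

Only the second and third rows fall inside the window, so selected keeps x = 15 and x = 16.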
Re: [R] How to hiding code for a package
There's something else you could try: since you can't hide the code, obfuscate it. Hide the real thing in a large pile of useless, complicated, awfully formatted code that would stop anyone except the most desperate (including yourself, after a couple of weeks or months) from trying to understand it. The best solution would be to compile the code, but R is not there yet.

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 30, 2005 5:35 AM
To: Gary Wong
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] How to hiding code for a package

What you ask is impossible. For a function to be callable it has to be locatable and hence can be printed.

One possibility is to have a namespace, and something like

foo <- function(...) foo_internal(...)

where foo is exported but foo_internal is not. Then foo_internal is hidden from casual inspection, but it can be listed by cognoscenti.

Why do you want to do this? Anyone can read the source code of your package, and any function which can be called can be deparsed, possibly after jumping through a few hoops.

On Sat, 30 Jul 2005, Gary Wong wrote:

Hey everyone,

I have made a package and wish to release it, but before then I have a problem. I have a few functions in this package written in R that I wish to hide such that after installation, someone can use, say, the function foo(parameters = ) but cannot do foo. Typing foo should not show the source code, or at least not all of it. Is there a way to do this? I have searched the mailing list and used Google, and have found something like "[R] Hiding internal package functions for the doc. pkg-internal.Rd", but this seems different since it seems that the keyword internal just hides the function from showing in the index and hides documentation, not the function itself. Can someone help? Thanks

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] date format
You need the day to convert to a date format. Assuming day=15:

x.date <- as.Date(paste(as.character(x), "-15", sep=""), format="%Y-%m-%d")

-----Original Message-----
From: alessandro carletti [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 10, 2005 9:37 AM
To: rHELP
Subject: [R] date format

Hi,

I have a problem with a vector (x) containing dates in the format yyyy-mm (I'm working with monthly means): how can I convert it to a date format, so that I can plot it and recognise trends for my variables? class(x) says: factor

Thanks
Alex
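A short worked example of the conversion in this thread, assuming x is a factor of year-month strings. The sample values are made up, and the mid-month day (15) is the same arbitrary choice as in the reply.

```r
# Made-up monthly labels stored as a factor, as in the question.
x <- factor(c("2004-11", "2004-12", "2005-01"))

# Append an arbitrary day so that as.Date() has a complete date to parse.
x.date <- as.Date(paste(as.character(x), "-15", sep = ""), format = "%Y-%m-%d")

class(x.date)
diff(x.date)   # gaps between consecutive months, in days

# x.date can now be used on the x axis, e.g. plot(x.date, monthly_means),
# where monthly_means is a hypothetical numeric vector of the same length.
```

Converting through as.character() matters here: calling as.Date() on the factor's integer codes would not give the intended dates.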
Re: [R] Concerning reading of SAS-files
The first one is an index, not a data set. Anyway, just use SAS to export the data sets in text format (CSV, tab-delimited, etc.). You can then easily read those in R. (By the way, the help for read.xport says that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT file? Hint: no.)

-----Original Message-----
From: Fredrik Thuring [mailto:[EMAIL PROTECTED]
Sent: Friday, August 12, 2005 4:52 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Concerning reading of SAS-files

Hi!

I'm trying to start a credibility estimation study with a couple of data sets that were created for SAS. The data sets are saved as .sas7bndx and .sas7bdat. I've tried reading them into R with the function 'read.xport', but this returns the error message 'Error in lookup.xport(file) : unable to open file'. Are there any other functions that one could use instead?

Thanks a lot to whoever can solve my problem!

Best regards
Fredrik Thuring
Codan Insurance, Copenhagen
Re: [R] retrieving large columns using RODBC
This appears to be an SQL issue. Look for a way to speed up your queries in PostgreSQL. I presume you haven't created an index on 'index', which means that every time you run your SELECT, PostgreSQL is forced to do a full table scan (not good). If the index doesn't solve the problem, look for some SQL help.

-----Original Message-----
From: Tamas K Papp [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 13, 2005 4:03 AM
To: R-help mailing list
Subject: [R] retrieving large columns using RODBC

Hi,

I have a large table in PostgreSQL (the result of an MCMC simulation, with 1 million rows) and I would like to retrieve columns (which correspond to variables) using RODBC. I have a column called index which is used to order rows. Unfortunately, sqlQuery can't return all the values from a column at once (RODBC complains about lack of memory). So I am using the following code:

getcolumns <- function(channel, tablename, colnames, totalrows,
                       ordered=TRUE, chunksize=1e5) {
  r <- matrix(double(0), totalrows, length(colnames))
  for (i in 1:ceiling(totalrows/chunksize)) {
    cat(".")
    r[((i-1)*chunksize+1):(i*chunksize)] <- as.matrix(
      sqlQuery(channel, paste("SELECT", paste(colnames, collapse=", "),
                              "FROM", tablename,
                              "WHERE index <=", i*chunksize,
                              "AND index >", (i-1)*chunksize,
                              if (ordered) "ORDER BY index;" else ";")))
  }
  cat("\n")
  drop(r)  # convert to vector if needed
}

to retrieve it in chunks. However, this is very slow: it takes about 15 minutes on my machine. Is there a way to speed it up?

I am running Linux on a PowerBook, RODBC version 1.1-4, R 2.1.1. The machine has only 512 MB of RAM.

Thanks,
Tamas
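To make the two suggestions in this thread concrete, here is a sketch that only builds the SQL text, without touching a database. The table name mcmc_draws, the column names and the index name are hypothetical; the range bounds follow the chunking scheme in the question, with the last chunk clipped to the total row count.

```r
# One CREATE INDEX statement (PostgreSQL syntax); all names are placeholders.
index_sql <- "CREATE INDEX mcmc_draws_index_idx ON mcmc_draws (index);"

# Build one range-bounded SELECT per chunk.
chunk_queries <- function(tablename, colnames, totalrows, chunksize = 1e5) {
  starts <- seq(0, totalrows - 1, by = chunksize)   # exclusive lower bounds
  ends <- pmin(starts + chunksize, totalrows)       # inclusive upper bounds
  sprintf("SELECT %s FROM %s WHERE index > %d AND index <= %d ORDER BY index;",
          paste(colnames, collapse = ", "), tablename,
          as.integer(starts), as.integer(ends))
}

queries <- chunk_queries("mcmc_draws", c("alpha", "beta"), totalrows = 250000)
queries
```

Each string could then be passed to sqlQuery(channel, ...) in turn. With an index on the column used in the WHERE clause, the database can typically locate each range without scanning the whole table, which is the point of the reply.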
Re: [R] Regular expressions sub
One solution is

test <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3")
id <- unlist(lapply(strsplit(test, "[.]"), function(x) {x[2]}))

-----Original Message-----
From: Bernd Weiss [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 12:10 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Regular expressions sub

Dear all,

I am struggling with the use of regular expressions. I got

> as.character(test$sample.id)
[1] "1.11" "10.11" "11.11" "113.31" "114.2" "114.3" "114.8"

and need

[1] "11" "11" "11" "31" "2" "3" "8"

I.e. remove everything before the ".".

TIA,
Bernd
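Since the subject line asks about sub(), here is the same result with a single regular expression instead of strsplit(). The pattern is one possible choice: it removes everything up to and including the first dot.

```r
test <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")

# "^[^.]*[.]" matches from the start of the string through the first dot;
# sub() replaces that match with the empty string.
id <- sub("^[^.]*[.]", "", test)
id
```

Writing the dot as [.] avoids having to escape it; an unescaped "." in a regular expression would match any character.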
Re: [R] Linux Standalone Server Suggestions for R
Most powerful in what way? Quite a lot depends on the jobs you're going to run.

- To run CPU-bound jobs, more CPUs is better. (Even though R doesn't do threading, you can manually split some CPU-bound jobs into several parts and run them simultaneously.) Apart from multiple CPUs and hyperthreading, check the new dual-core CPUs.
- To run very large jobs, more memory is better. You can easily spend most of your money on memory. Get the fastest one.
- You should get 64-bit CPUs, otherwise you won't be able to run very large jobs (search the list for details).

I would suggest that you buy a configuration that can handle more CPUs and memory than you think you need now (say, at least 4 max CPUs and 16 GB max memory), then keep on adding more memory and CPUs as your needs change.

hth,
b.

-----Original Message-----
From: Jia-Shing So [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 31, 2005 10:03 PM
To: r-help@stat.math.ethz.ch
Cc: Phuoc Hong
Subject: [R] Linux Standalone Server Suggestions for R

Hi All,

My group is looking for any suggestions on what to purchase to achieve the most powerful number crunching system that $50k can buy. The main application that will be used is R, so input on what hardware benefits R most will be appreciated. The requirements are that it be a single standalone server (i.e. not a cluster solution), and that it must be able to run Unix/Linux.

If anyone has any experience or suggestions regarding the following questions, that would also be greatly appreciated. AMD vs Intel chips, especially 64-bit versions of the two? Using Itanium/Opterons, and if so, how much of a performance boost did you achieve vs other 64-bit chip sets? Also, does anyone know if there is an upper threshold on how much memory R can use?

Thanks in advance for any help and suggestions,

Jia-Shing So
Programmer Analyst
Biostatistics and Bioinformatics Lab
University of California, San Diego
[R] RMySQL installation problem on FC4 x86_64
Dear useRs,

I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0 and MySQL 4.1.11-2 installed through yum). After an initial configuration error (could not find the MySQL installation include and/or library directories) I managed to install RMySQL with

# export PKG_LIBS="-L/usr/lib64/mysql -lmysqlclient"
# R CMD INSTALL RMySQL_0.5-5.tar.gz

However, when I load the package I get this error:

> require(RMySQL)
Loading required package: RMySQL
Loading required package: DBI
Error in dyn.load(x, as.logical(local), as.logical(now)) :
  unable to load shared library '/usr/lib64/R/library/RMySQL/libs/RMySQL.so':
  /usr/lib64/R/library/RMySQL/libs/RMySQL.so: undefined symbol: mysql_field_count
[1] FALSE

Can anyone offer a suggestion, or perhaps email me a precompiled binary?

Thank you,
b.

platform x86_64-redhat-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    1.0
year     2005
month    04
day      18
language R

# yum list installed mysql
Installed Packages
mysql.i386    4.1.11-2  installed
mysql.x86_64  4.1.11-2  installed
Re: [R] boxplot statistics
A related comment: don't rely (too much) on boxplots. They show only a few things, which may be limiting in many cases and completely misleading in others. Here are a couple of suggestions for plots which you may find more useful than the standard box plots:

- figure 3.27 from http://www.stat.auckland.ac.nz/~paul/RGraphics/chapter3.html
- violin plots (see package vioplot)
- density plots
- histograms
- box-percentile plots (bpplot from Hmisc)
- quantile plots
- if comparing 2 distributions: qq plots, quantile-difference plots, mean-difference plots, etc.

-----Original Message-----
From: Karin Lagesen [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 04, 2005 5:24 AM
To: [EMAIL PROTECTED]
Subject: [R] boxplot statistics

I have read and reread the boxplot and the boxplot.stats pages, and I still cannot understand how and what boxplot shows. I realize that this might be due to me not knowing enough statistics, but anyway...

First, how does boxplot determine the size of the box? And is the line inside the box the mean or the median (or something completely different)? And how does it determine how far out the whiskers should go?

Also, the boxplot.stats page talks about hinges; what are those?

  The two 'hinges' are versions of the first and third quartile, i.e., close to 'quantile(x, c(1,3)/4)'.

Thank you very much.

Karin
--
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://www.cmbn.no/rognes/
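The questions about what the box and whiskers mean can be answered by looking at the numbers boxplot() draws. The sketch below uses a small made-up sample and the default coef = 1.5 of boxplot.stats(): the box runs between the two hinges, the line inside is the median, and each whisker ends at the most extreme data point that lies within 1.5 box lengths of the box.

```r
# Made-up sample with one obvious outlier.
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

s <- boxplot.stats(x)
s$stats     # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out       # points beyond the whiskers, drawn individually

fivenum(x)  # Tukey's five numbers: min, lower hinge, median, upper hinge, max
```

Here the hinges are 4 and 8, so the box length is 4 and the upper whisker may reach at most 8 + 1.5 * 4 = 14. The largest data point inside that limit is 9, and 30 is reported as an outlier.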
[R] add leading 0s to %d from png() {was Automatic creation of file names}
Dear useRs,

Is there a way to 'properly' format %d when plotting more than one page on png()? 'Properly' means to me with leading 0s, so that the PNGs become easy to navigate in a file/image browser. Lacking a better solution I ended up using the code below, but would much prefer something like

png("test_%d.png", bg="white", width=1000, height=700)

where %d could be formatted like

formatC("%d", digits=0, wid=3, flag="0", mode="integer")

Thank you,
b.

#---works, but is rather complicated---
pngno <- 0 ; i <- 1
for (w in 1:53) {
  if (i %in% c(4*0:100+1)) {
    pngno <- pngno + 1
    png(paste("test_", formatC(pngno, digits=0, wid=4, flag="0", mode="integer"),
              ".png", sep=""), bg="white", width=1000, height=750)
    par(mfrow=c(2,2), mai=c(4,5,3,2)/10, omi=c(0.2,0,0,0),
        cex.axis=1, cex.main=1.2)
  }
  plot(1:10, main=w)
  if (i %in% c(4*1:100)) dev.off()
  i <- i+1
}
dev.off()

From: Mike Prager <Mike.Prager at noaa.gov>
Subject: Re: [R] Automatic creation of file names
Newsgroups: gmane.comp.lang.r.general
Date: 2005-09-22 14:51:54 GMT

Walter --

P.S. The advantage of using formatC over pasting the digits (1:1000) directly is that when one uses leading zeroes, as in the formatC example shown, the resulting filenames will sort into proper order.

...MHP

You can use paste() with something like formatC(number, digits=0, wid=3, flag="0") (where number is your loop index) to generate the filenames.

on 9/22/2005 10:21 AM Leite, Walter said the following:

I have a question about how to save to the hard drive the one thousand datasets I generated in a simulation.
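For what it is worth, the filename argument of png() accepts a C-style integer format, so a zero-padded page number can be requested directly with %03d (the device's own default filename, "Rplot%03d.png", uses the same mechanism). The sketch below writes to a temporary directory and assumes a PNG-capable R build; the file prefix is arbitrary.

```r
# Write multi-page PNG output with zero-padded page numbers.
out_dir <- tempfile("pngs")
dir.create(out_dir)

png(file.path(out_dir, "test_%03d.png"), bg = "white", width = 1000, height = 700)
par(mfrow = c(2, 2))
for (w in 1:6) plot(1:10, main = w)   # 4 plots per page, so 6 plots fill 2 pages
dev.off()

list.files(out_dir)
```

With this form there is no need to track the page counter by hand or to open and close the device every fourth plot.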
[R] decreasing performance of for() loop
Dear useRs,

I'm wondering why the for() loop below runs slower as it progresses. On a Win XP box, the iterations at the beginning run much faster than those at the end:

1%, iteration 2000, 10:10:16
2%, iteration 4000, 10:10:17
3%, iteration 6000, 10:10:17
98%, iteration 196000, 10:24:04
99%, iteration 198000, 10:24:24
100%, iteration 200000, 10:24:38

Is there something that can be done about this? Would such a loop run faster in C/C++/Fortran?

Thank you,
b.

#---sample code
loop.progress <- function(loop, iterations, steps, toprint=NULL) {
  marks <- c(1, floor(iterations/steps)*(1:steps))
  if (loop %in% marks) {
    if (is.null(toprint)) prt <- loop else prt <- toprint
    cat(paste(round((which(marks == loop)-1)*(100/steps), 0), "%, iteration ",
              prt, ", ", format(Sys.time(), "%H:%M:%S"), sep=""), "\n")
  }
}

#---loop that runs slower and slower
test <- runif(200000)
out <- vector(mode="numeric")
lg <- 30
for (i in (lg+1):length(test)) {
  loop.progress(i, length(test), 100)
  out[i] <- sum(test[(i-lg):i])
}
Re: [R] decreasing performance of for() loop
Never mind, I found the fix. Declaring the length of out eliminates the performance decrease:

out <- vector(mode="numeric", length=length(test))
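The fix above works because assigning to out[i] beyond the current length makes R enlarge the vector, and in the R versions of that era this meant repeated copying of an ever larger object. Below is a sketch of the preallocated loop next to a vectorized alternative; the cumulative-sum trick is an addition for illustration, not something proposed in the thread, and a shorter test vector is used to keep it quick.

```r
set.seed(1)
test <- runif(20000)
lg <- 30
n <- length(test)

# Preallocated loop, as in the fix above.
out <- numeric(n)
for (i in (lg + 1):n) out[i] <- sum(test[(i - lg):i])

# Vectorized alternative: a window sum is a difference of cumulative sums.
# cs[k + 1] holds sum(test[1:k]), so sum(test[a:b]) is cs[b + 1] - cs[a].
cs <- c(0, cumsum(test))
idx <- (lg + 1):n
out2 <- numeric(n)
out2[idx] <- cs[idx + 1] - cs[idx - lg]

all.equal(out, out2)
```

One difference from the original code: with preallocation the first lg elements of out are 0 rather than NA.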
Re: [R] adding 1 month to a date
Simple addition and subtraction works as well:

as.Date("1995/12/01", format="%Y/%m/%d") + 30

If you have datetime values you can use

strptime("1995-12-01 08:00:00", format="%Y-%m-%d %H:%M:%S") + 30*24*3600

where 30*24*3600 = 30 days expressed in seconds.

-----Original Message-----
From: Marc Schwartz [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 11, 2005 10:16 PM
To: t c
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] adding 1 month to a date

On Tue, 2005-10-11 at 16:26 -0700, t c wrote:

Within an R dataset, I have a date field called date_. (The dates are in the format YYYY-MM-DD, e.g. 1995-12-01.) How can I add or subtract 1 month from this date, to get 1996-01-01 or 1995-11-01?

There might be an easier way to do this, but using seq.Date(), you can increment or decrement from a Time 0 by months:

Add 1 month: This takes your Time 0, generates a 2 element sequence (which begins with Time 0) and then takes the second element:

> seq(as.Date("1995-12-01"), by = "month", length = 2)[2]
[1] "1996-01-01"

Subtract 1 month: Same as above, but we use 'by = "-1 month"' and take the second element:

> seq(as.Date("1995-12-01"), by = "-1 month", length = 2)[2]
[1] "1995-11-01"

See ?as.Date and ?seq.Date for more information. The former function is used to convert from a character vector to a Date class object. Note that in your case, the date format is consistent with the default. Pay attention to the 'format' argument in as.Date() if your dates are in other formats.

HTH,
Marc Schwartz
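The two approaches in this thread do not give the same answer, which a side-by-side run makes visible. Adding 30 shifts the date by a fixed number of days, while seq() with by = "month" moves to the same day of the next calendar month. The variable names are only for illustration.

```r
d <- as.Date("1995-12-01")

plus30  <- d + 30                                     # fixed 30-day shift
plus1m  <- seq(d, by = "month", length.out = 2)[2]    # one calendar month later
minus1m <- seq(d, by = "-1 month", length.out = 2)[2] # one calendar month earlier

format(c(plus30, plus1m, minus1m))
```

December has 31 days, so d + 30 lands on 1995-12-31, not on 1996-01-01. Which behaviour is wanted depends on whether "one month" means a calendar month or roughly 30 days.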
Re: [R] how to use large data set ?
By far the cheapest and easiest solution (and the very first to try) is to add more memory. The cost depends on what kind you need, but as an example, here is 2 GB you can buy for only $150: http://www.newegg.com/Product/Product.asp?Item=N82E16820144157

Project constraints?! If they don't want to spend a couple hundred USD on memory, you're working on the wrong project (and/or for the wrong organization). Buying more memory (say, up to a few GB) is orders of magnitude cheaper than the licenses for some proprietary software that can get around memory constraints, and probably (much) cheaper than the productivity lost to the extra training and setup time needed to implement an alternative solution (such as a connection to a DBMS). And even if the extra memory needed for R were as expensive as the license for proprietary software, which choice would be more reasonable?

-----Original Message-----
From: mahesh r
Sent: Wednesday, July 19, 2006 4:23 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] how to use large data set ?

Hi, I would like to extend the query posted earlier on using large databases. I am trying to use rgdal to mine remote sensing imagery. I don't have problems bringing the images into the R environment, but when I try to convert the images to a data.frame I receive a warning message from R saying

  1: Reached total allocation of 510Mb: see help(memory.size)

and the process terminates. Due to project constraints I have been given a very old 2.4 GHz computer with only 512 MB RAM. I think R is trying to store the results in RAM, and since the image is very big (some 9 million pixels), it runs out of memory. My questions are:

1. Is there any possibility to dump the temporary variables in a temp folder on the hard disk (as many software packages do) instead of letting R store them in RAM?
2. Could this be done without creating a connection to a back-end database like Oracle?

Thanks, Mahesh

On 7/19/06, Greg Snow wrote:

You did not say what analysis you want to do, but many common analyses can be done as special cases of regression models, and you can use the biglm package to do regression models. Here is an example that worked for me to get the mean and standard deviation by day from an Oracle database with over 23 million rows (I had previously set up 'edw' as an ODBC connection to the database under Windows; any of the database connection packages should work for you though):

  library(RODBC)
  library(biglm)
  con <- odbcConnect('edw', uid='glsnow', pwd=pass)
  odbcQuery(con, "select ADMSN_WEEKDAY_CD, LOS_DYS from CM.CASEMIX_SMRY")
  t1 <- Sys.time()
  tmp <- sqlGetResults(con, max=10)
  names(tmp) <- c("Day","LoS")
  tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
  tmp <- na.omit(tmp)
  tmp <- subset(tmp, LoS > 0)
  ff <- log(LoS) ~ Day
  fit <- biglm(ff, tmp)
  i <- nrow(tmp)
  while( !is.null(nrow( tmp <- sqlGetResults(con, max=10) ) ) ){
    names(tmp) <- c("Day","LoS")
    tmp$Day <- factor(tmp$Day, levels=as.character(1:7))
    tmp <- na.omit(tmp)
    tmp <- subset(tmp, LoS > 0)
    fit <- update(fit, tmp)
    i <- i + nrow(tmp)
    cat(format(i, big.mark=','), " rows processed\n")
  }
  summary(fit)
  t2 <- Sys.time()
  t2-t1

Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
(801) 408-8111

-----Original Message-----
From: Yohan CHOUKROUN
Sent: Wednesday, July 19, 2006 9:42 AM
To: 'r-help@stat.math.ethz.ch'
Subject: [R] how to use large data set ?

Hello R users, Sorry for my English, I'm French. I want to use a large dataset (3 million rows and 70 variables) but I don't know how to proceed, because my computer crashes quickly (P4 2.8 GHz, 1 GB). I also have a dual Xeon with 2 GB, so I want to do the computation on that machine and show the results on mine. Both of them run Windows XP.

In short, I have 1 server with a MySQL database and 1 computer, and I want to use them with a large dataset. I'm trying to use RDCOM to connect to the database and to install Rpad (but it's hard for me). Are there other solutions?

Thanks in advance,
Yohan C.
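The chunk-by-chunk idea in Greg Snow's example can be tried without a database or biglm. The following is only a minimal sketch in base R, assuming the data sit in a CSV file; the file, the column name `LoS` and the chunk size of 100 are made up for illustration. It keeps running totals so that only one chunk is in memory at a time.

```r
# Toy data written to a temporary CSV (stands in for a file too big for RAM).
set.seed(1)
dat <- data.frame(LoS = rexp(1000, rate = 0.2))
tmp_file <- tempfile(fileext = ".csv")
write.csv(dat, tmp_file, row.names = FALSE)

con <- file(tmp_file, open = "r")
invisible(readLines(con, n = 1))      # skip the header line

n <- 0; s <- 0; ss <- 0               # running count, sum, sum of squares
repeat {
  # read.csv() on an open connection continues where the last read stopped;
  # it raises an error once no lines are left, which we turn into NULL.
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = 100, col.names = "LoS"),
    error = function(e) NULL
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  n  <- n + nrow(chunk)
  s  <- s + sum(chunk$LoS)
  ss <- ss + sum(chunk$LoS^2)
}
close(con)

chunk_mean <- s / n
chunk_sd   <- sqrt((ss - n * chunk_mean^2) / (n - 1))
```

The sum-of-squares formula is fine for a sketch but can lose precision on badly scaled data; for regression-type analyses, the incremental `update()` approach shown above with biglm is the more general route.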
[R] scatter plot with axes drawn on the same scale
Dear useRs, I'd like to produce some scatter plots where N units on the X axis are equal to N units on the Y axis (as measured with a ruler, on screen or paper). This approach

  x <- sample(10:200,40) ; y <- sample(20:100,40)
  windows(width=max(x), height=max(y))
  plot(x,y)

is better than a plain plot(x,y) but doesn't solve the problem because of the other parameters (margins etc). Is there an easy, official way of sizing the axes to the same scale, one that would also work with multiple scatter plots being sent to the same pdf(), plus perhaps layout() or par(mfrow)?

Thank you, b.
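One candidate answer, offered here as a sketch rather than as the list's reply, is the `asp` argument that `plot()` passes on to `plot.window()`: with `asp = 1` one data unit on x is drawn as long as one data unit on y, whatever the margins or device size. The check at the end uses `par("usr")` and `par("pin")` to compare data units per inch on both axes.

```r
set.seed(1)
x <- sample(10:200, 40)
y <- sample(20:100, 40)

pdf(NULL)                       # null PDF device: nothing is written to disk
op <- par(mfrow = c(1, 2))      # asp is applied to each panel separately
plot(x, y, asp = 1)

# Data units per inch on each axis of the panel just drawn.
usr <- par("usr")               # c(x1, x2, y1, y2) in data units
pin <- par("pin")               # plot region width, height in inches
units_per_inch <- c(x = diff(usr[1:2]) / pin[1],
                    y = diff(usr[3:4]) / pin[2])

plot(y, x, asp = 1)
par(op)
dev.off()
```

The price is that one of the axes is usually padded with empty space, since the plot region keeps its shape and the axis limits are stretched instead.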
Re: [R] prefixing list names in print
A simple function will do what you want; customize this as needed:

  lprint <- function(lst, prefix) {
    for (i in seq_along(lst)) {
      cat(paste(prefix, "$", names(lst)[i], sep=""), "\n")
      print(lst[[i]])
      cat("\n")
    }
  }
  P <- list(A="a", B="b")
  lprint(P, "Prefix")

-----Original Message-----
From: Laurent Deniau
Sent: Tuesday, August 08, 2006 12:25 PM
To: R-help
Subject: [R] prefixing list names in print

With print(list(A="a",B="b")) it displays

  $A
  [1] "a"

  $B
  [1] "b"

I would like to add a common prefix to all the list tags, before the $. Pasting the prefix to the names does not work (it appears after the $). For example, if the prefix is P, it should display:

  P$A
  [1] "a"

  P$B
  [1] "b"

I tried to add a name attribute to the list, and to pass prefix="P" to print, but nothing works. Any hint?

Thanks, Laurent.
Re: [R] screen resolution effects on graphics
You forgot to mention your OS. This was asked before and, if I recall correctly, the answer for Windows was no. An acceptable solution (imho) is to edit the Rprofile.site file on each machine and add something like

  pngplotwidth <- 990 ; pngplotheight <- 700
  pdfplotwidth <- 14 ; pdfplotheight <- 10

Then use these values in your functions. It's manual, but you only need to do this once for each machine.

-----Original Message-----
From: Charles Annis, P.E.
Sent: Monday, August 28, 2006 8:50 AM
To: r-help@stat.math.ethz.ch
Subject: [R] screen resolution effects on graphics

Greetings, R-Citizens: I have the good fortune of working with a 19" 1280 x 1024 pixel monitor. My R code produces nice-looking graphics on this machine, but the same code results in crowded plots on an older machine with 800 x 600 resolution. In hindsight this seems obvious, but I didn't anticipate it. My code will be used on machines with varying graphics (and memory) capacity. Is there a way I can check the native resolution of the machine so that I can adjust my code to the limitations of the machine running it?

Thanks.
Charles Annis, P.E.
phone: 561-352-9699
eFax: 614-455-3265
http://www.StatisticalEngineering.com
Re: [R] Alternatives to merge for large data sets?
One obvious alternative is an SQL join, which you could do directly in a DBMS, or from R via RMySQL / RSQLite / ... Keep in mind that creating indexes on user/userid before the join may save a lot of time.

-----Original Message-----
From: Adam D. I. Kramer
Sent: Thursday, September 07, 2006 2:46 PM
To: Prof Brian Ripley
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Alternatives to merge for large data sets?

On Thu, 7 Sep 2006, Prof Brian Ripley wrote:

> Which version of R?

Previously, 2.3.1.

> Please try 2.4.0 alpha, as it has a different and more efficient
> algorithm for the case of 1-1 matches.

I downloaded and installed R-latest, but got the same error message:

  Error: cannot allocate vector of size 7301 Kb

...though at least the too-big size was larger this time. My data set is not exactly 1-1; every item in prof may have one or more matches in pubbounds, though every item in pubbounds corresponds to only one prof.

--Adam

> On Wed, 6 Sep 2006, Adam D. I. Kramer wrote:
>
>> Hello, I am trying to merge two very large data sets, via
>>
>>   pubbounds.prof <- merge(x=pubbounds, y=prof, by.x="user", by.y="userid", all=TRUE, sort=FALSE)
>>
>> which gives me an error of
>>
>>   Error: cannot allocate vector of size 2962 Kb
>>
>> I am reasonably sure that this is correct syntax. The trouble is that
>> pubbounds and prof are large; they are data frames which take up 70M
>> and 11M respectively when saved as .Rdata files. I understand from
>> various archive searches that merge can't handle that, because merge
>> takes n^2 memory, which I do not have.
>
> Not really true (it has been changed since those days). Of course, if
> you have multiple matches it must do so.
>
>> My question is whether there is an alternative to merge which would
>> carry out the process in a slower, iterative manner... or if I should
>> just bite the bullet, write.table, and use a perl script to do the job.
>>
>> Thankful as always, Adam D. I. Kramer
>
> --
> Brian D. Ripley
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK, Fax: +44 1865 272595
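A minimal sketch of the SQL-join suggestion at the top of this thread, assuming the DBI and RSQLite packages are installed (the code does nothing if they are not). The two toy data frames only borrow the names pubbounds and prof; the index names and the `pub` and `dept` columns are invented. A LEFT JOIN from pubbounds is used, which matches the description that every pubbounds row has exactly one prof; it is not a full equivalent of `merge(..., all=TRUE)`.

```r
# Toy stand-ins for the real (large) data frames.
pubbounds <- data.frame(user = c(1, 1, 2, 3),
                        pub  = c("p1", "p2", "p3", "p4"))
prof      <- data.frame(userid = c(1, 2, 3),
                        dept   = c("stats", "bio", "math"))

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "pubbounds", pubbounds)
  DBI::dbWriteTable(con, "prof", prof)

  # Index the join keys before joining, as suggested above.
  DBI::dbExecute(con, 'CREATE INDEX idx_pub_user ON pubbounds("user")')
  DBI::dbExecute(con, 'CREATE INDEX idx_prof_userid ON prof("userid")')

  joined <- DBI::dbGetQuery(con, '
    SELECT p."user", p."pub", f."dept"
    FROM pubbounds AS p
    LEFT JOIN prof AS f ON p."user" = f."userid"')

  DBI::dbDisconnect(con)
}
```

For data that do not fit in memory, an on-disk database file would replace ":memory:", and the tables would be loaded in pieces rather than from whole data frames.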
[R] unexpected behavior of boxplot(x, notch=TRUE, log=y)
A function I've been using for a while returned a surprising [to me, given the data] error recently:

  Error in plot.window(xlim, ylim, log, asp, ...) :
    Logarithmic axis must have positive limits

After some digging I realized what was going on:

  x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1, 42512.1,
         1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96, 1, 7603.63,
         1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1, 1, 46492.05,
         22334.88, 1, 1)
  summary(x)
  boxplot(x, notch=TRUE, log="y")   # unexpected
  boxplot(x)                        # ok
  boxplot(x, log="y")               # ok
  boxplot(x, notch=TRUE)            # aha

I can get around this, but thought that maybe boxplot() should be adjusted to deal with something like this on its own.

Thank you, b.

  platform       i386-pc-mingw32
  arch           i386
  os             mingw32
  system         i386, mingw32
  status
  major          2
  minor          4.0
  year           2006
  month          10
  day            03
  svn rev        39566
  language       R
  version.string R version 2.4.0 (2006-10-03)
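The "aha" can be made explicit with `boxplot.stats()`, which returns the notch limits in its `conf` component (roughly median +/- 1.58 * IQR / sqrt(n)). For the vector as it appears in this archive, the lower notch limit is negative, which a log axis cannot show. The workaround at the end, plotting `log10(x)` on a linear axis, is just one possible way around it and not necessarily what the poster did.

```r
x <- c(10460.97, 10808.67, 29499.98, 1, 35818.62, 48535.59, 1, 1, 42512.1,
       1627.39, 1, 7571.06, 21479.69, 25, 1, 16143.85, 12736.96, 1, 7603.63,
       1, 33155.24, 1, 1, 50, 3361.78, 1, 37781.84, 1, 1, 1, 46492.05,
       22334.88, 1, 1)

st <- boxplot.stats(x)
st$stats                   # lower whisker, lower hinge, median, upper hinge, upper whisker
st$conf                    # lower and upper notch limits
notch_lower <- st$conf[1]  # below zero for these data, hence the log-axis error

# Possible workaround: transform first, then draw notches on a linear axis.
pdf(NULL)
boxplot(log10(x), notch = TRUE, ylab = "log10(x)")
dev.off()
```

Note that notches computed on `log10(x)` describe the median of the transformed data, so the picture is not identical to a notched boxplot drawn on a log axis.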
[R] read 4-jan-02 as date
Dear R users, I have a column with dates (character) in a data frame:

  12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01 8-Jan-01 5-Jan-01

and I need to convert them to (Julian) dates so that I can sort the whole data frame by date. I thought it would be very simple, but after checking the documentation and the list I still don't have something that works.

1. as.Date returns the error below. What am I doing wrong? As far as I can see the character strings are in standard format.

  > d$Date <- as.Date(d$Date, format="%d-%b-%y")
  Error in fromchar(x) : character string is not in a standard unambiguous format

2. as.date {survival} produces this error:

  > d$Date <- as.date(d$Date, order = "dmy")
  Error in as.date(d$Date, order = "dmy") : Cannot coerce to date format

3. Assuming all else fails, is there a text function similar to SCAN in SAS? Given a string like 9-Jan-01 and - as separator, I'd like a function that can read the first, second and third values (9, Jan, 01), so that I can get Julian dates with mdy.date {survival}.

Thanks in advance, b.
Re: [R] read 4-jan-02 as date
Thank you everyone. Indeed, I had read the data via read.csv and the date column was a factor. Everything works fine if I convert to character first.

Regards, b.

--- Sundar Dorai-Raj wrote:

bogdan romocea wrote:
[snip]

If you're reading this from a file (via read.table, for example), then your date column is probably a factor. Convert to character first.

  > x
  [1] 12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01 8-Jan-01 5-Jan-01
  Levels: 10-Jan-01 11-Jan-01 12-Jan-01 5-Jan-01 8-Jan-01 9-Jan-01
  > as.Date(x, format="%d-%b-%y")
  Error in fromchar(x) : character string is not in a standard unambiguous format
  > sort(as.Date(as.character(x), format="%d-%b-%y"))
  [1] "2001-01-05" "2001-01-08" "2001-01-09" "2001-01-10" "2001-01-11"
  [6] "2001-01-12"

--sundar
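Both routes discussed in this thread can be sketched in a few lines of base R. The first is the factor-to-character conversion; note that `%b` relies on the locale's month abbreviations, so it assumes an English (or C) locale. The second is a rough stand-in for SAS's SCAN built on `strsplit()`; reading the two-digit year as 20xx is an assumption made only for this example.

```r
f <- factor(c("12-Jan-01", "11-Jan-01", "10-Jan-01",
              "9-Jan-01", "8-Jan-01", "5-Jan-01"))

# Route 1: convert the factor to character, then parse (locale-dependent %b).
dates  <- as.Date(as.character(f), format = "%d-%b-%y")
sorted <- sort(dates)

# Route 2: split on "-" and assemble the date from its pieces.
parts <- strsplit(as.character(f), "-", fixed = TRUE)
day   <- as.integer(sapply(parts, `[`, 1))
mon   <- match(sapply(parts, `[`, 2), month.abb)   # "Jan" -> 1, English names
yr    <- 2000 + as.integer(sapply(parts, `[`, 3))  # assumption: years are 20xx
dates2 <- as.Date(ISOdate(yr, mon, day))
```

Either vector can then be used with `order()` to sort the whole data frame, for example `d[order(dates2), ]`.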
[R] incomplete function output
Dear R users, I have a function (below) which encompasses several tests. However, when I run it, only the output of the last test is displayed. How can I ensure that the function root(var) will run and display the output from all tests, and not just the last one?

Thank you, b.

  root <- function(var) {
    #---Phillips-Perron
    PP.test(var, lshort = TRUE)
    PP.test(var, lshort = FALSE)
    #---Augmented Dickey-Fuller
    adf.test(var, alternative = "stationary", k = trunc((length(var)-1)^(1/3)))
    #---KPSS
    kpss.test(var, null = "Level", lshort = TRUE)
    kpss.test(var, null = "Trend", lshort = FALSE)
  }
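The behaviour described above follows from how R functions work: only the value of the last expression is returned, and results are auto-printed only at top level. A sketch of one way around it is to collect the results in a list, print each one explicitly, and return the list. To stay within base R it uses only `PP.test()` from the stats package; the `adf.test()` and `kpss.test()` calls from the tseries package could be added to the list in the same way.

```r
root <- function(var) {
  results <- list(
    pp_short = PP.test(var, lshort = TRUE),
    pp_long  = PP.test(var, lshort = FALSE)
  )
  # Inside a function nothing is auto-printed, so print each result explicitly.
  for (nm in names(results)) {
    cat("---", nm, "\n")
    print(results[[nm]])
  }
  invisible(results)   # return everything without printing it a second time
}

set.seed(1)
res <- root(cumsum(rnorm(200)))   # a simulated random walk as test input
```

Returning the list also makes the individual statistics and p-values available afterwards, e.g. `res$pp_short$p.value`.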
[R] output processing / ARMA order identification
Dear R users, I need to fit an ARMA model. As far as I've seen, EACF (extended ACF) is not available in R.

1. Let's say I fit a series of ARMA models in a loop. Given the code/output included below, how do I pull 'Model' and 'Fit' (AIC) from each summary() so that I can combine them into an array/data frame to be sorted by AIC?
2. Apart from EACF, are you aware perhaps of another function in R that can help solve the issue of ARMA order identification?

Thank you, b.

  > arma <- arma(var, order=c(1,1), lag=NULL, coef=NULL,
  +              include.intercept = TRUE, series = NULL)
  > summary(arma)

  Call:
  arma(x = var, order = c(1, 1), lag = NULL, coef = NULL,
       include.intercept = TRUE, series = NULL)

  Model:
  ARMA(1,1)

  Residuals:
       Min       1Q   Median       3Q      Max
  -686.092  -68.499    4.024   65.531  509.171

  Coefficient(s):
              Estimate  Std. Error  t value  Pr(>|t|)
  ar1         0.990653    0.003724  265.987    <2e-16 ***
  ma1        -0.019562    0.030110   -0.650    0.5159
  intercept  90.940774   36.914682    2.464    0.0138 *
  ---
  Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

  Fit:
  sigma^2 estimated as 14193, Conditional Sum-of-Squares = 17116373, AIC = 14983.22
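For question 1, a sketch of the loop-and-rank idea. It deliberately swaps `tseries::arma()` for `arima()` from the stats package, because `AIC()` is known to work on `arima` fits; the simulated series, the 0:2 grid of orders and the object names are all made up for illustration.

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.7, ma = 0.3), n = 300)

# Candidate (p, q) orders, one row per model.
grid <- expand.grid(p = 0:2, q = 0:2)
grid$aic <- NA_real_

for (k in seq_len(nrow(grid))) {
  fit <- tryCatch(
    arima(y, order = c(grid$p[k], 0, grid$q[k])),
    error = function(e) NULL      # a failed fit keeps aic = NA
  )
  if (!is.null(fit)) grid$aic[k] <- AIC(fit)
}

ranked <- grid[order(grid$aic), ]  # smallest AIC first, NA last
ranked
```

AIC ranking is only a rough guide to the order; nearby models often differ by a fraction of a unit, so the top few rows are worth comparing rather than taking the first one blindly.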
[R] plot time series / dates (basic)
Dear R users, I'm having a hard time with some very simple things. I have a time series where the dates are in the format 7-Oct-04. I imported the file with read.csv, so the date column is a factor. The series is rather long and I want to plot it piece by piece. The function below works fine, except that the labels for date are meaningless (i.e. 9.47e+08 or 109800, apparently the number of seconds since whatever). I don't want to convert the data frame to a ts object because there are missing days and I don't want any interpolation.

1. How do I replace the date labels with something like 'Mar04', instead of 9.47e+08 / 109800?
2. In the PDF file, the space between the two graphs printed pair by pair is fairly large. Can I remove/reduce the area that seems reserved for the title and X label so that, on a page, the space between the graph at the top and the one at the bottom is minimized?
3. Given the function below, I haven't discovered a way to have vara appear as the title or Y label in graphs. main=as.character(vara) lists all the values of vara (which is a column from the data frame d). So, how can I use the name of a vector as title or label in a plot?

Thank you, b.

  d <- read.csv('data.csv', header = T, sep = ",", quote="", dec=".", fill = T, skip=0)
  attach(d)

  #function to plot a long time series piece by piece
  pl <- function(vara, varb, points) {
    date <- as.POSIXct(strptime(as.character(Date), "%d-%b-%y"), tz = "GMT")
    pr1 <- vector(mode="numeric")
    pr2 <- vector(mode="numeric")
    dat <- vector()
    for (j in 1:(round(length(Vol)/points)+1)) #number of plots
    {
      for (i in ((j-1)*points+1):(j*points)) {
        pr1[i-points*(j-1)] <- vara[i]
        pr2[i-points*(j-1)] <- varb[i]
        dat[i-points*(j-1)] <- date[i]
      }
      par(mfrow=c(2,1))
      plot(dat, pr1, type="b")
      plot(dat, pr2, type="b")
    }
  }

  pdf("Rplots.pdf")
  pl(Vol, atr, 50)
  dev.off()
Re: [R] plot time series / dates (basic)
Thank you for the suggestions. I managed to fix everything except the first part.

  dat <- date[(j-1)*points+1):(j*points)]

causes a syntax error. If I do dat <- vector() I end up with numbers (which is fine by me, just like SAS dates). However, after checking a couple of sources I still have no idea how to format numbers as dates (for plotting/printing). Does anyone have an example for formatting 12710 (# of days since 1 Jan 1970) as 19-Oct-04 (in the x axis of a plot)?

Regards, b.

  #function to plot a long time series piece by piece
  pl <- function(vara, varb, points) {
    date <- as.Date(as.character(Date), "%d-%b-%y")
    pr1 <- vector(mode="numeric")
    pr2 <- vector(mode="numeric")
    #dat <- vector()
    dat <- date[(j-1)*points+1):(j*points)]
    for (j in 1:(round(length(Vol)/points)+1)) #number of plots
    {
      for (i in ((j-1)*points+1):(j*points)) {
        pr1[i-points*(j-1)] <- vara[i]
        pr2[i-points*(j-1)] <- varb[i]
        #dat[i-points*(j-1)] <- date[i]
        #dat <- date[i]
      }
      par(mfrow=c(2,1), mai=c(0.4, 0.5, 0.3, 0.1), omi=c(0.2, 0, 0, 0),
          cex.axis=0.7, cex=1.2, cex.main=0.7, pch="*")
      plot(dat, pr1, main=deparse(substitute(vara)), type="o")
      #axis.Date(1,dat,format="%b%y")
      plot(dat, pr2, main=deparse(substitute(varb)), type="o")
    }
  }

--- Prof Brian Ripley wrote:

On Mon, 1 Nov 2004, bogdan romocea wrote:

> Dear R users, I'm having a hard time with some very simple things. I
> have a time series where the dates are in the format 7-Oct-04.

So why use as.POSIXct for a date, rather than as.Date?

> 1. How do I replace the date labels with something like 'Mar04',
> instead of 9.47e+08 / 109800?

Just don't convert them to that format. You set up dat <- vector(), which is not a dates object. If you use standard R indexing, it will work. If you throw the class away, it will not. Try

  dat <- date[(j-1)*points+1):(j*points)]

etc (no for loop required). If you want a different format, see ?axis.Date

> 2. In the PDF file, the space between the two graphs printed pair by
> pair is fairly large. Can I remove/reduce the area that seems reserved
> for Title and X label [...]?

There's a whole chapter on this in `An Introduction to R': have you read it?

> 3. [...] So, how can I use the name of a vector as title or label in
> a plot?

That's almost an FAQ. Use deparse(substitute(vara))

> [original code snipped]

--
Brian D. Ripley
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
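A sketch of how the pieces of advice in this thread might fit together. It keeps the `Date` class by using ordinary indexing, draws the x axis with `axis.Date()` in `"%b%y"` format, and shows that day number 12710 counted from 1970-01-01 is 19 October 2004. The data are simulated and all names are hypothetical; the printed month names depend on the locale.

```r
set.seed(1)
dates <- as.Date("2004-01-01") + sort(sample(0:299, 120))  # irregular days
vol   <- cumsum(rnorm(120))

points_per_plot <- 50
# Index sets for consecutive pieces of at most 50 observations.
pieces <- split(seq_along(dates),
                ceiling(seq_along(dates) / points_per_plot))

pdf(NULL)                                   # null device for this sketch
op <- par(mfrow = c(2, 1), mai = c(0.4, 0.5, 0.3, 0.1))
for (idx in pieces) {
  # Plain indexing keeps the Date class, so no numeric labels appear.
  plot(dates[idx], vol[idx], type = "o", xaxt = "n", xlab = "", ylab = "vol")
  axis.Date(1, at = pretty(dates[idx]), format = "%b%y")
}
par(op)
dev.off()

# Turning a day count back into a date and formatting it:
d12710 <- as.Date(12710, origin = "1970-01-01")
format(d12710, "%d-%b-%y")
```

The syntax error quoted in the thread comes from the unbalanced parenthesis in `date[(j-1)*points+1):(j*points)]`; `date[((j-1)*points+1):(j*points)]` is the balanced form.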
[R] misleading output after ordering data frame
Dear R users, I have a data frame which I create with read.csv and then order by date:

  d <- na.omit(read.csv(...))
  d <- d[order(as.Date(as.character(d$Date), format="%d-%b-%y"), decreasing=F, na.last=F),]

My problem is that even though the data frame is ordered as requested, the old row numbers are preserved. For example:

* Before sorting:

  > d[1:3,]
        Date   Amt
  1 5-Nov-04 87.07
  2 4-Nov-04 85.80
  3 3-Nov-04 82.90

* After sorting:

  > d[1:3,]
           Date   Amt
  500 12-Nov-02 84.23
  499 13-Nov-02 85.05
  498 14-Nov-02 84.95

Is there a way to update the row numbers as well? It's not that important, but I find it a bit confusing.

Thank you, b.
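The numbers in the left margin are row names, which travel with the rows when a data frame is reordered. A small sketch of resetting them after sorting, using three rows copied from the example above; assigning `NULL` to `rownames()` restores the default 1, 2, 3, ... labels. The `%b` parsing assumes English month abbreviations.

```r
d <- data.frame(Date = c("5-Nov-04", "4-Nov-04", "3-Nov-04"),
                Amt  = c(87.07, 85.80, 82.90),
                stringsAsFactors = FALSE)

key      <- as.Date(d$Date, format = "%d-%b-%y")
d_sorted <- d[order(key), ]
rownames(d_sorted)            # still the original row labels, reordered

rownames(d_sorted) <- NULL    # reset to 1, 2, 3, ...
d_sorted
```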
RE: [R] an off-topic question - model validation
Assuming you have enough data, usually 1/4 to 1/2 is used for validation. One reference would be Picard, R.R. and Berk, K.N. (1990), "Data Splitting", The American Statistician, 44, 140-147.

hth, b.

-----Original Message-----
From: Wensui Liu
Sent: Thursday, November 11, 2004 10:20 PM
Subject: [R] an off-topic question - model validation

Currently, I am working on a data mining project and plan to divide the data table into 2 parts, one for modeling and the other for validation, to compare several models. But I am not sure what percentage of the data I should use to build the model and what percentage I should keep to validate it. Is there any literature reference on this topic?

Thank you so much!
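A minimal sketch of such a split in base R, holding out one third of the rows (a value inside the 1/4 to 1/2 range mentioned above). The data frame, the seed and the fraction are arbitrary choices for illustration.

```r
set.seed(42)
dat <- data.frame(x = rnorm(150), y = rnorm(150))

holdout_frac <- 1 / 3
n        <- nrow(dat)
test_idx <- sample(n, size = round(holdout_frac * n))  # rows kept for validation

train <- dat[-test_idx, ]   # used to build the models
test  <- dat[test_idx, ]    # used only to compare them
```

Fixing the seed makes the split reproducible, so that every candidate model is judged on exactly the same held-out rows.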
[R] density estimation: compute sum(value * probability) for given distribution
Dear R users, This is a KDE beginner's question. I have this distribution:

  > length(cap)
  [1] 200
  > summary(cap)
     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    459.9   802.3   991.6  1066.0  1242.0  2382.0

I need to compute the sum of the values times their probability of occurrence. The graph is fine,

  den <- density(cap, from=min(cap), to=max(cap), give.Rkern=F)
  plot(den)

However, how do I compute sum(values*probabilities)? The probabilities produced by the density function sum to only 26%:

  > sum(den$y)
  [1] 0.2611142

Would it perhaps be ok to simply do

  > sum(den$x*den$y) * (1/sum(den$y))
  [1] 1073.22

?

Thank you, b.
RE: [R] density estimation: compute sum(value * probability) for given distribution
Andy, Thanks a lot for the clarifications. I was running a simulation a number of times and trying to come up with a number to summarize the results. And I failed to realize from the beginning that what I was trying to compute was just the mean.

Regards, b.

--- Liaw, Andy wrote:

First thing you probably should realize is that density is _not_ probability. A probability density function _integrates_ to one, it does not _sum_ to one. If X is an absolutely continuous RV with density f, then Pr(X=x)=0 for all x, and Pr(a < X < b) = \int_a^b f(x) dx.

sum x*Pr(X=x) (over all possible values of x) for a discrete distribution is just the expectation, or mean, of the distribution. The expectation for a continuous distribution is \int x f(x) dx, where the integral is over the support of f. This is all elementary math stat that you can find in any textbook.

Could you tell us exactly what you are trying to compute, or why you're computing it?

HTH, Andy

From: bogdan romocea
[snip]
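Andy's point can be checked numerically. `density()` returns heights on an evenly spaced grid, so a Riemann sum (heights times grid spacing) approximates the integral. For the figures quoted in the question, assuming the default grid of 512 points, the spacing is about (2382.0 - 459.9) / 511, roughly 3.76, and 0.2611 * 3.76 is roughly 0.98, i.e. close to one. The sketch below uses simulated data, since the original `cap` vector is not available.

```r
set.seed(1)
cap <- rgamma(200, shape = 8, rate = 0.0075)   # made-up, right-skewed sample

den <- density(cap)              # default grid: 512 evenly spaced points
dx  <- diff(den$x[1:2])          # grid spacing

area     <- sum(den$y) * dx           # approximates \int f(x) dx, close to 1
kde_mean <- sum(den$x * den$y) * dx   # approximates \int x f(x) dx

c(area = area, kde_mean = kde_mean, sample_mean = mean(cap))
```

The expression `sum(den$x*den$y) / sum(den$y)` from the question is the same Riemann sum after renormalising, which is why it landed near the sample mean; `mean(cap)` gives that number directly.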
Re: [R] Running R from CD?
Better to install and run R from a USB flash drive. This will save you the trouble of re-writing the CD as you upgrade and install new packages. Also, you can simply copy the R installation onto your work computer (no install rights needed); R will run.

HTH, b.

From: Hans van Walen hans_at_vanwalen.com
Date: Fri 27 Aug 2004 - 23:54:53 EST

At work I have no permission to install R. So, would anyone know whether it is possible to create a CD with a running R installation for a Windows (XP) PC? And of course, how to?

Thank you for your help,
Hans van Walen
RE: [R] SAS or R software
neela v writes:

> Hi all. Can someone clarify this issue for me: feature-wise, which is
> better, R or SAS, leaving aside the commercial aspect? I suppose there
> are a few people who have worked with both R and SAS, and I hope they
> can help me decide. Thank you for the help.

I very much doubt you can make an informed decision if you leave the commercial aspect (license) aside. A single Base SAS installation (server) can cost tens of thousands of [[your currency here; may need to multiply by 10 or 100 or more]] in the first year, then a percentage of that in the following years. (SAS software is not purchased, but licensed on a yearly basis.) Want more than Base SAS? Prepare your wallet: thousands upon thousands (per year) for regression, anova, clustering (SAS/Stat), graphics (SAS/Graph), time series (SAS/ETS), optimization (SAS/OR) etc. Then, if you want decision trees and neural networks (Enterprise Miner), I warmly recommend that you quickly find a chair and sit down before you hear the price tag.

Will you always work for an organization that licenses SAS software? Will the organization license all the modules you'll need? Will those modules do everything you want? As others have said, R is a lot more flexible, and the GPL ensures that whatever you can do today will continue to be expanded and improved (much faster than SAS Institute would want or be able to expand/improve SAS).

All in all, if you're primarily interested in data analysis (and don't want, for example, to get a job as a SAS programmer) and still choose SAS, you will regret it one day. The benefits are few (such as robust manipulation of massive data sets, I mean in excess of hundreds of millions of rows) and the risks are high (whatever you do is dependent on proprietary, very expensive software). With R, almost the opposite is true: lots of benefits and no risks (nothing can take R away from you).

HTH, b.
RE: [R] [BASIC] Solution of creating a sequence of object names
You may be missing something. After you create all those objects, you'll want to use them. Use get():

  for (i in 1:10) ... get(paste("object", i, sep="")) ...

It took me about a week to find out how to do this. I waited for a few days, but before I got to ask this basic/rtfm question, someone else (fortunately :-) did.

HTH, b.

-----Original Message-----
From: John
Sent: Monday, November 29, 2004 4:03 PM
Subject: [R] [BASIC] Solution of creating a sequence of object names

Dear R-users, I state that this is for beginners, so you may ignore this in order not to be irritated. By the way, patience is another important thing, together with kindness, that we should keep in mind when we teach students and our own children, as Jim Lemon pointed out well in the context of the Socratic method. You may know that being kind does not mean giving spoonfed answers to questioners.

I was asked for the solution to my problem, and a couple of answers were given to me in private emails. I am not sure if it was a mere accident. I post them now, without their permission, for those who are interested in learning them. So if you're happy to know the solution, thanks should go to the person concerned. I thank all three people named below.

(1) My solution after reading R-FAQ 7.21, as Uwe Ligges pointed out

  > for ( i in 1:10 ) {
  +   assign(paste("my.file.", i, sep=""), NULL)
  + }

(2) Adai Ramasamy's solution

  for(obj in paste("my.ftn", 1:10, sep="")) assign(obj, NULL)
  ### or
  for(i in 1:10) assign(paste("my.ftn", i, sep=""), NULL)

(3) James Holtman's solution

  # For example, if you want to generate 10 groups
  # of 5 random numbers and store them
  # under the names GRPn where n is 1 - 10,
  # the following can be used:
  #
  Result <- list()   # create the list
  for (i in 1:10) Result[[paste("GRP", i, sep='')]] <- runif(5)   # store each result
  Result   # print out the data

  $GRP1
  [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819

  $GRP2
  [1] 0.89838968 0.94467527 0.66079779 0.62911404 0.06178627

  $GRP3
  [1] 0.2059746 0.1765568 0.6870228 0.3841037 0.7698414

  $GRP4
  [1] 0.4976992 0.7176185 0.9919061 0.3800352 0.7774452

  $GRP5
  [1] 0.9347052 0.2121425 0.6516738 0.121 0.2672207

  $GRP6
  [1] 0.38611409 0.01339033 0.38238796 0.86969085 0.34034900

  $GRP7
  [1] 0.4820801 0.5995658 0.4935413 0.1862176 0.8273733

  $GRP8
  [1] 0.6684667 0.7942399 0.1079436 0.7237109 0.4112744

  $GRP9
  [1] 0.8209463 0.6470602 0.7829328 0.5530363 0.5297196

  $GRP10
  [1] 0.78935623 0.02333120 0.47723007 0.73231374 0.69273156

Regards, John
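To put the two styles from this thread side by side, here is a small sketch: `assign()`/`get()` with pasted names, and a single named list as in solution (3). The object names and values are arbitrary.

```r
# Style 1: separate objects with generated names, read back with get().
for (i in 1:3) assign(paste("my.obj", i, sep = ""), i^2)
vals_get <- sapply(1:3, function(i) get(paste("my.obj", i, sep = "")))

# Style 2: one named list; elements are reached by name or by position.
res <- list()
for (i in 1:3) res[[paste("GRP", i, sep = "")]] <- i^2
vals_list <- sapply(res, identity)

res[["GRP2"]]   # access by name
```

The list version tends to be easier to work with later, because functions such as `lapply()` and `sapply()` can run over all elements without rebuilding the names.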
RE: [R] Protocol for answering basic questions
I'm also an R beginner. I have asked stupid questions, and received RTFM replies. I believe such replies are _GREAT_, as long as they include a brief reference to what to read, and where. (In some cases searches don't work unless you happen to use the 'right' keywords, and in other cases it may be relatively easy to miss a paragraph in a manual - or even FAQ.) I believe that rudeness (perceived or real) doesn't matter. It is only solving the problem that matters. In this respect, it seems to me that most (if not all) users who ask a question on R-help figure out what to do. In regards to politeness, I think that the solution - and the problem - lies almost completely in the other camp: those who ask (and not those who reply). I would recommend all R beginners to not feel easily offended, and to not be afraid to ask stupid questions. So what if you risk being perceived a lazy idiot? (As I occasionally am, and certainly will be again.) Do go ahead and ask, if you must. Do you need to solve your problem or not? Many many many thanks to all those who bother to answer questions on R-help. (I still find it hard to believe that experts such as Brian Ripley and Peter Dalgaard, to quote just two names, take the trouble to answer so many questions, including basic ones.) And, of course, thank heavens and the R Core Team that R exists. b. -Original Message- From: Robert Brown FM CEFAS [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 01, 2004 11:46 AM To: [EMAIL PROTECTED] Subject: [R] Protocol for answering basic questions I have been following the discussions on 'Reasons not to answer very basic questions in a straightforward way' with interest as someone who is also new to R and has had similar experiences. As such it with sadness that I note that most seem to agree with the present approach to the responses to basic questions. 
I must thank those respondents to my own questions who have been helpful, but there are some whose replies are, in my opinion, not only unhelpful but actually rude.
RE: [R] finding the most frequent row
Here's something that works. I'm sure there are better solutions (in particular the paste part - I couldn't figure out how to avoid typing a[i,1], ..., a[i,10]).

  a <- matrix(nrow=1000, ncol=10)
  for (i in 1:1000) for (j in 1:10) a[i,j] <- sample(1:0, 1)
  b <- vector(mode="character")
  for (i in 1:1000) b[i] <- paste(a[i,1], a[i,2], a[i,3], a[i,4], a[i,5],
                                  a[i,6], a[i,7], a[i,8], a[i,9], a[i,10], sep="")
  # the most frequent row
  table(b)[table(b) == max(table(b))]

HTH, b.

-----Original Message-----
From: Lisa Pappas [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 09, 2004 5:15 PM
To: [EMAIL PROTECTED]
Subject: [R] finding the most frequent row

I am bootstrapping using a function that I have defined. The "Statistic" of the function is an array of 10 numbers. Therefore if I use 1000 replications, the "t" matrix will have 1000 rows, each of which is a bootstrap replicate of this 10-number array (10 columns). Is there any easy way in R to determine which row appears the most frequently?

Thanks, Lisa Pappas
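As a follow-up to the paste() question above, here is a minimal sketch of one way to avoid typing each column by hand. It assumes the rows can be compared as text keys; the matrix a below is a made-up stand-in for the bootstrap "t" matrix, not Lisa's actual data.

```r
# Hypothetical stand-in for the bootstrap "t" matrix: 1000 rows, 10 columns of 0/1
set.seed(42)
a <- matrix(sample(0:1, 1000 * 10, replace = TRUE), nrow = 1000, ncol = 10)

# Collapse each row into one string key; the comma separator avoids
# ambiguity when values have more than one digit
keys <- apply(a, 1, paste, collapse = ",")

# Tabulate the keys and keep the most frequent one(s); ties are all returned
counts <- table(keys)
most_frequent <- counts[counts == max(counts)]
most_frequent

# Recover the winning row(s) as numbers (first occurrence of each key)
winners <- a[match(names(most_frequent), keys), , drop = FALSE]
winners
```

With real-valued bootstrap statistics exact duplicates may be rare, so rounding before building the keys might be needed; that choice is left to the reader.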
[R] errors when trying to rename data frame columns
Dear R users, I need to rename the columns of a series of data frames. The names of the data frames and those of the columns need to be pulled from some vectors. I tried a couple of things but only got errors. What am I missing?

  #---create data frame
  dframes <- c("a","b","c")
  assign(dframes[2], data.frame(11:20,21:30))
  #---rename the columns
  cols <- c("one","two")
  names(get(dframes[2])) <- cols
  Error: couldn't find function "get<-"
  assign(dframes[2], data.frame(cols[1]=11:20, cols[2]=21:30))
  Error: syntax error
  labels(get(dframes[2]))[[2]] <- cols
  Error: couldn't find function "labels<-"

I'm using R 2.0.0 on Windows XP. Thank you, b.
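One possible way around the errors above, offered as a sketch rather than the definitive answer: get() cannot be used on the left-hand side of an assignment, so the object is copied, renamed, and stored back with assign(). The named-list variant and the use of setNames() are suggestions that were not part of the original post.

```r
dframes <- c("a", "b", "c")
cols <- c("one", "two")

# Create the data frame under a name taken from a vector
assign(dframes[2], data.frame(11:20, 21:30))

# get() cannot appear on the left of "<-", so copy, rename, store back
tmp <- get(dframes[2])
names(tmp) <- cols
assign(dframes[2], tmp)
names(b)

# Alternative: keep the data frames in a named list and skip get()/assign()
frames <- list(b = data.frame(11:20, 21:30))
names(frames[["b"]]) <- cols

# Column names can also be set at creation time with setNames()
frames[["c"]] <- setNames(data.frame(1:3, 4:6), cols)
```

The list version tends to be easier to loop over, since lapply() can then apply the same renaming to every element.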
RE: [R] switching to Linux, suggestions?
Before choosing a GNU/Linux distribution, look into the package management issue. http://distrowatch.com/ I would suggest that you avoid all RPM-based distributions (Mandrake, Fedora, SuSE), and consider Debian (+ those based on it) or the source-based distributions (such as Gentoo). I've been using Mandrake for a couple of years but got tired of RPM. HTH, b.

-----Original Message-----
From: Thomas W Volscho [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 12, 2004 3:24 PM
To: [EMAIL PROTECTED]
Subject: [R] switching to Linux, suggestions?

Dear List, I have acquired a new desktop and wanted to put a free OS on it. I am trying Fedora Core 1, but not sure what the best Linux OS is for using R 2.0.1? Thank you in advance for your input, Tom Volscho

Thomas W. Volscho Graduate Student Dept. of Sociology U-2068 University of Connecticut Storrs, CT 06269 Phone: (860) 486-3882 http://vm.uconn.edu/~twv1
RE: [R] Moving standard deviation?
A simple for loop does the job. Why not write your own function?

  movsd <- function(series, lag) {
    movingsd <- vector(mode="numeric")
    for (i in lag:length(series)) {
      movingsd[i] <- sd(series[(i-lag+1):i])
    }
    assign("movingsd", movingsd, .GlobalEnv)
  }

This is very efficient: it takes (much) less time to write from scratch than to look for an existing function. HTH, b.

-----Original Message-----
From: doktora v
Sent: Monday, December 13, 2004 1:46 PM
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Moving standard deviation?

I have tried there but didn't find anything useful. Most of the matches are for functions which take a std dev input, and the "moving" part of the query relates to something else (like moving average in the qcc package). Anyway, it's not too difficult to create the function, but I was wondering if anyone had done it before. Efficiency is a consideration, naturally. I'll post what I come up with... cheers dok

On Mon, 13 Dec 2004 10:04:59 -0800, Spencer Graves [EMAIL PROTECTED] wrote:

A search for "moving standard deviation" at www.r-project.org -> search -> R site search just produced 7 matches. Please look at those and let us know if none of those help you (and what you tried that didn't work). spencer graves

doktora v wrote:

Is there a simple function in R to get a moving standard deviation (i.e. for the last x samples)? My goal is to plot bollinger bands around a moving average for price data. I use kernel smoothing for the moving average. cheers and thanks! over and out -- doktora

-- Spencer Graves, PhD, Senior Development Engineer O: (408)938-4420; mobile: (408)655-4567
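For the moving standard deviation discussed above, a variant that returns its result instead of writing to the global environment might look like the sketch below. The function name moving_sd and the simulated price series are assumptions made for illustration; embed() is a base R function that lays out the consecutive windows as matrix rows.

```r
# Moving standard deviation over the last `lag` observations.
# Returns a vector as long as `series`, with NA in the first lag - 1 positions.
moving_sd <- function(series, lag) {
  stopifnot(lag >= 2, lag <= length(series))
  out <- rep(NA_real_, length(series))
  # embed() builds a matrix whose rows are the consecutive windows of length `lag`
  windows <- embed(series, lag)
  out[lag:length(series)] <- apply(windows, 1, sd)
  out
}

set.seed(1)
prices <- cumsum(rnorm(50))   # hypothetical price series
msd <- moving_sd(prices, 10)
```

Returning the vector keeps the function free of side effects, so the result can be assigned to any name or passed straight to a plotting call.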
RE: [R] sort() leaves row names unaffected
I asked the same question a few weeks ago. See http://tolstoy.newcastle.edu.au/R/help/04/11/6775.html

-----Original Message-----
From: Martin Wegmann
Sent: Tuesday, December 14, 2004 6:23 AM
To: [EMAIL PROTECTED]
Subject: [R] sort() leaves row names unaffected

Hello, I wonder if I ran into a bug. If I do

  summary(df1$X1) -> df1.y
  df1.y
              a        b        c        d        e
  [1,] 50.74627 8.955224 17.91045 19.40299 2.985075
  sort(df1.y)
              a        b        c        d        e
  [1,] 2.985075 8.955224 17.91045 19.40299 50.74627

my numbers are sorted but no longer correspond to the names. For me it is counterintuitive that solely the numbers are sorted and not the names. Is there a way to sort names + numbers, or is this behaviour of sort() unintended? Martin

R 2.0.1-1 debian reposit.
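One way to keep names and numbers together, sketched here under the assumption that df1.y is a one-row matrix with named columns as the printed output suggests, is to reorder the columns with order() rather than calling sort() on the matrix. The values below are copied from the post; the construction of the matrix itself is hypothetical.

```r
# Hypothetical one-row matrix with named columns, mirroring the values in the post
df1.y <- matrix(c(50.74627, 8.955224, 17.91045, 19.40299, 2.985075),
                nrow = 1, dimnames = list(NULL, c("a", "b", "c", "d", "e")))

# Reorder the columns by the values in row 1; drop = FALSE keeps the matrix shape
sorted <- df1.y[, order(df1.y[1, ]), drop = FALSE]
sorted

# For a plain named vector, sort() carries the names along
v <- df1.y[1, ]
sort(v)
```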
RE: [R] Re : Save result in a For Loop
Not sure if it's the best way, but you could do it this way:

  all.results <- vector(mode="numeric")
  for (i in 1:100) {
    ...
    this.run <- ...
    all.results <- c(all.results, this.run)
  }

At this point all.results contains the values of this.run from the whole loop. If this.run is not a vector/number but a data frame, look at rbind/cbind. Or, create a vector/matrix first and then populate it from the for loop:

  all.results <- vector()/matrix()/data.frame()
  for (i in 1:100) for (j ...) {
    ...
    all.results[i] <- this.run      , OR
    all.results[i,] <- this.run     , OR
    all.results[i,j] <- this.run
  }

HTH, b.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 2:44 PM
To: [EMAIL PROTECTED]
Subject: [R] Re : Save result in a For Loop

Hiya, I have been struggling to save the result from the FOR loop. What is the best way to do it, as I need the result to merge with another dataset for further analysis?

  for (dd in ((M-10):M)) {
    dist <- (32-dd)
    r <- 1/2*(1-exp(-2*dist/100))
    map <- c(dd, round(r,4))
    print(map)
    next
  }

Thanks. Stella
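Applied to Stella's loop, the preallocation idea above could be sketched as follows. M is not defined in the post, so the value 20 is purely an assumption, as are the names map and res; the formula itself is taken from her code. Because the formula works on whole vectors, the loop is optional.

```r
M <- 20                      # hypothetical value; M is not defined in the post
dd <- (M - 10):M

# Vectorised version: the formula is applied to the whole vector at once
dist <- 32 - dd
r <- 1/2 * (1 - exp(-2 * dist / 100))
map <- data.frame(dd = dd, r = round(r, 4))

# Loop version with a preallocated matrix, one row filled per iteration
res <- matrix(NA_real_, nrow = length(dd), ncol = 2,
              dimnames = list(NULL, c("dd", "r")))
for (k in seq_along(dd)) {
  d <- 32 - dd[k]
  res[k, ] <- c(dd[k], round(1/2 * (1 - exp(-2 * d / 100)), 4))
}
```

Either map or res can then be passed to merge() with the other dataset.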
[R] faster row by row data frame processing
Dear R users, I have a data frame with a few thousand rows and several hundred numeric columns (plus a date column). For each row (day), I want to assign +/- 1 to the highest X absolute values, 0 to the other values, and save all that in a separate data frame. I have a working solution (below), however I find it rather slow. Is there something I could do to increase the speed? (The code is CPU-bound; Pentium 4 @ 2.4 GHz, 512 MB RAM, Win XP, R 2.0.0.) Thank you, b. #all is the original data frame (date + a number of columns) #set up the output data frame DailyTopN - data.frame(all[1,1],matrix(ncol=ncol(all)-1)) names(DailyTopN) - names(all) top - 20 for (i in 1:1000) #the rows to be processed { #data frame row as vector onerow - na.omit(as.matrix(all[i,][2:ncol(all)])[1, ]) #select the 'top' highest absolute values r - rank(abs(onerow),ties.method=random) selected - names(r[which(r = top)]) #set +/-1 for the highest absolute values, 0 for the others DailyTopN[i,selected] - 1 * sign(all[i,selected]) DailyTopN[i,1] - all[i,1] #add the date } DailyTopN[is.na(DailyTopN)] - 0 rownames(DailyTopN) - 1:nrow(DailyTopN) __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
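A possible speed-up for the row-by-row task above is to do the work on a numeric matrix instead of indexing a data frame inside the loop, since element-wise data frame assignment tends to be slow. The sketch below is an assumption-laden illustration: the function name top_n_signs is invented, ties are broken by position rather than at random, and the date column would have to be split off first, for example with as.matrix(all[, -1]).

```r
# For each row of a numeric matrix, flag the `top` largest absolute values
# with their sign (+1 / -1) and set everything else (including NA) to 0.
top_n_signs <- function(m, top) {
  out <- matrix(0, nrow(m), ncol(m), dimnames = dimnames(m))
  for (i in seq_len(nrow(m))) {
    row <- m[i, ]
    ok <- which(!is.na(row))            # ignore missing values
    k <- min(top, length(ok))
    if (k > 0) {
      # positions of the k largest absolute values among the non-missing ones
      sel <- ok[order(abs(row[ok]), decreasing = TRUE)[seq_len(k)]]
      out[i, sel] <- sign(row[sel])
    }
  }
  out
}

# Small made-up example: 2 days, 3 series, keep the single largest |value| per day
m <- matrix(c(1, -5, 3,
              NA, 2, -1), nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("x", "y", "z")))
flags <- top_n_signs(m, top = 1)
```

The dates can be re-attached afterwards with data.frame(date = all[, 1], flags).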
RE: [R] scheduling R tasks under windows
Save the command(s) in a batch (.bat) file, and then run the .bat file from the task scheduler.

-----Original Message-----
From: Mikkel Grum [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 21, 2004 3:18 PM
To: RHelp
Subject: [R] scheduling R tasks under windows

I'm trying to schedule R tasks in Windows Server 2003. I can run the following from the DOS prompt without any difficulty:

  c:\Reports>c:\r\rw2001\bin\rterm.exe --no-restore --no-save < test.R > test.out

where test.R has two lines: library(tools); Sweave("rlr.Rnw"). When I try to run the same from the task scheduler, I fill in the dialogue box as follows:

  Run:      c:\r\rw2001\bin\rterm.exe --no-restore --no-save < test.R > test.out
  Start in: c:\Reports

Which opens Rterm, but is preceded by

  ARGUMENT '<' __ignored__
  ARGUMENT 'test.R' __ignored__
  ARGUMENT '>' __ignored__
  ARGUMENT 'test.out' __ignored__

Anyone know what I'm doing wrong? Mikkel
RE: [R] how to fit in R
See http://www.statsoft.com/textbook/stdisfit.html There are several approaches you can use - Chi-square, Q-Q plots, P-P plots, various tests (Kolmogorov-Smirnov, Shapiro-Wilks' W) etc. HTH, b. -Original Message- From: Angela Re Sent: Wednesday, December 22, 2004 9:13 AM To: [EMAIL PROTECTED] Subject: [R] how to fit in R Good morning, in my work I need to study data distributions and so I need to fit the experimental distribution by theoretical curves such as normal, Poison, binomial and so on. I'd like to know, given a vector of data, for example x-rnorm (1000, 10) if they follow a normal distribution. I'd like to do a fit (to estimate the parameters of the theoretical distribution) and then a goodness test. Can you suggest me any R package or manuals about this issue? The documentation on the R-guide isn't sufficient to me. Thank you of your help, Angela __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
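For the normal case in Angela's example, the approaches listed above can be sketched with base R only. This is an illustration under stated assumptions, not a full fitting workflow: the seed and the variable names est, sw and ks are made up, and the Kolmogorov-Smirnov p-value is only approximate when the parameters are estimated from the same data.

```r
set.seed(123)
x <- rnorm(1000, 10)          # the example from the question: mean 10, sd 1

# Parameter estimates for a normal distribution
est <- c(mean = mean(x), sd = sd(x))

# Visual check: points should fall near the line if x is roughly normal
qqnorm(x)
qqline(x)

# Formal tests, both in the base 'stats' package
sw <- shapiro.test(x)
# Note: with estimated parameters the K-S p-value is not exact
ks <- ks.test(x, "pnorm", mean = est[["mean"]], sd = est[["sd"]])

est
sw$p.value
ks$p.value
```

For other families (Poisson, binomial and so on), fitdistr() in the MASS package is one option for maximum-likelihood estimates.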
[R] combination of scatterplot and image graph
Dear R users, I'm interested in a combination of a scatterplot and an image graph. I have two large vectors. Because in the scatterplot some areas are sparsely and others densely populated, I want to see the points, and I also want their color to be changed based on their density (similar to a heat map). Is there a function that can do that? Thank you, b. __ Send a seasonal email greeting and help others. Do good. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] coplot with png: disappearing grid lines
Dear useRs, When I use coplot() and output to png/jpeg/bmp, the grid lines from the scatter plots disappear. If I output to pdf() the grid lines are there, however I can't use it - I have many points, and the resulting PDF file is large and very slow to open and scroll through. (By the way, if I click File -> Save As -> png/jpeg/bmp from Rgui.exe, the grid lines are preserved - but I need to use code.) With coplot(), is there a way to:
1. Keep the grid lines in the scatter plots when exporting output to png(), and perhaps change their color?
2. Specify the number of grid lines to be drawn on the x and y axes?
I'm running R 2.0.0 on Win XP. Thank you, b.

  a <- rnorm(5)
  b <- rnorm(5)
  c <- rnorm(5)
  #pdf("test.pdf",height=9,width=12)
  png("test.png",height=900,width=1200)
  coplot(a ~ b | c, pch=20, col="navy",
         bar.bg=c(num=gray(0.8), fac=grey(0.95)))
  dev.off()

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Tuning string matching
This is a rather complex problem. I'm not aware of an R function / package that can do something like this, but in case you need to build it from scratch read http://support.sas.com/documentation/periodicals/obs/obswww15/index.html If you're familiar with SAS you could translate the code to R. HTH, b. -Original Message- From: [EMAIL PROTECTED] Sent: Wednesday, January 05, 2005 12:36 PM To: r-help@stat.math.ethz.ch Subject: [R] Tuning string matching Dear list, I spent about two hours searching on the message archive, with no avail. I have a list of people that have to pass an on-line test, but only a fraction of them do it. Moreover, as they input their names, the resulting string do not always match the names I have in my database. I would like to do two things: 1. Match any strings that are 90% the same Example: name1 - Harry Harrington name2 - Harry Harington I need a function that would declare those strings as a match (ideally having an argument that would allow introducing 80% instead of 90%) 2. Arrange a final table that would take me from: Table1 (the complete list of people from my database) No Name 1 Byron C. Andrew 2 Friedman Bob 3 Harrington Harry Table2 (the people having been tested) No Name Score 1 Harry Harington13 2 Byron Andrew 28 to: No Name1 Name2 Score 1 Byron C. AndrewByron Andrew 28 2 Friedman Bob 3 Harrington Harry Harry Harington13 Thank you in advance, any help is highly appreciated. Adrian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
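For the first part of Adrian's question (declaring two strings a match when they are about 90% the same), more recent versions of R ship adist() in the utils package, which computes an edit distance. The sketch below turns that into a similarity score; the function name name_similarity and the scoring rule (1 minus distance over the longer length) are assumptions chosen for illustration, and it does not handle swapped first/last names.

```r
# Similarity as 1 - (edit distance / length of the longer string)
name_similarity <- function(x, y) {
  d <- adist(x, y, ignore.case = TRUE)       # generalized Levenshtein distance
  1 - d / outer(nchar(x), nchar(y), pmax)    # matrix: rows = x, columns = y
}

name1 <- "Harry Harrington"
name2 <- "Harry Harington"

sim <- name_similarity(name1, name2)
threshold <- 0.90          # could be lowered to 0.80, as the question asks
sim >= threshold
```

Passing two vectors of names returns the full similarity matrix, from which the best candidate per row can be picked with which.max().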
[R] global objects not overwritten within function
Dear useRs, I have a function that creates several global objects with assign("obj",obj,.GlobalEnv), and which I need to run iteratively in another function. The code is similar to

  f <- function(...) {
    assign("obj", obj, .GlobalEnv)
  }
  fct <- function(...) {
    for (i in 1:1000) {
      ...
      f(...)
      ...obj...
      rm(obj)   #code fails without this line
    }
  }

I don't understand why f(), when run in a for() loop inside fct(), does not overwrite the global object 'obj'. If I don't delete 'obj' after I use it, the code fails - the same objects created by the first iteration are used by subsequent iterations. I checked ?assign and the Evaluation chapter in 'R Language Definition' but still don't understand why the above happens. Can someone briefly explain or suggest something I should read? By the way, I don't want to use 'better' techniques (lists, functions that return values instead of creating global objects etc) - I want to create global objects with f() and overwrite them again and again within fct(). Thank you, b.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] global objects not overwritten within function
Apparently the message below wasn't posted on R-help, so I'm sending it again. Sorry if you received it twice.

--- bogdan romocea [EMAIL PROTECTED] wrote:
Date: Tue, 11 Jan 2005 17:31:42 -0800 (PST)
From: bogdan romocea [EMAIL PROTECTED]
Subject: Re: [R] global objects not overwritten within function

Thank you to everyone who replied. I had no idea that ... means something in R, I only wanted to make the code look simpler. I'm pasting below the functional equivalent of what took me a couple of hours to debug yesterday. Function f() takes several arguments (that's why I want to have the code as a function) and creates several objects. I then need to use those objects in another function fct(), and I want to overwrite them to save memory (they're pretty large). It appears that Robert's guess (dynamic/lexical scoping) explains what's going on. I've noticed though another strange (to me) issue: without indexing (such as obj1 <- obj1[obj1 > 0] - which I need to use though), fct() prints the expected values even without removing the objects after each iteration. However, after indexing is introduced, rm() must be used to make fct() return the intended output. How would that be explained? Kind regards, b.

  f <- function(read, position) {
    obj1 <- 5 * read[position]:(read[position]+5)
    obj2 <- 7 * read[position]:(read[position]+5)
    assign("obj1", obj1, .GlobalEnv)
    assign("obj2", obj2, .GlobalEnv)
  }
  fct <- function(input) {
    for (i in 1:5) {
      f(input, i)
      obj1 <- obj1[obj1 > 0]
      obj2 <- obj2[obj2 > 0]
      print(obj1)
      print(obj2)
      # rm(obj1,obj2)   #get intended results with this line
    }
  }
  a <- 1:10
  fct(a)

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
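The behaviour described above is consistent with R's scoping rules: an assignment with <- inside fct() creates a local variable, and from then on the local copy hides the global one that f() keeps overwriting. Without the indexing line there is no local assignment, so every read falls through to the global object. The sketch below reproduces this with a reduced version of the code; the names fct_shadowed and fct_fixed and the filter value 10 are invented for the demonstration.

```r
f <- function(read, position) {
  obj1 <- 5 * read[position]:(read[position] + 5)
  assign("obj1", obj1, envir = .GlobalEnv)   # overwrites the global copy
}

fct_shadowed <- function(input) {
  out <- list()
  for (i in 1:3) {
    f(input, i)
    # First pass: the right-hand side reads the global obj1, and "<-" then
    # creates a LOCAL obj1. Later passes read that stale local copy.
    obj1 <- obj1[obj1 > 10]
    out[[i]] <- obj1
  }
  out
}

fct_fixed <- function(input) {
  out <- list()
  for (i in 1:3) {
    f(input, i)
    # Read the global explicitly and keep the filtered copy under another name
    cur <- get("obj1", envir = .GlobalEnv)
    out[[i]] <- cur[cur > 10]
  }
  out
}

stale <- fct_shadowed(1:10)
fresh <- fct_fixed(1:10)
```

Calling rm(obj1) at the end of each iteration, as in the original post, removes the local copy and therefore has the same effect as the explicit get().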
RE: [R] help wanted using R in a classroom
It appears you wouldn't get much improvement at all even if the 2nd CPU were used at 100%. Five R sessions can easily overwhelm one CPU. I think you need (a lot) more CPUs than 2 to solve your problem. Possible solutions: 1. Install R on each eMac. Since you have 40 of them, you might want to put together a script to do this. 2. Get some boxes that can run Windows. On Windows, you can run R from a CD/zip drive/USB drive. (So you could burn 40 CDs and have everyone run their R session on their box.) As far as I know the same is not true for GNU/Linux and Mac OS. HTH, b. -Original Message- From: Sam Parvaneh Sent: Monday, January 17, 2005 6:11 AM To: r-help@stat.math.ethz.ch Subject: [R] help wanted using R in a classroom Hi everyone! I'm using R 2.0.1 for Mac OS X in a classroom with 40 eMacs running Mac OS X version 10.3.6. These Macs are network based, meaning that the students log in to an XServe G4 where their user accounts and home directories are stored. The problem that I'm having each time a group of students (usually 7 to 10) use R is that the whole system get incredibly slow. The response time for opening an application while the students are running R is around 5 minutes. If a student wants to log into the system while others are running R, it can take up to 10 minutes for the student to get logged in. Everything gets very slow that it's almost impossible to work. When I look at the server Graphs, the CPU usage of the first CPU is always 100% when these students are using R. The second CPU is left at 15%. When these students quit R, then everything's is back to normal again. The usage of both CPUs go back down to between 5-10%. Is there anyone out there using R in a university like this? Does anyone have an idea what this might depend one or maybe a solution? I can provide some more information if anyone wants, if you think you can help me. 
Thanks in advance /Sam
RE: [R] animation without intermediate files?
Here's a different suggestion. Create a bunch of image files, and then use an image browser (GQview is one of the best; if you're on Win look at ACDSee) to view them as a slide show. Good image browsers read images in advance and should not produce flickering. I haven't experimented though with delays under 5 seconds. HTH, b. -Original Message- From: Paul Murrell Sent: Wednesday, January 26, 2005 2:46 PM To: Martin Maechler Cc: Cari G Kaufman; r-help@stat.math.ethz.ch Subject: Re: [R] animation without intermediate files? Hi Martin Maechler wrote: MM == Martin Maechler [EMAIL PROTECTED] on Tue, 25 Jan 2005 09:59:03 +0100 writes: Paul == Paul Murrell [EMAIL PROTECTED] on Tue, 25 Jan 2005 13:40:15 +1300 writes: Paul Hi Paul Cari G Kaufman wrote: Hello, Does anyone know how to make movies in R by making a sequence of plots? I'd like to animate a long trajectory for exploratory purposes only, without creating a bunch of image files and then using another program to string them together. In Splus I would do this using double.buffer() to eliminate the flickering caused by replotting. For instance, with a 2-D trajectory in vectors x and y I would use the following: motif() double.buffer(back) for (i in 1:length(x)) { plot(x[i], y[i], xlim=range(x), ylim=range(y)) double.buffer(copy) } double.buffer(front) I haven't found an equivalent function to double.buffer in R. I tried playing around with dev.set() and dev.copy() but so far with no success (still flickers). Paul Double buffering is only currently an option on the Windows graphics Paul device (and there it is on by default). So something like ... Paul x - rnorm(100) Paul for (i in 1:100) Paul plot(1:i, x[1:i], xlim=c(0, 100), ylim=c(-4, 4), pch=16, cex=2) Paul is already smooth MM well, sorry Paul, but not for my definition of smooth! 
MM Instead, MM n - 100 MM plot(1,1, xlim=c(0,n), ylim=c(-4,4), type=n) MM x - rnorm(n) MM for (i in 1:n) { points(i, x[i], pch=16, cex=2); Sys.sleep(0.02) } MM comes much closer to my version of smooth ;-) I apologize to Paul, since what I said seems to be quite platform dependent. Here's my current knowledge on the matter: o Paul's for(..) plot(..) - flickers quite a bit for me {on Linux X11 with no particularly fast graphics card}. - seems quite smooth for at least two Windows users who have relatively fast graphics cards. o My solution of for(..) { points(..) ; Sys.sleep(..) } doesn't redraw the coordinate system and so doesn't flicker (afaik, independently of platform) HOWEVER on windows; the graphics are somehow buffered and points are not drawn one by one, but rather in batches -- not smooth Thanks Martin; I wasn't very clear on my original message. Double buffering has only been implemented on the Windows graphics device at this stage (thanks to Brian) and this implementation basically always writes to a buffer and updates the screen at fixed time intervals (quoting the source: 100ms after last plotting call or 500ms after last update) so there is no user control of when the off-screen buffer is swapped to the screen. For animating a plot where only new output is added (i.e., no existing output is modified or removed), your suggestion should produce the smoothest result. Paul -- Dr Paul Murrell Department of Statistics The University of Auckland Private Bag 92019 Auckland New Zealand 64 9 3737599 x85392 [EMAIL PROTECTED] http://www.stat.auckland.ac.nz/~paul/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] have R informed of MySQL table updates
Dear useRs, I have a script (Python) that every once in a while appends data to a MySQL table. Meanwhile, I have a running R session, and I want it to be aware of such table updates. I could write a loop in R to periodically check whether new data has become available; however, are you aware of a way to make MySQL/Python talk directly to R? I'm interested in both GNU/Linux and Windows approaches (if any). Thank you, b. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] question about sorting POSIXt vector
Dear useRs, How come the first attempt to sort a POSIXt vector fails (Error: non-atomic type in greater), while the second succeeds? (Code inserted below.) The documentation says that POSIXt is used to allow operations such as subtraction, so I'd expect sorting to work. Is this perhaps an OS issue? (I run R 2.0.1 on Win XP.) Thank you, b.

  #code
  test <- c("2005-02-08 18:49:15","2005-02-07 18:36:54",
            "2005-02-04 18:37:03","2005-02-06 18:29:04")
  test <- strptime(test,format="%Y-%m-%d %H:%M:%S")
  order(test,decreasing=F)   #doesn't work - why?
  tst <- test + 0
  order(tst,decreasing=F)    #works - how come?
  print(tst)

  #run
  > test <- c("2005-02-08 18:49:15","2005-02-07 18:36:54",
  +           "2005-02-04 18:37:03","2005-02-06 18:29:04")
  > test <- strptime(test,format="%Y-%m-%d %H:%M:%S")
  > order(test,decreasing=F)   #doesn't work - why?
  Error in order(test, decreasing = F) : non-atomic type in greater
  > tst <- test + 0
  > order(tst,decreasing=F)    #works - how come?
  [1] 3 4 2 1
  > print(tst)
  [1] "2005-02-08 18:49:15 Eastern Standard Time" "2005-02-07 18:36:54 Eastern Standard Time"
  [3] "2005-02-04 18:37:03 Eastern Standard Time" "2005-02-06 18:29:04 Eastern Standard Time"

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
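A likely explanation for the difference above: strptime() returns a POSIXlt object, which is stored as a list of components (seconds, minutes, hours and so on), whereas adding 0 converts it to POSIXct, a plain number of seconds that order() can compare. The sketch below makes the conversion explicit with as.POSIXct(); the tz = "UTC" argument is an assumption added so the example does not depend on the local time zone, and current versions of R may accept POSIXlt in order() directly.

```r
test <- c("2005-02-08 18:49:15", "2005-02-07 18:36:54",
          "2005-02-04 18:37:03", "2005-02-06 18:29:04")

# strptime() gives POSIXlt: a list-based representation of date-times
lt <- strptime(test, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(lt)
is.list(unclass(lt))

# as.POSIXct() gives a numeric representation that order() handles
ct <- as.POSIXct(lt)
idx <- order(ct)
idx
ct[idx]
```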
[R] download files through secure http (HTTPS)
Dear useRs, I'm trying to download some data through the HTTPS protocol. However, download.file() does not support HTTPS (R 2.0.1 on WinXP): Error in download.file(https.url, destfile = test.txt) : unsupported URL scheme 1. Is there any other function/package in R that can work with HTTPS? 2. If not, what would need to happen to make download.file() support HTTPS? Thank you, b. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] draw random samples from empirical distribution
Dear useRs, I have an empirical distribution (not normal etc) and I want to draw random samples from it. One solution I can think of is to compute let's say 100 quantiles, then use runif() to draw a random number Q between 1 and 100, and finally run runif() again to pull a random value from the quantile Q. Is there perhaps a better/more elegant way of doing this? Thank you, b. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
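Two simpler alternatives to the two-stage runif() scheme described above, sketched with base R only: resampling the observed values with replacement, or feeding uniform random numbers into the empirical quantile function, which interpolates between observations. The data vector x here is simulated purely as a stand-in for the real observations.

```r
set.seed(1)
x <- rexp(500)     # hypothetical stand-in for the observed data

# 1. Resample the observed values (the bootstrap view of the empirical distribution)
draws_resample <- sample(x, 1000, replace = TRUE)

# 2. Inverse-CDF sampling: uniform probabilities mapped through the
#    empirical quantile function, which interpolates between data points
draws_smooth <- quantile(x, probs = runif(1000), names = FALSE)

summary(x)
summary(draws_resample)
summary(draws_smooth)
```

The first version only ever returns values that were actually observed; the second can return values in between, but never outside the observed range.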
RE: [R] Temporal Analysis of variable x; How to select the outlier threshold in R?
I'm not sure I understand. You have financial data and want to throw away some outliers?? Why would you ever do this? First of all, I'd suggest you pay close attention to what the data is trying to say. Maybe your distribution is not normal after all (see tests for normality etc). Maybe you shouldn't force your normality assumption upon the data. -Original Message- From: Melanie Vida [mailto:[EMAIL PROTECTED] Sent: Friday, February 25, 2005 1:30 PM To: r-help Subject: [R] Temporal Analysis of variable x; How to select the outlier threshold in R? For a financial data set with large variance, I'm trying to find the outlier threshold of one variable x over a two year period. I qqplot(x2001, x2002) and found a normal distribution. The latter part of the normal distribution did not look linear though. Is there a suitable method in R to find the outlier threshold of this variable from 2001 and 2002 in R? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] XML to data frame or list
Dear useRs, I have a simple/RTFM question about XML parsing. Given an XML file, such as (fragment)

  <A>100</A>
  <B>23</B>
  <C>true</C>

how do I import it in a data frame or list, so that the values (100, 23, true) can be accessed through the names A, B and C? I installed the XML package and looked over the documentation... however after 20 minutes and a couple of tests I still don't know what I should start with. Can someone provide an example or point me to the appropriate function(s)? Thank you, b.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] XML to data frame or list
I managed to parse more complex XML files as well. The trick was to manually determine the position of the child nodes of interest, after which they can be parsed in a loop. For example:

  require(XML)
  doc <- xmlTreeParse("file.xml", getDTD=T, addAttributeNamespaces=T)
  r <- xmlRoot(doc)
  #find the nodes of interest
  r[[i]][[j]]
  #then read them
  xmldata <- list(NULL)
  for (i in 1:xmlSize(r[[2]][[1]])) {
    xmldata[[i]] <- as.data.frame(xmlSApply(r[[2]][[1]][[i]], xmlValue))
  }

--- Barry Rowlingson [EMAIL PROTECTED] wrote:

Gabor Grothendieck wrote: You could check out the ctv package that was recently announced. It uses XML so its source would provide an example. If it's a one-time operation, Excel reads XML and you could then use one of the many Excel to R possibilities.

For an xml file like this:

  <?xml version="1.0"?>
  <variables>
    <a>100</a>
    <b>23</b>
    <z>666</z>
  </variables>

it's a one-liner with the XML package (library(XML)):

  xmlReadSimple <- function(xmlFile){
    as.list(xmlSApply(xmlRoot(xmlTreeParse(xmlFile)), xmlValue))
  }

add an lapply(...,as.numeric) for conversion to numbers. sweet. Baz

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Mandrake 10.1
I managed to install R 2.0.1 on Mandrake 10.1 a couple of weeks ago. It wasn't that easy: first I had to manually track, download and install 3-4 dependencies. I would suggest that you consider another GNU/Linux distribution, Mepis. Mepis combines the best features of several distributions:

- You can run it from CD, like Knoppix/Quantian.
- If you like it, you can easily install it on your hard drive (unlike Knoppix/Quantian). Just double click an icon and a graphical wizard will guide you through the installation steps. It's as easy to install as Mandrake, perhaps a bit easier (automatic hardware detection and configuration etc).
- Package management is done automatically (no more annoying notifications from Mdk's urpmi - like "sorry, can't do this, go ahead and figure it out by yourself"). You can use Synaptic or apt-get, and you can install packages from the Debian testing and unstable repositories (which is great). Unlike Debian though, Mepis is much easier to install (imho).

As someone who went through several failed installation attempts (Gentoo, Debian, Quantian), primarily due to hardware issues which I didn't have the patience to try to fix, I appreciate a lot what Mepis has to offer. You can have a complete system (R + packages etc) up and running in 30 minutes starting from scratch (assuming you have broadband) -- which is about what it would take you to fix the dependencies for just one binary (such as R-2.0.0-1mdk.i586.rpm) on Mandrake.

hth,
b.

-Original Message-
From: Christian [mailto:[EMAIL PROTECTED]
Sent: Monday, March 14, 2005 1:21 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Mandrake 10.1

Dear all,

I am trying to install the R-2.0.0-1mdk.i586.rpm file (http://cran.planetmirror.com/bin/linux/mandrake/10.0/R-2.0.0-1mdk.i586.rpm) on Mandrake 10.1. Since the file is originally meant for Mandrake 10.0, it does not surprise me that the installation does not work. The error message that I get can be translated as something like: "impossible to install since the info is not satisfied". Could you please help me install R on my Mandrake 10.1?

PS If you feel like answering, please consider that I am almost an absolute beginner at Linux :)

Thanks a lot
Christian
RE: [R] Mandrake 10.1
--- Rau, Roland [EMAIL PROTECTED] wrote:

> -Original Message-
> From: r-help On Behalf Of bogdan romocea
> Sent: Tuesday, March 15, 2005 2:49 PM
>
>> I would suggest that you consider another GNU/Linux distribution,
>
> I don't think it is necessary. Mandrake 10.1 is fine for running R.[1] I have Mandrake 10.1 (Community) at home running on my notebook and I was able to compile R without any problems - just using the software that was shipped with this distribution.

It is certainly not necessary; even Windows is fine for running R. However, assuming R is not the only package to be installed and then upgraded, switching from something like Mandrake to something like Mepis may result in significant time savings, which is what I care about most. (Your mileage may vary.) I used Mdk for a couple of years and prefer not to remember how many hours I wasted on something as trivial as installing and upgrading packages. (Compilation will not always save you from having to manually upgrade other libraries, especially as your Mdk installation gets older.)
RE: [R] Basic questions about RMySQL
> 1. No way. You must have MySQL installed on your computer.

In fact this is not true. You can use a MySQL server installed somewhere else on the network.

--- bogdan romocea [EMAIL PROTECTED] wrote:

> 1. No way. You must have MySQL installed on your computer.
> 2. You must install the server. For details, see http://dev.mysql.com/doc/mysql/en/index.html . For portability, I would suggest that you run MySQL in the shell (ignore the GUIs) and save the syntax for adding users, creating tables etc. This will likely take more time when you first do it, but if you have to move to another computer later on, you can set up the new MySQL installation very quickly and easily.
>
> hth,
> b.
>
> -Original Message-
> From: De la Vega Góngora Jorge [mailto:[EMAIL PROTECTED]
> Sent: Friday, March 18, 2005 11:58 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] Basic questions about RMySQL
>
> Hello,
>
> Please forgive me if I am asking something that is well documented. I have read the documentation but there are points that are not clear to me. I am not an expert in R or databases, but if someone directs me to a tutorial, I will appreciate it.
>
> 1. In my understanding, I can install and use RMySQL without having to install MySQL on my PC, to have access to and to create new tables. Is this right?
> 2. I have created a c:\my.cnf file to access a database I have, but without installing the server, where can I define the user, password and host to establish a connection?
>
> Thanks in advance
>
> Jorge de la Vega Gongora | Telefono: (525) 5268 8379
> Investigador | Fax: (525) 5268 8481
> Banco de Mexico | email: [EMAIL PROTECTED]
> Planeación y Programación de Emisión | web: http://www.stat.umn.edu/~jvega
> Calzada Legaria 691 Módulo IV | Col. Irrigación 11500
Re: [R] Basic questions about RMySQL
I certainly can't; I initially misunderstood the question. If connecting to MySQL is the problem, then you need to know the user ID, the domain and the password. Ask your DB administrator for help. Here's an example that works for me (local MySQL installation):

    require(DBI)
    require(RMySQL)
    MySQL(max.con = 16, fetch.default.rec = 5000, force.reload = FALSE)
    drv <- dbDriver("MySQL")
    con <- dbConnect(drv, username="userid", password="pswd", dbname="db")
    dbListTables(con)

--- Uwe Ligges [EMAIL PROTECTED] wrote:

> bogdan romocea wrote:
>> 1. No way. You must have MySQL installed on your computer.
>> 2. You must install the server. For details, see http://dev.mysql.com/doc/mysql/en/index.html . For portability, I would suggest that you run MySQL in the shell (ignore the GUIs) and save the syntax for adding users, creating tables etc. This will likely take more time when you first do it, but if you have to move to another computer later on, you can set up the new MySQL installation very quickly and easily.
>
> Can you tell us any reason why the server should run on the same machine R is running on?
>
> Uwe Ligges
>
>> hth,
>> b.
>>
>> -Original Message-
>> From: De la Vega Góngora Jorge [mailto:[EMAIL PROTECTED]
>> Sent: Friday, March 18, 2005 11:58 AM
>> To: r-help@stat.math.ethz.ch
>> Subject: [R] Basic questions about RMySQL
>>
>> Hello,
>>
>> Please forgive me if I am asking something that is well documented. I have read the documentation but there are points that are not clear to me. I am not an expert in R or databases, but if someone directs me to a tutorial, I will appreciate it.
>>
>> 1. In my understanding, I can install and use RMySQL without having to install MySQL on my PC, to have access to and to create new tables. Is this right?
>> 2. I have created a c:\my.cnf file to access a database I have, but without installing the server, where can I define the user, password and host to establish a connection?
>>
>> Thanks in advance
>>
>> Jorge de la Vega Gongora | Telefono: (525) 5268 8379
>> Investigador | Fax: (525) 5268 8481
>> Banco de Mexico | email: [EMAIL PROTECTED]
>> Planeación y Programación de Emisión | web: http://www.stat.umn.edu/~jvega
>> Calzada Legaria 691 Módulo IV | Col. Irrigación 11500
RE: [R] Graphics (for goodness of fit) Question
In regards to your plot question, you could use points() or lines():

    a <- sample(1:50,10)
    b <- sample(20:40,10)
    plot(1:10,a,pch=20,col="red")
    points(1:10,b,pch=20,col="blue")
    #or
    #lines(1:10,b,pch=20,col="blue",type="o")

-Original Message-
From: Mohammad Ehsanul Karim [mailto:[EMAIL PROTECTED]
Sent: Sunday, March 20, 2005 10:46 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Graphics (for goodness of fit) Question

Dear List,

Suppose I have some observed and expected frequencies, such as the following. I need to draw a graph where plots of observed and expected frequencies are merged into one.

    m <- c(1,2,3,4,5,6,7,8,9,10,12,13,17)
    k <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 19)
    ExpWW <- c(0.309330628803245, 0.213645190887434, 0.147558189649435,
      0.101913922060107, 0.0703888244654489, 0.0486154051328303,
      0.0335771712935674, 0.0231907237838939, 0.0160171226134196,
      0.0110625360037919, 0.00764055478558038, 0.00527709716935116,
      0.000395627498345897)
    ExpDD <- c(0.420249653259362, 0.243639882194748, 0.141250306182253,
      0.0818899139863827, 0.0474757060281664, 0.0275240570315860,
      0.0159570816077711, 0.00925112359507395, 0.00536334211198462,
      0.00310939944911175, 0.00104510169329968, 0.00060589806906972,
      6.84484529305126e-05)
    ObjDD <- c(0.468646864686469, 0.198019801980198, 0.151815181518152,
      0.0759075907590759, 0.0396039603960396, 0.0198019801980198,
      0.0165016501650165, 0.0099009900990099, 0.0033003300330033,
      0.0033003300330033, 0.0033003300330033, 0.0066006600660066,
      0.0033003300330033)
    ObjWW <- c(0.373770491803279, 0.150819672131148, 0.127868852459016,
      0.0721311475409836, 0.0885245901639344, 0.0622950819672131,
      0.039344262295082, 0.0327868852459016, 0.0360655737704918,
      0.00327868852459016, 0.00655737704918033, 0.00327868852459016,
      0.00327868852459016)
    par(mfrow=c(2,2))
    plot(k,ObjWW, type="l") # Plot 1
    plot(k,ExpWW, type="l") # Plot 2
    plot(m,ObjDD, type="l") # Plot 3
    plot(m,ExpDD, type="l") # Plot 4
    # I need to see plot 1 and 2 in same axis, and plot 3 and 4 in another
    # (i.e., 3, 4 both in same axis too, but not with 1 and 2's).
    # How can i use different types of legends in the same graph??
    sum(((ObjWW-ExpWW)^2)/ExpWW) # Chi-Squared Goodness of Fit Test
    sum(((ObjDD-ExpDD)^2)/ExpDD) # Chi-Squared Goodness of Fit Test
    # Also, is there any other convenient way of doing chi-squared goodness
    # of fit test (any function or package may be, to do this directly)?
    # And how can i find the P-values of the respective chi-squared tests in R?

Any suggestion, direction, references, help, replies will be highly appreciated. Thank you for your time.

Mohammad Ehsanul Karim
Web: http://snipurl.com/ehsan
Institute of Statistical Research and Training
University of Dhaka, Dhaka - 1000, Bangladesh
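The questions the reply leaves open (a legend for the overlaid curves, and a p-value for the chi-squared statistic) can be sketched roughly as below. The counts `obs` and probabilities `p_exp` are invented for illustration and are not the poster's data. Two assumptions are worth flagging: the statistic is computed on counts, not on proportions as in the post, and the degrees of freedom assume that no parameters were estimated from the data.

```r
# Hypothetical observed counts and expected probabilities (illustration only)
obs   <- c(45, 28, 18, 9)
p_exp <- c(0.4, 0.3, 0.2, 0.1)
k     <- seq_along(obs)
p_obs <- obs / sum(obs)

# Observed and expected proportions on the same axes, with a legend
plot(k, p_obs, type = "o", pch = 20, col = "red",
     ylim = range(c(p_obs, p_exp)), xlab = "k", ylab = "proportion")
lines(k, p_exp, type = "o", pch = 1, lty = 2, col = "blue")
legend("topright", legend = c("observed", "expected"),
       col = c("red", "blue"), pch = c(20, 1), lty = c(1, 2))

# Chi-squared statistic on counts, and its p-value computed by hand
expected <- sum(obs) * p_exp
stat <- sum((obs - expected)^2 / expected)
pval <- pchisq(stat, df = length(obs) - 1, lower.tail = FALSE)

# chisq.test() performs the same test in one call
fit <- chisq.test(obs, p = p_exp)
```

If the expected frequencies come from a fitted distribution, the degrees of freedom would normally be reduced by the number of estimated parameters, which chisq.test() does not do on its own.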
RE: [R] Gmail invitation
You can also buy these things on Ebay. I noticed the supply about 2 months ago when I guess you would have made about $1-2 per invitation. The profit opportunity is much diminished now that the supply has greatly increased (it appears every gmail account was allocated 50 invitations instead of 5 a few weeks ago). By the way, how much do you charge? :-) -Original Message- From: Gorjanc Gregor [mailto:[EMAIL PROTECTED] Sent: Friday, March 25, 2005 1:32 PM To: r-help@stat.math.ethz.ch Subject: [R] Gmail invitation Hello R users! I just found out that I have 49 invitations for Gmail (gmail.google.com). I have been using it now for a while and is really nice. Don't forget 1 GB for free. I will invite those who respond to this mail by FIFO. -- Lep pozdrav / With regards, Gregor Gorjanc University of Ljubljana Biotechnical Faculty URI: http://www.bfro.uni-lj.si/MR/ggorjan Zootechnical Departmentemail: gregor.gorjanc at bfro.uni-lj.si Groblje 3 tel: +386 (0)1 72 17 861 SI-1230 Domzalefax: +386 (0)1 72 17 888 Slovenia __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] how to simulate a time series
Dear useRs,

I want to simulate a time series (stationary; the distribution of values is skewed to the right; quite a few ARMA absolute standardized residuals above 2 - about 8% of them). Is this the right way to do it?

    load("rdtb")   # the time series
    summary(rdtb)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
    -1.11800 -0.65010 -0.09091  0.30390  1.12500  2.67600
    farma <- arima(rdtb,order=c(1,0,1),include.mean=TRUE)
    farma[["coef"]]
           ar1        ma1  intercept
    0.58091575 0.02313803 0.30417062
    sim <- list(NULL)   # simulated
    for (i in 1:5) {
      sim[[i]] <- as.vector(arima.sim(list(ar=c(farma[["coef"]][1]),
        ma=c(farma[["coef"]][2])),n=length(rdtb),innov=rdtb))
    }
    allsim <- as.data.frame(sim)
    colnames(allsim) <- paste("sim",1:5,sep="")
    all <- cbind(rdtb,allsim)

I don't understand why the simulation runs generate virtually identical values:

    all[100:105,]
              rdtb     sim1     sim2     sim3     sim4     sim5
    100  2.3863636 1.065661 1.065661 1.065661 1.065661 1.065661
    101  1.9318182 2.606093 2.606093 2.606093 2.606093 2.606093
    102  2.2954545 3.854074 3.854074 3.854074 3.854074 3.854074
    103  2.5882353 4.880240 4.880240 4.880240 4.880240 4.880240
    104  2.0227273 4.917622 4.917622 4.917622 4.917622 4.917622
    105 -0.1521739 2.751352 2.751352 2.751352 2.751352 2.751352

It appears I may be missing something (very) basic, but don't know what.

Thank you,
b.
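One plausible reading of the puzzle above, offered as a sketch and not as the list's answer: the `innov` argument of arima.sim() is the innovation series itself, so passing the same vector on every run fixes the random input and only the short random burn-in differs between runs. The vector `innov_fixed` below is an invented right-skewed stand-in for the poster's data. Resampling it with replacement is just one assumed way of getting fresh, skewed innovations for each run; with a fitted model, residuals(farma) would be the more natural thing to resample.

```r
set.seed(42)

# Hypothetical stand-in for the poster's data: right-skewed, mean zero
innov_fixed <- rexp(200) - 1
model <- list(ar = 0.58091575, ma = 0.02313803)  # coefficients quoted in the post

# Same innov vector each time: the runs differ only through the random
# burn-in, whose influence dies out quickly
run_a <- as.vector(arima.sim(model, n = 200, innov = innov_fixed))
run_b <- as.vector(arima.sim(model, n = 200, innov = innov_fixed))

# Fresh innovations per run, drawn by resampling with replacement;
# arima.sim() simulates a zero-mean series, so the fitted mean
# (reported as "intercept" by arima()) is added back
sims <- lapply(1:5, function(i) {
  e <- sample(innov_fixed, 200, replace = TRUE)
  as.vector(arima.sim(model, n = 200, innov = e)) + 0.30417062
})
```

Comparing run_a with run_b from observation 100 onwards reproduces the "virtually identical" columns in the post, while the elements of sims differ from one another.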
RE: [R] a R function for sort a data frame.
    dfr <- data.frame(sample(1:50,10),sample(1:50,10))
    colnames(dfr) <- c("a","b")
    dfr <- dfr[order(dfr$a),]    # ascending
    dfr <- dfr[order(-dfr$a),]   # descending

-Original Message-
From: Mario Morales [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 31, 2005 10:23 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a R function for sort a data frame.

Is there an R function to sort a data frame by a variable? I know how to sort a vector, but I don't know how to sort a data frame by a column. Can you help me? The sort() function doesn't work with data frames.
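The same order() idiom extends to several sort keys. A minimal sketch, using a small made-up data frame so the result is easy to check by eye:

```r
# Small invented data frame
dfr <- data.frame(a = c(2, 1, 2, 1), b = c(10, 40, 30, 20))

# Sort by a ascending, breaking ties by b descending;
# negating a numeric key reverses its direction
by_a_then_b <- dfr[order(dfr$a, -dfr$b), ]
```

The negation trick only applies to numeric columns; for character keys, order(..., decreasing = TRUE) reverses all keys at once.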
RE: [R] Amount of memory under different OS
You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of RAM, the seller will probably warn you about Windows and recommend a suitable OS.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 02, 2005 12:48 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Amount of memory under different OS

Hi,

I have a problem: I need to perform a very tough analysis, so I would like to buy a new computer with about 16 GB of RAM. Is it possible to use all this memory under Windows, or do I have to install another OS?

Thanks,
Marco
[R] looking for a plot function
Dear useRs,

I have a data frame and I want to plot all rows. Each row is represented as a line that links the values in each column. The plot looks like this:

    dfr <- data.frame(A=sample(1:50,10),B=sample(1:50,10),
      C=sample(1:50,10),D=sample(1:50,10))
    xa <- 10*1:4
    plot(c(10,40),c(0,50))
    for (i in 1:nrow(dfr)) {
      lines(xa,unlist(dfr[i,]),pch=20,type="o")
    }

Things get more complicated because I want the columns to be rescaled so as to fit nicely on a graph (for example if A has values between 0 and 100 but B has values between 100 and 1000, then rescale A or B), labels etc. Is there a function that can do plots like this?

Thank you,
b.
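The plot described above is usually called a parallel coordinates plot, and parcoord() in the MASS package draws one directly. A base-R sketch of the rescaling idea follows. The helper `rescale01` and the example data are made up for illustration, and min-max scaling to [0, 1] is only one of several reasonable choices.

```r
set.seed(1)

# Invented data with columns on very different scales
dfr <- data.frame(A = sample(1:50, 10),     B = sample(100:1000, 10),
                  C = sample(1:50, 10),     D = sample(1000:5000, 10))

# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
scaled <- as.data.frame(lapply(dfr, rescale01))

# One line per row of dfr, one x position per column
matplot(t(as.matrix(scaled)), type = "o", pch = 20, lty = 1, col = "black",
        xaxt = "n", xlab = "", ylab = "rescaled value")
axis(1, at = seq_along(scaled), labels = names(scaled))
```

matplot() draws one series per matrix column, which is why the rescaled data frame is transposed before plotting.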
Re: [R] Considering port of SAS application to R
Forget about R for now and port the application to MySQL/PostgreSQL etc; it is possible and worthwhile. In case you happen to use (and really need) some SAS DATA STEP looping features you might be forced to look into SQL cursors, otherwise the port should be (very) straightforward.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Werner Wernersen
Sent: Friday, April 21, 2006 7:09 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Considering port of SAS application to R

Hi there!

I am considering porting a SAS application to R and I would like to hear your opinion on whether this is possible and worthwhile. SAS is mainly used to do data management and then to do some aggregations and simple computations on the data and to output a modified data set. The main problem I see is the size of the data file. As I have no access to SAS yet I cannot give real details, but the SAS data file is about 7 gigabytes. (It's only the basic SAS system without any additional modules.)

What do you think, would a port to R be possible with reasonable effort? Is R able to handle that size of data? Or is R prepared to work together with some database system?

Thanks for your thoughts!

Best regards,
Werner
Re: [R] Need R code
Here's an example.

    lst <- list()
    for (i in 1:5) {
      lst[[i]] <- data.frame(v=sample(1:20,10),sample(1:5,10,replace=TRUE))
      colnames(lst[[i]])[2] <- paste("x",i,sep="")
    }
    dfr <- lst[[1]]
    for (i in 2:length(lst)) dfr <- merge(dfr,lst[[i]],all=TRUE)
    dfr <- dfr[order(dfr[,1]),]
    print(dfr)

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of stat stat
Sent: Thursday, April 20, 2006 1:15 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Need R code

Dear r-users,

Suppose I have three datasets:

Dataset-1:

    Date          x      y
    Jan-1,2005    120    230
    Jan-2,2005    123    -125
    Jan-3,2005    -110   300
    Jan-4,2005    114    -21
    Jan-7,2005    11299
    Mar-5,2005    200    311

Dataset-2:

    Date          x      y
    Jan-2,2005    123    -125
    Jan-3,2005    -110   300
    Jan-4,2005    114    -21
    Jan-5,2005    11299
    Jan-6,2005    -23    12
    Mar-5,2005    200    311

Dataset-3:

    Date          x      y
    Jan-3,2005    -110   300
    Jan-4,2005    114    -21
    Jan-5,2005    11299
    Mar-5,2005    200    311
    Apl-23,2005   123    200

Now I want to get the common dates along with x and y from the above three datasets, keeping the same order in the date variable as it is. For example, I want to get:

    Date          x      y      x      y      x      y
                  (from dataset-1) (from dataset-2) (from dataset-3)
    Jan-3,2005    -110   300    -110   300    -110   300
    Jan-4,2005    114    -21    114    -21    114    -21
    Mar-5,2005    200    311    200    311    200    311

Can anyone give me any R code to implement this for any number of datasets?

Thanks and regards
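The reply's merge(..., all=TRUE) keeps the union of the keys, whereas the question asks for the dates common to all datasets, in their original order. One way to get that for any number of data frames is sketched below. The three data frames are typed in from the garbled post; the rows where the two numbers are fused (11299) are read here as 112 and 99, which is a guess, although those rows do not end up in the result. The `.1`, `.2`, `.3` column suffixes are an arbitrary naming choice.

```r
# Data as read from the post; the 112 / 99 split is a guess
d1 <- data.frame(Date = c("Jan-1,2005", "Jan-2,2005", "Jan-3,2005",
                          "Jan-4,2005", "Jan-7,2005", "Mar-5,2005"),
                 x = c(120, 123, -110, 114, 112, 200),
                 y = c(230, -125, 300, -21, 99, 311),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Date = c("Jan-2,2005", "Jan-3,2005", "Jan-4,2005",
                          "Jan-5,2005", "Jan-6,2005", "Mar-5,2005"),
                 x = c(123, -110, 114, 112, -23, 200),
                 y = c(-125, 300, -21, 99, 12, 311),
                 stringsAsFactors = FALSE)
d3 <- data.frame(Date = c("Jan-3,2005", "Jan-4,2005", "Jan-5,2005",
                          "Mar-5,2005", "Apl-23,2005"),
                 x = c(-110, 114, 112, 200, 123),
                 y = c(300, -21, 99, 311, 200),
                 stringsAsFactors = FALSE)
datasets <- list(d1, d2, d3)

# Dates present in every dataset; intersect() keeps the order of the first
common <- Reduce(intersect, lapply(datasets, function(d) d$Date))

# Pick the matching rows from each dataset and tag the columns with its index
pieces <- lapply(seq_along(datasets), function(i) {
  d <- datasets[[i]]
  out <- d[match(common, d$Date), c("x", "y")]
  names(out) <- paste(c("x", "y"), i, sep = ".")
  out
})
result <- data.frame(Date = common, do.call(cbind, pieces),
                     stringsAsFactors = FALSE)
rownames(result) <- NULL
```

Working from the intersection of the keys avoids relying on merge()'s sorting, which would reorder these text dates alphabetically.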
Re: [R] regression modeling
There is an aspect, worthy of careful consideration, you don't seem to be aware of. I'll ask the question for you: How does the explanatory/predictive potential of a dataset vary as the dataset gets larger and larger? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi Sent: Monday, April 24, 2006 12:45 PM To: r-help Subject: [R] regression modeling Hi, there: I am looking for a regression modeling (like regression trees) approach for a large-scale industry dataset. Any suggestion on a package from R or from other sources which has a decent accuracy and scalability? Any recommendation from experience is highly appreciated. Thanks, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] www.r-project.org
I agree it would be worthwhile to make some cosmetic changes to r-project.org (nothing fancy though - no javascript, Flash etc). The general public may not be fully aware of how R compares to other statistical software, and I doubt that a web site which looks like it was put together 10 years ago helps bend the perceptions in the right direction. (Also, can someone finally change the graph on the first page??)

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of roger bos
Sent: Tuesday, April 25, 2006 1:09 PM
To: Romain Francois
Cc: RHELP
Subject: Re: [R] www.r-project.org

While there is nothing about the r-project site that I would consider fancy, it is pretty functional. I would be interested to hear more about what you hope to accomplish by re-doing the web site. Fancy graphics may just slow down the experience for those not on broadband. After all, the r-help list doesn't even like HTML in email, so it may not like too much fancy stuff on its website either.

On 4/25/06, Romain Francois [EMAIL PROTECTED] wrote:

> Dear R users and developers,
>
> My question is addressed to both of you, so I chose R-help to post it. Are there any plans to jazz up the main R website, http://www.r-project.org ? The look it has now has been the same for a long time and is kind of sad compared to other statistical packages' websites. Of course, the comparison is not fair, since companies are paying web designers to draw lollipop websites ...
>
> My first idea was to organize some kind of web design contest. But I had a small talk with Friedrich Leisch about that, who said that I shouldn't expect too many competitors. So, what about creating a small team, creating a home page project and then proposing it to the core team? It goes without saying: the core team has the final word.
>
> What do you think? Who would like to play?
>
> Romain
>
> --
> visit the R Graph Gallery : http://addictedtor.free.fr/graphiques
> mixmod 1.7 is released : http://www-math.univ-fcomte.fr/mixmod/index.php
> Romain FRANCOIS - http://francoisromain.free.fr
> Doctorant INRIA Futurs / EDF
Re: [R] efficiency in merging two data frames
Another good option is SQL, the fastest and most scalable solution. If you decide to give it a try pay close attention to indexes.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller
Sent: Monday, May 01, 2006 8:55 AM
To: 'Guojun Zhu'; r-help@stat.math.ethz.ch
Subject: Re: [R] efficiency in merging two data frames

I'm sure you'll get ingenious responses to help you optimize your R code. I deal with similar investment data in even larger numbers (e.g. 10 years of daily return data for each stock in the Russell 3000), and prefer reading and consolidating the data in Python using dictionaries and lists, then either piping the data to R in a read statement (read.table(pipe("python ..."))) or using Rpy to write R data frames directly from Python. Python is more facile with these basic data manipulations for hundreds of thousands or even millions of records, and performance is generally considerably better.

Steve Miller

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Guojun Zhu
Sent: Monday, May 01, 2006 2:35 AM
To: r-help@stat.math.ethz.ch
Subject: [R] efficiency in merging two data frames

I have two data sets about lots of companies' stock and fiscal data. One is monthly data with about 144,000 lines, and the other is quarterly with about 56,000. Each data set uses a different company code. I need to merge these two together. I read both as csv, and also another file with the corresponding firm codes. Now I have three data sets: return$PERMNO, account$GVKEY, and id, the data frame of the corresponding relation, which has both id$PERMNO and id$GVKEY. Also, I need to convert the return's month into quarter and finally merge the two data frames (return and account). I ended up writing a short program for this, but it runs very slowly: 15+ minutes. Is there a quick way to do it? Here is my original code.

    id$fy=rep(0,length(id$PERMNO))
    for (i in 1:length(id$PERMNO))
      id$fy[[i]] <- account$FYR[id$GVKEY[[i]]==account$GVKEY][[1]]
    return$GVKEY=rep(0,length(return$PERMNO))
    return$fyy=rep(0,length(return$PERMNO))
    return$fyq=rep(0,length(return$PERMNO))
    for (i in 1:length(return$PERMNO)) {
      temp <- id$PERMNO==return$PERMNO[[i]];
      tempmon <- id$fy[temp][[1]];
      if (return$month[[i]] <= tempmon) {
        return$fyy[[i]] <- return$year[[i]];
        return$fyq[[i]] <- 4-(tempmon-return$month[[i]])%/%3;
      } else {
        return$fyy[[i]] <- return$year[[i]]+1;
        return$fyq[[i]] <- (return$month[[i]]-tempmon-1)%/%3;
      }
      return$GVKEY[[i]] <- id$GVKEY[temp][[1]];
    }
    returnnew=merge(return,account,by.x=c("GVKEY","fyy","fyq"),
      by.y=c("GVKEY","fyy","fyq"))
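Within R itself, most of the time in the quoted loops probably goes into rescanning a whole column for every row. A vectorised lookup with match() does the same job in one pass. The sketch below uses tiny invented tables. The monthly data frame is called `returns` here (my naming, to avoid shadowing the return() function), and the fiscal-year rule assumes the comparison in the loop reads `month <= fiscal year-end month`.

```r
# Hypothetical miniature versions of the three tables (all values invented)
id      <- data.frame(PERMNO = c(101, 102, 103), GVKEY = c(9001, 9002, 9003))
account <- data.frame(GVKEY = c(9001, 9002, 9003), FYR = c(12, 6, 9))
returns <- data.frame(PERMNO = c(102, 101, 103, 102), year = 2005,
                      month = c(3, 7, 9, 12))

# One vectorised lookup per column instead of one subset per row;
# match() returns the first matching position, like the [[1]] in the loop
returns$GVKEY <- id$GVKEY[match(returns$PERMNO, id$PERMNO)]
returns$fy    <- account$FYR[match(returns$GVKEY, account$GVKEY)]

# Fiscal year: same calendar year up to the fiscal year-end month, else next
returns$fyy <- ifelse(returns$month <= returns$fy,
                      returns$year, returns$year + 1)
```

The fiscal quarter can be derived the same way with ifelse(), after which a single merge() on the key columns replaces the second loop.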
Re: [R] Axis labels
    plot(1:10,axes=FALSE)
    axis(1,at=1:10,labels=10:1)
    axis(2,at=1:10,labels=5*10:1)
    box()

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christopher Brown
Sent: Tuesday, May 02, 2006 12:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Axis labels

I cannot find a way to apply custom axis tick label text. Is there a way?
Re: [R] Listing Variables
Here's an example.

    dfr <- data.frame(A1=1:10,A2=21:30,B1=31:40,B2=41:50)
    vars <- colnames(dfr)
    for (v in vars[grep("B",vars)]) print(mean(dfr[,v]))

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Farrel Buchinsky
Sent: Wednesday, May 03, 2006 10:46 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Listing Variables

How does one create a vector whose contents is the list of variables in a dataframe pertaining to a particular pattern? This is so simple but I cannot find a straightforward answer. I want to be able to pass the contents of that list to a for loop.

So let us assume that one has a dataframe whose name is Data. And let us assume one had the height of a group of people measured at various ages. It could be made up of vectors Data$PersonalID, Data$FirstName, Data$LastName, Data$Height.1, Data$Height.5, Data$Height.9, Data$Height.10, Data$Height.12, Data$Height.20 and many many more variables. How would one create a vector of all the Height variable names?

The simple workaround is to not bother creating the vector Data$Height.1, Data$Height.5, Data$Height.9, Data$Height.10, Data$Height.12, Data$Height.20 ... but rather just to use the sapply function. However, with some functions sapply will not work and it is necessary to supply each variable name to a function (see the thread "Repeating tdt function on thousands of variables").

This is such a core capability. I would like to see it in the R-Wiki but could not find it there.

--
Farrel Buchinsky, MD
Pediatric Otolaryngologist
Allegheny General Hospital
Pittsburgh, PA
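Applied to the Height example from the question, the same idea can be written with grep(..., value = TRUE), which returns the matching names directly. The data frame below is a made-up miniature of the one described in the post.

```r
# Invented miniature of the data frame described in the question
Data <- data.frame(PersonalID = 1:3,
                   FirstName  = c("A", "B", "C"),
                   Height.1   = c(75, 80, 78),
                   Height.5   = c(108, 112, 110),
                   Height.12  = c(150, 149, 152))

# Names of all columns that start with "Height."
height_vars <- grep("^Height\\.", names(Data), value = TRUE)

# The vector of names can then drive a loop or an sapply() call
height_means <- sapply(height_vars, function(v) mean(Data[[v]]))
```

Anchoring the pattern with ^ and escaping the dot keeps columns such as a hypothetical "MaxHeight" or "Heightx1" out of the result.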
Re: [R] SQL like manipulations on data frames
This goes the other way - all SQL manipulations are a subset of what can be done with R. Read up on indexing and see ?merge, ?aggregate, ?by, ?tapply, among others. (For the R equivalent to your query, check ?grep and ?order, and search the list if needed.) Also, this example might be a good start:

    gby <- function(var,BY,byname=BY) {
      if (!exists("summarize")) library(Hmisc)   # you need to install Hmisc
      grouped <- summarize(var,BY,function(x) {c(count=length(x),min=min(x),
        max=max(x),mean=mean(x))})
      colnames(grouped) <- c(byname,"count","min","max","mean")
      grouped
    }
    #---
    x <- rnorm(1000)
    state <- sample(c("A","B","C","D"),1000,replace=TRUE)
    city <- sample(1:5,1000,replace=TRUE)
    gby(x,paste(state,city,sep="-"),"State-City")

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Citek
Sent: Thursday, May 04, 2006 6:56 PM
To: r-help@stat.math.ethz.ch
Subject: [R] SQL like manipulations on data frames

Is there a cheat-sheet anywhere that describes how to do SQL-like manipulations on a data frame?

My knowledge of R is rather limited. But from my experience it seems as though one can think of data frames as being similar to tables in a database: there are rows, columns, and values. Also, one can perform similar manipulations on a data frame as one can on a table. For example:

    select * from foo where bar > 10 ;

is similar to

    foo[foo["bar"] > 10,]

I'm just wondering how many other SQL-like manipulations can be done on a data frame? As an extreme example, is it reasonable to assume there is an R equivalent to:

    select bar, bat, baz, baz*100 as 'pctbaz'
    from foo
    where bar like '%xyz%'
    order by bat, baz desc ;

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent
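For the "extreme example" in the question, a base-R equivalent along the lines hinted at in the reply (?grep and ?order) might look like the sketch below. The data frame `foo` and its values are invented, and the LIKE pattern is assumed to be '%xyz%'.

```r
# Invented table standing in for foo
foo <- data.frame(bar = c("axyzb", "nope", "xyz", "qxyz"),
                  bat = c(2, 1, 1, 2),
                  baz = c(0.125, 0.5, 0.25, 0.75),
                  stringsAsFactors = FALSE)

# WHERE bar LIKE '%xyz%'  ->  substring match on the bar column
keep <- grepl("xyz", foo$bar, fixed = TRUE)

# SELECT bar, bat, baz, baz*100 AS pctbaz
out <- foo[keep, c("bar", "bat", "baz")]
out$pctbaz <- out$baz * 100

# ORDER BY bat, baz DESC
out <- out[order(out$bat, -out$baz), ]
```

Each SQL clause maps onto one indexing step, which is a reasonable way to start translating other queries as well.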
Re: [R] Using DBI and RMySQL
> I'll see if I can reproduce the steps under Knoppix[1]. Then you can run Knoppix with a Persistent Disk Image (PDI)[2] that contains R, the DBI, and RMySQL on just about any machine that runs Knoppix.

Don't bother, it's been done already. See http://dirk.eddelbuettel.com/quantian.html

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Citek
Sent: Thursday, May 11, 2006 11:08 AM
To: R-help@stat.math.ethz.ch
Subject: Re: [R] Using DBI and RMySQL

On May 11, 2006, at 3:09 AM, Indrajit Sengupta wrote:

> Did you create RMySQL windows binary in the process?

Sorry, but no. This was done on Mac OS X. And was done a while ago.

> Can you share it with me?

Wish I could, but I can't. I don't have a Windows machine.

I'll see if I can reproduce the steps under Knoppix[1]. Then you can run Knoppix with a Persistent Disk Image (PDI)[2] that contains R, the DBI, and RMySQL on just about any machine that runs Knoppix.

[1] http://knoppix.net/
[2] http://knoppix.net/wiki/Customizing_environment_using_4.0.2CD#Persistent_Disk_Image_.28PDI.29

Regards,
- Robert
http://www.cwelug.org/downloads
Help others get OpenSource software. Distribute FLOSS for Windows, Linux, *BSD, and MacOS X with BitTorrent
Re: [R] Fast update of a lot of records in a database?
Your approach seems very inefficient - it looks like you're executing thousands of update statements. Try something like this instead:

    #---build a table 'updates' (id and value)
    ...
    #---do all updates via a single left join
    UPDATE bigtable a LEFT JOIN updates b ON a.id = b.id SET a.col1 = b.value;

You may need to adjust the syntax.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Duncan Murdoch
Sent: Friday, May 19, 2006 11:17 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Fast update of a lot of records in a database?

We have a PostgreSQL table with about 40 records in it. Using either RODBC or RdbiPgSQL, what is the fastest way to update one (or a few) column(s) in a large collection of records? Currently we're sending sql like

    BEGIN
    UPDATE table SET col1=value WHERE id=id
    (repeated thousands of times for different ids)
    COMMIT

and this takes hours to complete. Surely there must be a quicker way?

Duncan Murdoch
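A minimal sketch of the staging-table idea, assuming the table and column names used in the thread (`bigtable`, `updates`, `col1`, `id`, `value`) and made-up ids and values. It only builds the SQL text in R; sending it through RODBC or another interface is left out. PostgreSQL writes a joined update as UPDATE ... FROM rather than with the JOIN form shown in the reply.

```r
# Hypothetical new values, keyed by id
ids    <- c(101L, 102L, 103L)
values <- c(1.5, 2.5, 3.5)

# One multi-row INSERT fills the staging table 'updates' in a single statement
rows <- paste0("(", ids, ", ", values, ")", collapse = ", ")
sql_fill <- paste0("INSERT INTO updates (id, value) VALUES ", rows, ";")

# PostgreSQL form of the joined update: UPDATE ... FROM ... WHERE
sql_update <- paste(
  "UPDATE bigtable SET col1 = u.value",
  "FROM updates u",
  "WHERE bigtable.id = u.id;"
)

cat(sql_fill, sql_update, sep = "\n")
```

Note one difference in behaviour: the UPDATE ... FROM form only touches rows that have a match in `updates`, whereas an update through a LEFT JOIN would also set `col1` to NULL on unmatched rows.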
Re: [R] win2k memory problem with merge()'ing repeatedly (long email)
Repeated merge()-ing does not always increase the space requirements linearly. Keep in mind that a join between two tables where the same key value appears M and N times will produce M*N rows for that particular value. My guess is that the number of rows in atot explodes because you have some duplicate values in your files (having the same duplicate date in each data frame would cause atot to contain 4, then 8, 16, 32, 64... rows for that date).

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sean O'Riordain
Sent: Monday, May 22, 2006 10:12 AM
To: r-help
Subject: [R] win2k memory problem with merge()'ing repeatedly (long email)

Good afternoon,

I have 63 small .csv files which I process daily, and until two weeks ago they processed just fine, only took a matter of moments and had no noticeable memory problem. Two weeks ago they reached 318 lines and my script broke. There are some missing values in some of the files. I have tried hard many times over the last two weeks to create a small repeatable example to give you but I've failed - unless I use my data it works fine... :-(

Am I missing something obvious? (again)

A typical file has lines which look like:

    01/06/2005,1372

Though there are three files which have two values (files 3,32,33) and these have lines which look like...

    01/06/2005,1766,
or
    15/05/2006,289,114

    a1 <- read.csv("file1.csv",header=F)
    etc...
    a63 <- read.csv("file63.csv",header=F)
    names(a1) <- c("mdate","file1.column.description")

    atot <- merge(a1,a2,all=T)

followed by repeatedly doing...

    atot <- merge(atot, a3,all=T)
    atot <- merge(atot, a4,all=T)
    etc...

I normally start R with --vanilla.

What appears to happen is that atot doubles in size each iteration and just falls over due to lack of memory at about i=17... even though the total memory required for all of these individual a1...a63 is only 1001384 bytes (doing an object.size() on a1..a63). At this point I've been trying to pin down this problem for two weeks and I just gave up...

The following works fine as I'd expect with minimal memory usage...

    for (i in 3:67) {
      datelist <- as.Date(start.date)+0:(count-1)
      #remove a couple of elements...
      datelist <- datelist[-(floor(runif(nacount)*count))]
      a2 <- as.data.frame(datelist)
      names(a2) <- "mdate"
      vname <- paste("value", i, sep="")
      a2[vname] <- runif(length(datelist))
      #a2[floor(runif(nacount)*count), vname] <- NA
      # atot <- merge(atot,a2,all=T)
      i <- 2
      a.eval.text <- paste("merge(atot, a", i, ", all=T)", sep="")
      cat("a.eval.text is: -", a.eval.text, "-\n", sep="")
      atot <- eval(parse(text=a.eval.text))
      cat("i:", i, " ", gc(), "\n")
    }

this works fine... but on my files (as per attached 'lastsave.txt' file) it just gobbles memory.

Am I doing something wrong? I (wrongly?) expected that repeatedly merge(atot,aN) would only increase the memory requirement linearly (with jumps perhaps as we go through a 2^n boundary)... which is what happens when merging simulated data.frames as above... no problem at all and it's really fast...

The attached text file shows a (slightly edited) session where the memory required by the merge() operation just doubles with each use... and I can only allow it to run until i=17!!! I've even run it with gctorture() set on... with similar, but excruciatingly slow results...

Is there any relevant info that I'm missing? Unfortunately I am not able to post the contents of the files to a public list like this...

As per a previous thread, I know that I can use a list to handle these dataframes - but I had difficulty with the syntax of a list of dataframes...

I'd like to know why the memory requirements for this merge just explode...

cheers, (and thanks in advance!)
Sean O'Riordain

==
version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status         Patched
major          2
minor          3.0
year           2006
month          05
day            09
svn rev        38014
language       R
version.string Version 2.3.0 Patched (2006-05-09 r38014)

Running on Win2k with 1Gb ram. I also tried it (with the same results) on 2.2.1 and 2.3.0.

R : Copyright 2006, The R Foundation for Statistical Computing
Version 2.3.0 Patched (2006-05-09 r38014)
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()'
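The M*N growth described in the reply can be reproduced with a few rows. The sketch below uses made-up data frames in which one date appears twice in every input; the column names mirror the thread (`mdate`), everything else is an assumption for illustration.

```r
# Three small data frames that share a duplicated key value (hypothetical data)
a1 <- data.frame(mdate = c("01/06/2005", "01/06/2005", "02/06/2005"), v1 = 1:3)
a2 <- data.frame(mdate = c("01/06/2005", "01/06/2005", "02/06/2005"), v2 = 4:6)
a3 <- data.frame(mdate = c("01/06/2005", "01/06/2005", "02/06/2005"), v3 = 7:9)

atot <- merge(a1, a2, all = TRUE)
print(table(atot$mdate))   # the duplicated date now has 2 * 2 = 4 rows

atot <- merge(atot, a3, all = TRUE)
print(table(atot$mdate))   # and 4 * 2 = 8 rows after the next merge

# Check each input for repeated keys before merging
has_dups <- sapply(list(a1 = a1, a2 = a2, a3 = a3),
                   function(d) anyDuplicated(d$mdate) > 0)
print(has_dups)
```

Running a check like `anyDuplicated()` on the key column of each file is a cheap way to test the duplicate-date guess before any merging is attempted.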
Re: [R] Manipulating code?
Macro stuff à la SAS is something that should be avoided whenever possible - it's messy, limited, and limiting. (I've done it occasionally and it works, but I think it's best not to go there.) Read the documentation on lists (in particular named lists), and keep everything in one or more lists. For example:

    lst <- list()
    for (v in c("var1","var2","var3")) lst[[v]] <- runif(sample(c(50,100),1))
    for (v in c("var1","var2","var3")) print(sd(lst[[v]]))

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Johannes Hüsing
Sent: Tuesday, May 23, 2006 12:26 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Manipulating code?

Dear expeRts,

I am currently struggling with the problem of finding cut points for a set of stimulus variables. I would like to obtain cut points iteratively for each variable by re-applying a dichotomised variable in the model and then recalculating it. I planned to have fixed names for the dichotomised variables so I could use the same syntax for every recalculation of the whole model. I furthermore want to reiterate the process until no cut point changes any more.

My problem is in accomplishing this syntactically. How can I pass a variable name to a function without getting lost in as.symbol and eval and parse mayhem? I have the feeling that I am thinking too much in terms of macro expansion à la SAS when trying to tackle this.
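One way to pass a variable name to a function without eval/parse is to pass it as a character string and index with `[[`. The sketch below is only an illustration of that idea: the function `dichotomise`, the data frame `d`, and the cut points are all hypothetical, not part of the original model.

```r
# Dichotomise one column, selected by name, at a given cut point
dichotomise <- function(data, varname, cutpoint) {
  as.integer(data[[varname]] > cutpoint)
}

# Hypothetical stimulus variables and current cut points
d <- data.frame(stim1 = c(0.2, 0.8, 0.5), stim2 = c(3, 1, 2))
cuts <- c(stim1 = 0.4, stim2 = 1.5)

# Fixed naming scheme for the derived columns: <variable>_hi
for (v in names(cuts)) {
  d[[paste0(v, "_hi")]] <- dichotomise(d, v, cuts[[v]])
}
print(d)
```

Because the cut points live in a named vector, an outer loop can update `cuts` and rerun the same code until no value changes.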
Re: [R] progressive slowdown during script execution?
Compare

    system.time({
      v <- vector()
      for (i in 1:10^5) v <- c(v,1)
    })

with

    system.time({
      v <- vector(length=10^5)
      for (i in 1:10^5) v[i] <- 1
    })

Growing a vector with c() copies it on every iteration, so the cost of each step increases with the length reached so far. If you don't know exactly how long v will be, use a value that's large enough, then throw away what's extra.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Alcon
Sent: Thursday, June 01, 2006 2:04 PM
To: r-help@stat.math.ethz.ch
Subject: [R] progressive slowdown during script execution?

I'm an R novice, so I hope my question is a valid one. I'm trying to run the following script in the current version of R.

    for (i in 1:1640){for (j in (i+1):1641){
      if (i == 1 && j == 2){x <- cor(sage[i,],sage[j,],method="spearman"); y <- cor(frie[i,],frie[j,],method="spearman")}
      if (i != 1 || j != 2){x <- c(x,cor(sage[i,],sage[j,],method="spearman")); y <- c(y,cor(frie[i,],frie[j,],method="spearman"))}}}

It basically just finds all pairwise correlations of the rows in a matrix for each of two matrices and stores the results for each matrix in a vector. The problem I seem to be running into is that it seems to slow way down during execution somehow. When I first tried running it I stopped execution to see how fast it was running, before trying to compute the whole job (the two matrices each have 1641 rows). Based on what I saw, I figured it would easily finish overnight. Instead, it was still running almost 24 hours later.

To quantify this a little better I checked it after running for 5 minutes, at which point it had added 79120 correlations to each of the x and y vectors. Since there should be a total of (1641*1640)/2 = 1345620 pairwise correlations in each vector when it finishes running, I worked out that it should take (1345620/79120)*5 = 85 minutes to run the whole job. However, when I checked it after running for 2 hours, it had added only 341870 correlations to each vector.

Any ideas what I'm doing wrong, or why it would run more slowly the longer it runs? Thanks for any help or advice.

Tim
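For this particular job the loop can probably be avoided altogether, since `cor()` computes all pairwise correlations between the columns of a matrix in one call. The sketch below uses a small random matrix as a stand-in for `sage` (an assumption; the real data are not shown in the thread) and compares the result with a preallocated loop.

```r
set.seed(42)
sage <- matrix(rnorm(6 * 10), nrow = 6)   # hypothetical 6 x 10 stand-in

# cor() works on columns, so transpose to correlate rows
m <- cor(t(sage), method = "spearman")

# lower.tri() walks the matrix column by column; for a symmetric matrix this
# yields the pairs (1,2), (1,3), ..., (2,3), ... in the same order as the loop
x_fast <- m[lower.tri(m)]

# Reference: the original double loop, but with a preallocated result vector
n <- nrow(sage)
x_loop <- numeric(n * (n - 1) / 2)
k <- 0
for (i in 1:(n - 1)) for (j in (i + 1):n) {
  k <- k + 1
  x_loop[k] <- cor(sage[i, ], sage[j, ], method = "spearman")
}

print(all.equal(x_fast, x_loop))
```

With 1641 rows the full correlation matrix has about 2.7 million entries, which should fit comfortably in memory.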
Re: [R] R usage for log analysis
I wouldn't use a DBMS at all -- it is not necessary and I don't see what you would get in return. Instead I would split very large log files into a number of pieces so that each piece fits in memory (see below for an example), then process them in a loop. See the list and the documentation if you have questions about how to read text files, count strings etc.

    #---split big files in two---
    for F in `ls *log`
    do
      fn=`echo $F | awk -F\. '{print $1}'`
      ln=`wc -l $F | awk '{print $1}'`   #number of lines in the file
      forsplit=`expr $ln / 2 + 50`       #no. of lines in each chunk, tweak as needed
      echo Splitting $F into pieces of $forsplit lines each
      split -l $forsplit $F $fn
    done

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabriel Diaz
Sent: Monday, June 12, 2006 9:52 AM
To: Jean-Luc Fontaine
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] R usage for log analysis

Hello

Thanks all for the answers.

I've been looking over the project documentation, and it seems a database is the way to go to handle log files of GB order (normally between 2 and 4 GB for each 15-day dump).

This document, http://cran.r-project.org/doc/manuals/R-data.html, says R will load all data into memory to process it when using read.table and such. Will using a database do the same? Well, currently I have no machine with 2 GB of memory.

The moodss thing looks nice, thanks for the link. But what I have to do now is an offline analysis of big log files :-).

I will try to go the mysql - R way.

gabi

On 6/12/06, Jean-Luc Fontaine [EMAIL PROTECTED] wrote:

> Allen S. Rout wrote:
>
> > Don't expect a warm welcome. This community is like all open-source communities, sharply focused on its own concerns and expertise. And, in an unusual experience for computer types, our core competencies hold little or no sway here; they don't even give us much of a leg up. Just wait 'till you want to do something nutso like produce a business graphic. :)
> >
> > I'm working on understanding enough of R packaging and documentation to begin a 'task view' focused on systems administration, for humble submission. That might end up being mostly log analysis; the term can describe much of what we do, if it's stretched a bit.
> >
> > I'm hoping the task view will attract the teeming masses of sysadmins trapped in the mire of Gnuplot and friends.
>
> Although not specifically solving the problem at hand, you might want to take a look at moodss and moomps (http://moodss.sourceforge.net/), modular monitoring applications, which uses R (http://jfontain.free.fr/statistics.htm) and its log module (http://jfontain.free.fr/log/log.htm).
>
> --
> Jean-Luc Fontaine http://jfontain.free.fr/
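The same piecewise processing can also be sketched inside R by reading from an open connection a fixed number of lines at a time, so that only one chunk is in memory at once. The log format, the file, and the pattern counted below are invented for the example.

```r
# Write a small hypothetical log file so the sketch is self-contained
logfile <- tempfile(fileext = ".log")
writeLines(c("GET /index 200", "GET /missing 404", "POST /form 200",
             "GET /missing 404", "GET /index 500"), logfile)

chunk_size <- 2          # tiny here; something like 1e5 for real files
n_404 <- 0

con <- file(logfile, open = "r")
repeat {
  lines <- readLines(con, n = chunk_size)   # continues where the last read stopped
  if (length(lines) == 0) break             # end of file
  n_404 <- n_404 + sum(grepl(" 404$", lines))
}
close(con)

print(n_404)
```

Only running totals (counts, sums, tables) are kept between chunks, so memory use depends on `chunk_size` rather than on the size of the file.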
Re: [R] bubbleplot for matrix
Here's an example. By the way, I find that it's more convenient (where applicable) to keep the data in 3 vectors/factors rather than one matrix/data frame.

    a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
    x <- y <- z <- vector()
    for (i in 1:nrow(a)) {
      x <- c(x,rep(rownames(a)[i],ncol(a)))
      y <- c(y,colnames(a))
      z <- c(z,a[i,])
    }
    symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
    text(as.numeric(x),as.numeric(y),labels=z)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Albert Vilella
Sent: Tuesday, June 13, 2006 7:11 AM
To: r-help@stat.math.ethz.ch
Subject: [R] bubbleplot for matrix

Hi all,

I would like to ask if it is possible to use bubbleplot for a 20x20 matrix, instead of a dataframe with factors in columns.

The idea would be to get a tabular representation with bubbles like in the Rnews_2006_2 article, which looks very nice.

Thanks in advance,
Albert.
Re: [R] bubbleplot for matrix
This works, though I'm not sure why symbols() complains about axes=FALSE while fulfilling the request.

    a <- matrix(sample(1:5,100,replace=TRUE),nrow=10)
    rownames(a) <- c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
    colnames(a) <- c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
    x <- y <- z <- vector()
    for (i in 1:nrow(a)) {
      x <- c(x,rep(rownames(a)[i],ncol(a)))
      y <- c(y,colnames(a))
      z <- c(z,a[i,])
    }
    xp <- as.numeric(as.factor(x))
    yp <- as.numeric(as.factor(y))
    symbols(xp,yp,z,inches=0.2,bg="khaki",axes=FALSE)
    axis(1,at=1:length(unique(x)),labels=sort(unique(x)))
    axis(2,at=1:length(unique(y)),labels=sort(unique(y)))
    box()
    text(xp,yp,labels=z)

On 6/15/06, Albert Vilella [EMAIL PROTECTED] wrote:

> Thanks Bogdan for the reply,
>
> I almost got it working, but in my case, the rownames and colnames are strings, not numbers, and I guess that this is a problem when using your snippet:
>
>     a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
>     rownames(a) = c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
>     colnames(a) = c("aed","fde","fda","yxj","ijk","ddd","gcd","sbe","adc","asd")
>     x <- y <- z <- vector()
>     for (i in 1:nrow(a)) {
>       x <- c(x,rep(rownames(a)[i],ncol(a)))
>       y <- c(y,colnames(a))
>       z <- c(z,a[i,])
>     }
>     symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
>     text(as.numeric(x),as.numeric(y),labels=z)
>
>     symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
>     Error in plot.window(xlim, ylim, log, asp, ...) : need finite 'xlim' values
>     In addition: Warning messages:
>     1: NAs introduced by coercion
>     2: NAs introduced by coercion
>     3: no finite arguments to min; returning Inf
>     4: no finite arguments to max; returning -Inf
>     5: no finite arguments to min; returning Inf
>     6: no finite arguments to max; returning -Inf
>
> Any guess?
>
> Thanks in advance,
> Albert.
>
> On Wed, 2006-06-14 at 16:47 -0400, bogdan romocea wrote:
>
> > Here's an example. By the way, I find that it's more convenient (where applicable) to keep the data in 3 vectors/factors rather than one matrix/data frame.
> >
> >     a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
> >     x <- y <- z <- vector()
> >     for (i in 1:nrow(a)) {
> >       x <- c(x,rep(rownames(a)[i],ncol(a)))
> >       y <- c(y,colnames(a))
> >       z <- c(z,a[i,])
> >     }
> >     symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
> >     text(as.numeric(x),as.numeric(y),labels=z)
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Albert Vilella
> > Sent: Tuesday, June 13, 2006 7:11 AM
> > To: r-help@stat.math.ethz.ch
> > Subject: [R] bubbleplot for matrix
> >
> > Hi all,
> >
> > I would like to ask if it is possible to use bubbleplot for a 20x20 matrix, instead of a dataframe with factors in columns.
> >
> > The idea would be to get a tabular representation with bubbles like in the Rnews_2006_2 article, which looks very nice.
> >
> > Thanks in advance,
> > Albert.
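The three vectors built by the loop can also be produced without a loop. The sketch below is an alternative under the assumption that the same row-by-row ordering is wanted; the 5x5 matrix and its labels are shortened, made-up versions of those in the thread. `match()` is used for the plotting positions so that labels keep the matrix order instead of being sorted alphabetically.

```r
set.seed(1)
labs <- c("aed", "fde", "fda", "yxj", "ijk")
a <- matrix(sample(1:5, 25, replace = TRUE), nrow = 5,
            dimnames = list(labs, labs))

# Row-by-row "long" form: one (x, y, z) triple per matrix cell
x <- rep(rownames(a), each  = ncol(a))
y <- rep(colnames(a), times = nrow(a))
z <- as.vector(t(a))          # t() so that values are taken row by row

# Numeric plotting positions that preserve the original label order
xp <- match(x, rownames(a))
yp <- match(y, colnames(a))

print(head(data.frame(x, y, z, xp, yp)))
```

The resulting `xp`, `yp` and `z` can be passed to the `symbols()` and `text()` calls shown in the messages above, with `rownames(a)` and `colnames(a)` as axis labels.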
Re: [R] modeling logit(y/n) using lrm
Not sure about your data set, but if you have some kind of (weighted/stratified) sample of hospitals you need to pay special attention. Survey data violates the assumptions of the classical linear models (infinite population, identically distributed errors etc) and needs to be analyzed differently. In SAS, it's wrong to throw such data into a PROC LOGISTIC / REG; PROC SURVEYLOGISTIC / SURVEYREG should be used instead. In R, take a look at the survey package. For details check http://www2.sas.com/proceedings/sugi31/193-31.pdf

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hamilton, Cody
Sent: Friday, June 16, 2006 1:32 PM
To: r-help@stat.math.ethz.ch
Subject: [R] modeling logit(y/n) using lrm

I have a dataset at a hospital level (as opposed to the patient level) that contains number of patients experiencing events (call this number y), and the number of patients eligible for such events (call this number n). I am trying to model logit(y/n) = XBeta. In SAS this can be done in PROC LOGISTIC or GENMOD with a model statement such as: model y/n = predictors;.

Can this be done using lrm from the Hmisc library without restructuring the dataset so that for each hospital there is one row with y = 1 and one row with y = 0 and then using the weight option in lrm to weight these two responses by the number of 'successes' and 'failures' for that hospital, respectively? I would like to avoid the restructuring, and I understand that the use of the weight function is not compatible with a lot of the validation functions available in Hmisc (validate, bootcov, etc.).

Cody Hamilton, Ph.D
Institute for Health Care Research and Improvement
Baylor Health Care System
(214) 265-3618
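For the events/trials form of the model itself, base R's `glm()` accepts a two-column response of successes and failures, so no restructuring is needed. This is a sketch with `glm()`, not with `lrm()` (which, as far as I know, is distributed in Frank Harrell's Design package, later rms, rather than in Hmisc), and it does not address the survey-design concern raised in the reply. The data and the predictor `teaching` are invented.

```r
# Hypothetical hospital-level data: y events out of n eligible patients
hosp <- data.frame(
  y        = c(5, 12, 7, 20, 3),
  n        = c(50, 80, 60, 90, 40),
  teaching = c(0, 1, 0, 1, 0)
)

# Events/trials syntax: cbind(successes, failures) as the response
fit <- glm(cbind(y, n - y) ~ teaching, family = binomial, data = hosp)
print(coef(fit))

# Equivalent formulation: proportion as response, n as prior weights
fit2 <- glm(y / n ~ teaching, family = binomial, weights = n, data = hosp)
print(all.equal(coef(fit), coef(fit2), tolerance = 1e-6))
```

Both calls fit the same binomial likelihood, which corresponds to the SAS `model y/n = predictors;` statement mentioned in the question.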
Re: [R] print color
One option is

    library(R2HTML)
    ?HTML.cormat

The thing you're after is traffic highlighting (via CSS or HTML tags). If HTML.cormat() doesn't do exactly what you want, modify the source code. (By the way, I haven't used R2HTML so far so maybe there's a more appropriate function.)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robert Mcfadden
Sent: Monday, July 10, 2006 4:00 PM
To: R-help@stat.math.ethz.ch
Subject: [R] print color

Dear R Users,

Is it possible to make R print the largest item in each row of a matrix X with red font? Example:

    1 2 4 7
    8 4 3 1
    ...

Therefore 7 and 8 should be in red color.

I would appreciate any suggestion

Robert McFadden
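To show what the highlighting idea amounts to, here is a small base R sketch that writes the matrix from the question as an HTML table with the row maximum in red. It deliberately does not use R2HTML; the helper `row_to_html` is a made-up name, and the inline CSS is just one possible way to mark the cell.

```r
X <- matrix(c(1, 2, 4, 7,
              8, 4, 3, 1), nrow = 2, byrow = TRUE)

# Turn one matrix row into a <tr>, colouring the largest value(s) red
row_to_html <- function(r) {
  cells <- ifelse(r == max(r),
                  paste0('<td style="color:red">', r, "</td>"),
                  paste0("<td>", r, "</td>"))
  paste0("<tr>", paste(cells, collapse = ""), "</tr>")
}

html <- paste0("<table>",
               paste(apply(X, 1, row_to_html), collapse = ""),
               "</table>")
cat(html, "\n")
```

Writing `html` to a file with `writeLines()` and opening it in a browser shows the table; ties for the row maximum are all highlighted.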
Re: [R] Is it possible to only read a subset by read.table ?
It's possible and straightforward (just don't use R). IMHO the GNU Core Utilities http://www.gnu.org/software/coreutils/ plus a few other tools such as sed, awk, grep etc are much more appropriate than R for processing massive text files. (Get a good book about UNIX shell scripting. On Windows you can use Services For Unix or Cygwin.) Also, here's an example that you could adapt to print the males from your data set to a separate file, which you could then import in R.

    #---print specific lines to another file---
    suffix=_JAN06
    for F in `ls *data*`
    do
      echo $F
      sed -n -e '/2006-01-[0-9][0-9]/p' $F > ${F}${suffix}
    done

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David Vonka
Sent: Wednesday, July 12, 2006 8:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Is it possible to only read a subset by read.table ?

Hello,

is it possible to do something like

    DATA <- read.table(file="blabla.dat",subset=(sex=="male"))

i.e. make R read only a subset of a csv file? I think it would be useful in case of very big datasets, but I can't find such a feature.

Thanks for an answer,
David Vonka
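If the filtering has to happen inside R after all, the lines can be filtered as text before parsing. The sketch below assumes a small comma-separated file with a `sex` column (invented here); note that it still reads all raw lines into memory, so for truly massive files the external tools suggested above, or reading from a connection in chunks, remain the better fit.

```r
# Small hypothetical data file so the sketch is self-contained
datafile <- tempfile(fileext = ".csv")
writeLines(c("id,sex,score", "1,male,10", "2,female,12", "3,male,9"), datafile)

lines <- readLines(datafile)

# Always keep the header; keep data lines whose sex field is exactly "male"
keep <- c(TRUE, grepl(",male,", lines[-1], fixed = TRUE))

# Parse only the surviving lines
DATA <- read.csv(text = paste(lines[keep], collapse = "\n"))
print(DATA)
```

The pattern `",male,"` includes the surrounding commas so that "female" is not matched by accident.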
Re: [R] 15-min mean values
Here's another approach which can be easily implemented in SQL.

1. Start with the dates as character vectors,

    dt <- as.character(Sys.time())

2. Extract the minutes and round them down to 0, 15, 30, 45:

    minutes <- floor(as.numeric(substr(dt,15,16))/15)*15
    final.mins <- as.character(minutes)
    final.mins[final.mins == "0"] <- "00"

3. Get the dates you need for aggregating:

    final.dt <- paste(substr(dt,1,14),final.mins,":00",sep="")

   (If you had wanted to use 10 minutes, it would have been enough to transform MM:SS to M0:00.)

4. Use aggregate(), SQL GROUP BY etc.

5. Finally, convert final.dt from character to datetime.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Thursday, February 02, 2006 1:44 AM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] 15-min mean values

Assume VDATE is a character vector. If it's a factor, first convert it using

    VDATE <- as.character(VDATE)

Let's assume we only need the times portion; later we handle the full case, which may or may not be needed.

We create a times object from the times portion of VDATE and then in the aggregate statement we use trunc.times -- note that trunc.times is a recent addition to the chron package so make sure you have the latest chron and R 2.2.1. See ?trunc.times

    # test data
    library(chron)
    library(zoo)
    VDATE <- c("1998-10-22:02:11", "1998-10-22:02:12", "1998-10-22:02:13",
               "1998-10-22:02:14", "1998-10-22:02:15")
    WS <- c(12.5, 10.1, 11.2, 10.5, 11.5)

    # convert VDATE to times class and aggregate
    vtimes <- times(sub(".*:(..:..)", "\\1:00", VDATE))
    aggregate(zoo(WS), trunc(vtimes, "00:15:00"), mean)

If we need the day part too then it's only a little harder. Represent VDATE as a chron object, vdate. We do this by extracting the date and time portions and converting each separately. We use regular expressions to do that conversion but show in a comment how to do it without regular expressions. See R News 4/1 Help Desk for more info on this and the table at the end of the article in particular.

    # alternative way to convert to vdate would be:
    # vdate <- chron(dates = as.numeric(as.Date(substring(VDATE, 1, 10))),
    #                times = paste(substring(VDATE, 12), 0, sep = ":"))
    vdate <- chron(dates = sub("(....)-(..)-(..).*", "\\2/\\3/\\1", VDATE),
                   times = sub(".*:(..:..)", "\\1:00", VDATE))
    aggregate(zoo(WS), chron(trunc(times(vdate), "00:15:00")), mean)

On 2/2/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

> Good day everyone,
>
> I want to use zoo(aggregate) to calculate 15-min mean values from a wind dataset which has 1-min values. The data I have looks like this:
>
>          vector VDATE        vector WS
>     1    1998-10-22:02:11    12.5
>     2    1998-10-22:02:12    10.1
>     3    1998-10-22:02:13    11.2
>     4    1998-10-22:02:14    10.5
>     5    1998-10-22:02:15    11.5
>     .
>     .
>     .
>     n    2005-06-30:23:59    9.1
>
> I want to use:
>
>     aggregate(zoo(WS),'in 15-min intervals',mean)
>
> How do you specify 'in 15-min intervals' using vector VDATE? The length of VDATE cannot be changed, otherwise it would be a trivial problem because I can generate a 15-min spaced vector using 'seq'.
>
> Am I missing something?
>
> Thanks a lot,
>
> Augusto
>
> Augusto Sanabria. MSc, PhD.
> Mathematical Modeller
> Risk Research Group
> Geospatial Earth Monitoring Division
> Geoscience Australia (www.ga.gov.au)
> Cnr. Jerrabomberra Av. Hindmarsh Dr.
> Symonston ACT 2609
> Ph. (02) 6249-9155
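The five string-based steps can be run end to end on the test data from the thread. This is a base R sketch only; it assumes the `YYYY-MM-DD:HH:MM` layout shown in the question (so the minutes sit at character positions 15 to 16) and uses `tapply()` for the aggregation step.

```r
VDATE <- c("1998-10-22:02:11", "1998-10-22:02:12", "1998-10-22:02:13",
           "1998-10-22:02:14", "1998-10-22:02:15")
WS <- c(12.5, 10.1, 11.2, 10.5, 11.5)

# Steps 1-3: keep "YYYY-MM-DD:HH:" and replace the minutes by their 15-min floor
mins <- floor(as.numeric(substr(VDATE, 15, 16)) / 15) * 15
bin  <- paste0(substr(VDATE, 1, 14), sprintf("%02d", mins))

# Step 4: aggregate by the bin label
means <- tapply(WS, bin, mean)
print(means)

# Step 5: convert the bin labels back to date-times
bin_time <- as.POSIXct(names(means), format = "%Y-%m-%d:%H:%M", tz = "UTC")
print(bin_time)
```

`sprintf("%02d", ...)` pads the minutes to two digits, which replaces the separate "0" to "00" fix-up in step 2.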
Re: [R] matching tables
    t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A"
    t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B"
    t3 <- merge(t1,t2,all=TRUE)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eric Pante
Sent: Tuesday, February 07, 2006 4:22 PM
To: r-help@stat.math.ethz.ch
Subject: [R] matching tables

Dear Listers,

I am trying to match tables that DO NOT have the same length. The tables result from the function table() so they look like this:

table 1

    2 3 4
    3 5 7

table 2

    1 2 3
    6 4 5

I need the following output: (NOTICE THE ZEROS)

           1 2 3 4
    table1 0 3 5 7
    table2 6 4 5 0

Unfortunately, I was not successful using match(). Previous postings explain how to do similar matching, but specifically for tables of the same length.

Any thoughts? Thanks!
eric

Eric Pante
College of Charleston, Grice Marine Laboratory
205 Fort Johnson Road, Charleston SC 29412
Phone: 843-953-9190 (lab) -9200 (main office)

"You don't force curiosity, you awaken it ..." Daniel Pennac
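The merge() approach above fills the gaps with NA rather than zeros. If the raw values behind the tables are still available, one alternative is to tabulate both over a common set of factor levels, which gives the zeros directly. The raw vectors `x1` and `x2` below are reconstructed from the counts in the question, and `count_on` is a made-up helper name.

```r
# Raw values that reproduce the two tables from the question
x1 <- rep(c(2, 3, 4), times = c(3, 5, 7))
x2 <- rep(c(1, 2, 3), times = c(6, 4, 5))

# Common set of categories
lev <- sort(union(x1, x2))

# Tabulate over fixed levels so that absent categories are counted as 0
count_on <- function(x, lev) {
  tab <- table(factor(x, levels = lev))
  setNames(as.vector(tab), names(tab))
}

res <- rbind(table1 = count_on(x1, lev),
             table2 = count_on(x2, lev))
print(res)
```

When only the merged data frame is available, replacing the NA entries with 0 after `merge(..., all=TRUE)` leads to the same result.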
Re: [R] dataframe subset
Here's one way,

    x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
    y <- data.frame(V=c(2,9,10))
    xy <- merge(x,y,all=FALSE)

Pay close attention to what happens if you have duplicate values in y, say

    y <- data.frame(V=c(2,9,10,10))

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bernhard Baumgartner
Sent: Wednesday, February 08, 2006 9:22 AM
To: r-help@stat.math.ethz.ch
Subject: [R] dataframe subset

I have a dataframe with a column, say x, consisting of values, each value appearing a different number of times, e.g.

    x: 1,1,1,1,2,2,4,4,4,9,10,10,10,10,10 ...

and a vector, including e.g.:

    y: 2,9,10,...

I need a subset of the dataframe: all rows where x is equal to one of the values in y. Currently I use a loop for this, but because x and y are large this is very slow. Is there any idea how to solve this problem faster?

Thank you,
Bernhard
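An alternative to merge() for this kind of lookup is the `%in%` operator, which tests membership and is therefore not affected by duplicates in y. The sketch reuses the numbers from the thread and contrasts both approaches; the object names `sub_in` and `sub_merge` are arbitrary.

```r
x <- data.frame(V = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- c(2, 9, 10, 10)          # note the duplicated 10

# Membership test: each row of x is kept at most once
sub_in <- x[x$V %in% y, , drop = FALSE]
print(nrow(sub_in))

# Join: rows of x matching the duplicated 10 are returned twice
sub_merge <- merge(x, data.frame(V = y), all = FALSE)
print(nrow(sub_merge))
```

Here the membership test returns 8 rows (two 2s, one 9, five 10s), while the join returns 13 because each of the five 10s is matched twice.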
Re: [R] Interleaving elements of two vectors?
For a general solution without warnings try

    interleave <- function(v1,v2) {
      ord1 <- 2*(1:length(v1))-1
      ord2 <- 2*(1:length(v2))
      c(v1,v2)[order(c(ord1,ord2))]
    }
    interleave(rep(1,5),rep(3,8))

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck
Sent: Monday, March 06, 2006 12:12 AM
To: Ajay Narottam Shah
Cc: R-help
Subject: Re: [R] Interleaving elements of two vectors?

Try this (note that your x and y do not have the same length and in this case the expression will recycle the shorter one and give a warning):

    z <- c(rbind(x, y))

On 3/5/06, Ajay Narottam Shah [EMAIL PROTECTED] wrote:

> Suppose one has
>
>     x <- c(1, 2, 7, 9, 14)
>     y <- c(71, 72, 77)
>
> How would one write an R function which alternates between elements of one vector and the next? In other words, one wants
>
>     z <- c(x[1], y[1], x[2], y[2], x[3], y[3], x[4], y[4], x[5], y[5])
>
> I couldn't think of a clever and general way to write this. I am aware of gdata::interleave() but it deals with interleaving rows of a data frame, not elems of vectors.
>
> --
> Ajay Shah http://www.mayin.org/ajayshah
> [EMAIL PROTECTED] http://ajayshahblog.blogspot.com
> *(:-? - wizard who doesn't know the answer.
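Applied to the vectors from the original question, the function works as follows. The function body is the one from the reply, repeated so that the snippet runs on its own; the comments describing the odd/even slot assignment are my reading of it.

```r
interleave <- function(v1, v2) {
  ord1 <- 2 * (1:length(v1)) - 1   # v1 elements get odd slots 1, 3, 5, ...
  ord2 <- 2 * (1:length(v2))       # v2 elements get even slots 2, 4, 6, ...
  c(v1, v2)[order(c(ord1, ord2))]  # sort by slot; leftovers end up at the tail
}

x <- c(1, 2, 7, 9, 14)
y <- c(71, 72, 77)
z <- interleave(x, y)
print(z)   # 1 71 2 72 7 77 9 14
```

Once the shorter vector is exhausted, the remaining elements of the longer one simply follow in their original order, with no recycling and no warning.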
Re: [R] \r with RSQLite
\r is a carriage return character which some editors may use as a line terminator when writing files. My guess is that RSQLite writes your data frame to a temp file using \r as a line terminator and then runs a script to have SQLite import the data (together with \r - this would be the problem), but I have no idea if that's really the case. Check the documentation or ask the maintainer.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mikkel Grum
Sent: Wednesday, March 15, 2006 1:46 PM
To: r-help@stat.math.ethz.ch
Cc: [EMAIL PROTECTED]
Subject: [R] \r with RSQLite

What am I doing wrong, or is the \r that I'm getting in the example below a bug?

    a <- (1:10)
    b <- (LETTERS[1:10])
    df <- as.data.frame(cbind(a, b))
    df
        a b
    1   1 A
    2   2 B
    3   3 C
    4   4 D
    5   5 E
    6   6 F
    7   7 G
    8   8 H
    9   9 I
    10 10 J
    library(RSQLite)
    drv <- dbDriver("SQLite")
    con <- dbConnect(drv, dbname = "Test")
    dbWriteTable(con, "DF", df, row.names = FALSE, overwrite = TRUE)
    [1] TRUE
    df2 <- dbGetQuery(con, "SELECT DISTINCT * FROM DF")
    dbDisconnect(con)
    [1] TRUE
    df2
        a   b
    1   1 A\r
    2   2 B\r
    3   3 C\r
    4   4 D\r
    5   5 E\r
    6   6 F\r
    7   7 G\r
    8   8 H\r
    9   9 I\r
    10 10 J\r
    sessionInfo()
    R version 2.2.1, 2005-12-20, i386-pc-mingw32

    attached base packages:
    [1] methods   stats     graphics  grDevices utils     datasets
    [7] base

    other attached packages:
    RSQLite     DBI
    "0.4-1" "0.1-10"

Mikkel Grum
Genetic Diversity
International Plant Genetic Resources Institute
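Until the cause is tracked down, the stray carriage returns can be stripped after the query. This is only a workaround sketch on a hand-made data frame that mimics the `df2` shown in the question; it does not touch the database or RSQLite itself.

```r
# Stand-in for the query result, with trailing carriage returns
df2 <- data.frame(a = 1:3, b = c("A\r", "B\r", "C\r"),
                  stringsAsFactors = FALSE)

# Remove a trailing \r from every character column
is_chr <- sapply(df2, is.character)
df2[is_chr] <- lapply(df2[is_chr], function(col) sub("\r$", "", col))

print(df2)
```

If the affected column comes back as a factor instead of character, converting it with `as.character()` first would be needed.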
Re: [R] renaming dataframe1 using column names from dataframe2?
?assign, but _don't_ use it; lists are better.

    dfr <- list()
    for(j in 1:9) {
      dfr[[as.character(j)]] <- ...
    }

Don't try to imitate the limited macro approach of other software (e.g. SAS). You can do all that in R, but it's much simpler and much safer to rely on list indexing and functions that return values (rather than create objects).

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user
Sent: Friday, March 17, 2006 10:26 AM
To: rhelp
Subject: [R] renaming dataframe1 using column names from dataframe2?

I have a dataframe named temp, and another dataframe named descriptions. I wish to rename temp, and to call it the names of a certain column in the dataframe descriptions. Is there a good way to do this?

A similar question: I am using a for loop to create several new dataframes, e.g. for(j in 1:9){….. I'd like each dataframe to be named d1, d2, d3, with the number being tied to the j (the iteration). Is this possible?
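To make the list suggestion concrete, here is a minimal sketch. The function make_frame and its columns are placeholders I made up for whatever each loop iteration would actually compute; the point is only that the names d1, d2, ... become list names rather than separate objects.

```r
# Hypothetical per-iteration computation; replace with the real one.
make_frame <- function(j) data.frame(id = j, value = j^2)

# One list element per iteration, named d1 ... d9.
dfr <- lapply(1:9, make_frame)
names(dfr) <- paste0("d", 1:9)

dfr[["d3"]]                      # look up a single data frame by name
all_rows <- do.call(rbind, dfr)  # or stack them all when needed
```

Keeping the frames in one list means later steps can use lapply() over them instead of building object names with paste() and get().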
Re: [R] create a gui with a button to change graphic?
Adapt the function below to suit your needs. If you really want to plot 5 minutes at a time, round the time series to the last MM:00 times (where MM is in 5*0:11) and have idx below loop over them.

    splitplot <- function(x, points) {
      # chunk boundaries: 1, points, 2*points, ..., length(x);
      # seq_len() and unique() guard against short series and exact multiples
      boundaries <- unique(c(1, points*seq_len(floor(length(x)/points)), length(x)))
      for (i in seq_along(boundaries)[-1]) {
        idx <- boundaries[i-1]:boundaries[i]
        plot(idx, x[idx], type="o") # here you may prefer time.of.x[idx] to idx
      }
    }
    # examples
    par(ask=TRUE) ; splitplot(rnorm(1000), 350)
    par(mfrow=c(3,1), ask=FALSE) ; splitplot(rnorm(1000), 350)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gael de Lannoy
Sent: Monday, March 20, 2006 7:40 AM
To: r-help@stat.math.ethz.ch
Subject: [R] create a gui with a button to change graphic?

Hello everybody,

I am wondering if it is possible to create a gui to plot a time series that is very big, it's an EEG signal of 20mins. What I would like to do is plot the first 5mins, then have a button on the gui that plots the next 5mins when pushed. Is it possible?

Thanks in advance !
Gael.
Re: [R] Multivariate linear regression
Apparently you do not understand the point, and seem to (want to) see patterns all over the place. A good start for the treatment of this interesting disease is 'Fooled by Randomness' by Nassim Nicholas Taleb. The main point of the book is that many things may be a lot more random than one might care to imagine or believe. (Ramsey theory is misleading and of no help here, given its biased premise that "complete disorder is impossible" (T. S. Motzkin, Wikipedia).)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nagu
Sent: Wednesday, April 05, 2006 8:09 PM
To: Berton Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multivariate linear regression

Hi Bert,

Thank you for your prompt reply. I understand your point. But randomness is just a matter of scale of the object (Ramsey Theory). The X matrix does not explain the complete variation in Y due to a large noise in X or simply because the mapping f: X -> Y is many valued (or due to a finite number of other reasons). Theoretically, the inverse does not exist for many valued functions. In regression type problems, we are evaluating the pseudoinverse of the data space. To estimate the inverses of many valued functions, theoretically, we may have to use the branch cuts method or something called Riemann surfaces, which partition the domain into connected sheets.

As I am not a qualified statistician and do not have much experience in building statistical models for highly noisy data, I am wondering how you dealt with such situations, if any exist, in your working experience?

I will try your idea of feeding some random variables as predictors in X.

Thank you again,
Nagu

P.S. Why is it that pattern recognition is all about finding patterns that can not be seen easily, huh?

On 4/5/06, Berton Gunter [EMAIL PROTECTED] wrote:

Ummm... If y is unrelated to x, then why would one expect any reasonable method to show a greater or lesser relationship than any other? It's all random.
Of course, put enough random regressors into/tune the parameters enough of any regression methodology and you'll be able to precisely predict the data at hand -- but **only** the data at hand. I should note that such work apparently frequently appears in various sorts of informatics/data mining/omics/etc. journals these days, as various papers demonstrating the irreproducibility of numerous purported discoveries have infamously demonstrated.

Let us not forget Occam!

Just being cranky ...

-- Bert Gunter

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nagu
Sent: Wednesday, April 05, 2006 3:52 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Multivariate linear regression

Hi,

I am working on a multivariate linear regression of the form y = Ax. I am seeing a great dispersion of y w.r.t x. For example, the correlations between y and x are very small, even after using some typical transformations like log, power.

I tried with simple linear regression, robust regression and ace and avas package in R (or splus). I didn't see an improvement in the fit and predictions over simple linear regression. (I also tried this with transformed variables)

I am sure that some of you came across such data. How did you deal with it? Linear regressions are good for the data like y = x + 0.01Normal(mu,sigma2) i.e. a small noise (data observed in a lab). But linear regressions are bad for large noise, like typical market (or survey) data.

Thank you,
Nagu
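Bert's remark about random regressors is easy to try out. The sketch below is my own illustration, not code from the thread: the sample size, the number of noise predictors and the seed are arbitrary assumptions, and y is generated independently of every predictor. Because the small model is nested in the large one, the in-sample R-squared cannot go down as noise columns are added, even though nothing real is being explained.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 30
p <- 25
y <- rnorm(n)                     # response unrelated to any predictor
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", 1:p)))
dat <- data.frame(y = y, X)

fit_few  <- lm(y ~ x1 + x2, data = dat)   # two noise regressors
fit_many <- lm(y ~ ., data = dat)         # all 25 noise regressors

r2_few  <- summary(fit_few)$r.squared
r2_many <- summary(fit_many)$r.squared
c(few = r2_few, many = r2_many)

# Fresh data from the same (pure noise) process: the "fit" should not carry over.
new_dat <- data.frame(y = rnorm(n),
                      matrix(rnorm(n * p), nrow = n,
                             dimnames = list(NULL, paste0("x", 1:p))))
cor(predict(fit_many, newdata = new_dat), new_dat$y)
```

Comparing the in-sample R-squared with the correlation on new_dat is a crude stand-in for proper cross-validation, but it is usually enough to show the "only the data at hand" effect.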
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
There are several kinds of standardization, and 'normalization' is only one of them. For some details you could check http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm (see Details for standardization methods). Standardization is required prior to clustering to control for the impact of scale. (Variables with large variances tend to have more effect on the resulting clusters than those with small variances.) I don't know how valuable standardization may be in other areas.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of roger bos
Sent: Thursday, April 06, 2006 1:15 PM
To: Berton Gunter; Liaw, Andy
Cc: rhelp
Subject: Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

I'm asking this question purely for my own benefit, not to try to correct anyone. The procedure you refer to as normalization I have always heard referred to as standardization. Is the former the proper term? Also, you say it's not necessary given today's hardware, but isn't it beneficial to get all the variables in a similar range? Is there any other transformation that you would suggest?

I use rlm (and normalization) in the models I use every day, so I was happy to read the above comments.

Thanks,
Roger

On 4/6/06, Berton Gunter [EMAIL PROTECTED] wrote:

Thanks, Andy. Well said. Excellent points. The final weights from rlm serve this diagnostic purpose, of course.

-- Bert

-----Original Message-----
From: Liaw, Andy [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 06, 2006 9:56 AM
To: 'Berton Gunter'; 'r user'; 'rhelp'
Subject: RE: [R] pros and cons of robust regression? (i.e. rlm vs lm)

To add to Bert's comments:

- Normalizing data (e.g., subtracting mean and dividing by SD) can help numerical stability of the computation, but that's mostly unnecessary with modern hardware. As Bert said, that has nothing to do with robustness.

- Instead of _replacing_ lm() with rlm() or other robust procedure, I'd do both of them.
Some scientists view robust procedures that omit some data points (e.g., by assigning basically 0 weight to them) in automatic fashion and just trust the result as bad science, and I think they have a point. Use of a robust procedure does not free one from examining the data carefully and looking at diagnostics. Careful treatment of outliers is especially important, I think, for data coming from a confirmatory experiment. If the conclusion you draw depends on downweighting or omitting certain data points, you ought to have very good reason for doing so. I think it can not be over-emphasized how important it is not to take outlier deletion lightly. I've seen many cases where what seemed like an outlier originally turned out to be legitimate data, and omitting it just led to an overly optimistic assessment of variability.

Andy

From: Berton Gunter

There is a **Huge** literature on robust regression, including many books that you can search on at e.g. Amazon. I think it fair to say that we have known since at least the 1970's that practically any robust downweighting procedure (see, e.g., M-estimation) is preferable (more efficient, better continuity properties, better estimates) to trimming outliers defined by arbitrary thresholds. An excellent but now probably dated introductory discussion can be found in UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited by Hoaglin, Tukey, Mosteller, et al.

The rub in all this is that nice small sample inference results go out the window, though bootstrapping can help with this. Nevertheless, for a variety of reasons, my recommendation is simply to **never** use lm and **always** use rlm (with maybe a few minor caveats). Many would disagree with this, however.

I don't think normalizing data as it's conventionally used has anything to do with robust regression, btw.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user
Sent: Thursday, April 06, 2006 8:51 AM
To: rhelp
Subject: [R] pros and cons of robust regression? (i.e. rlm vs lm)

Can anyone comment or point me to a discussion of the pros and cons of robust regressions, vs. a more manual approach to trimming outliers and/or normalizing data used in regression analysis?
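Andy's "do both" advice and Bert's note about the final weights can be sketched as follows. This is an illustration on simulated data of my own making (one deliberately planted outlier at observation 20, arbitrary seed), using rlm() from the MASS package with its default settings; the 0.5 cutoff for flagging points is an arbitrary assumption, not a recommended rule.

```r
library(MASS)                     # provides rlm()

set.seed(42)                      # arbitrary seed
x <- 1:40
y <- 2 + 0.5 * x + rnorm(40)
y[20] <- y[20] + 30               # one gross outlier, planted on purpose

fit_lm  <- lm(y ~ x)              # ordinary least squares
fit_rlm <- rlm(y ~ x)             # robust M-estimation, default settings

# Compare the two sets of coefficients side by side.
rbind(lm = coef(fit_lm), rlm = coef(fit_rlm))

# Final IWLS weights: small values mark points the robust fit downweighted.
w <- fit_rlm$w
suspects <- which(w < 0.5)        # 0.5 is an arbitrary illustrative cutoff
suspects
```

The flagged observations are candidates for a closer look at the data, not for automatic deletion, which is the point made in the thread.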
Re: [R] I am surprised (and a little irritated)
Installing R on SuSE 10.0 may be less than trivial for a beginner (I ended up compiling GCC plus 3-4 other things). In case you lose your patience I'd suggest trying Mepis Linux: it's very easy to install and the package management GUI (Synaptic) is great. Installing R together with a bunch of R packages, courtesy of the Debian folks, is a breeze.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Backer Johnsen
Sent: Wednesday, April 19, 2006 3:05 PM
To: r-help@stat.math.ethz.ch
Subject: [R] I am surprised (and a little irritated)

I have started with using R on Windows, and I am really happy about the system. Now, one of my other ambitions is to learn how to use Linux, so yesterday I downloaded OpenSuse and installed that. The next problem was to try to use R with Linux. And there I met the wall.

I've understood that RPMs are somewhat like installing programs on Windows, so that was downloaded and started with YAST. And got some error messages about missing stuff.

The first reaction is surprise -- there must be an error in the installation procedure. I have never (well, almost) met an installation procedure on Windows that did not include everything needed. And the installation of R on Windows was very smooth.

Then I discover to my big surprise that the readme file says that I need to have eight installed packages. Then it says "Most of them are included in a standard install." Sigh. Then the next problem is to find out which of the eight I already have and which ones I need to locate somewhere. Where can I find them, I wonder. Somewhere on the net? And that is how far I got today.

So, one of the complaints I have is that the instructions for installing R on Linux are very cryptic, and to a large extent assume that you already know Linux. Which I do not. And I expect instructions on installing should be simple and clear. But I am a very experienced computer user, so I really expect to be able to understand instructions.
I cannot expect my students to manage what I cannot manage myself, so Linux is out, or at least Suse Linux. And that is a pity, for a number of reasons.

The second is just as much surprise at the installation procedure. Under Windows there are any number of installers which make it easy for a programmer to put together all the files needed and place them in the right place.

And someone should get the OpenSuse people to include R in the installation.

Tom

Tom Backer Johnsen, Psychometrics Unit, Faculty of Psychology
University of Bergen, Christies gt. 12, N-5015 Bergen, NORWAY
Tel : +47-5558-9185   Fax : +47-5558-9879
Email : [EMAIL PROTECTED]   URL : http://www.galton.uib.no/
RE: [R] Aggregating data (with more than one function)
I am looking for an answer to a similar question - a generalized solution that would be able to apply (1) any number of functions (2) to any number of vectors (3) by any number of factors (just like SQL's group by). The output data frame must contain the values of the by factors, to be used for joins.

Aggregate() does (2) and (3). The solutions posted to this thread (split+sapply, by, tapply) do (1) and (3) (or so it seems to me). What would be the best way to get to (1)+(2)+(3)?

I am inclined to use aggregate() in a loop with eval(parse(text="aggregate expression here")). Running

    groupby <- do.call(rbind, by(var_i, list(a,b,c,d,e,f), function(x) c(fct1(x),fct2(x),fct3(x),fct4(x))))

in a loop (var_1, var_2 etc) would be very nice but I don't know how to add a-f as columns in the output data frame.

Thank you,
b.

On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman wrote:

I have the data similar to the following in a data frame:

       LastName Department Salary
    1  Johnson  IT         56000
    2  James    HR         54223
    3  Howe     Finance    8
    4  Jones    Finance    82000
    5  Norwood  IT         67000
    6  Benson   Sales      76000
    7  Smith    Sales      65778
    8  Baker    HR         56778
    9  Dempsey  HR         78999
    10 Nolan    Sales      45667
    11 Garth    Finance    89777
    12 Jameson  IT         56786

I want to calculate both the mean salary broken down by Department and also the total amount paid out per department i.e. I want both sum(Salary) and mean(Salary) for each Department. Right now, I am using aggregate.data.frame twice, creating two data frames, and then combining them using data.frame. However, this seems to be very memory and processor intensive and is taking a very long time on my data set. Is there a quicker way to do this?

Thanks in advance,
Siv Raman
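One possible route to sum and mean in a single pass, offered as a sketch rather than the answer given in the thread: let the function passed to aggregate() return a named vector, then flatten the result with do.call(data.frame, ...). This assumes a reasonably recent R, since the formula interface of aggregate() used here is newer than the versions current when this thread was written. The data are the rows from the post whose salaries are legible; the Howe row is left out because its value is garbled in the archive.

```r
# Rows copied from the post, minus the one with an unreadable salary.
pay <- data.frame(
  LastName   = c("Johnson", "James", "Jones", "Norwood", "Benson", "Smith",
                 "Baker", "Dempsey", "Nolan", "Garth", "Jameson"),
  Department = c("IT", "HR", "Finance", "IT", "Sales", "Sales",
                 "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 82000, 67000, 76000, 65778,
                 56778, 78999, 45667, 89777, 56786)
)

# FUN returns a named vector, so the Salary column comes back as a matrix
# with one column per statistic.
agg <- aggregate(Salary ~ Department, data = pay,
                 FUN = function(x) c(sum = sum(x), mean = mean(x)))

# Flatten the matrix column into ordinary columns Salary.sum and Salary.mean.
res <- do.call(data.frame, agg)
res
```

The grouping factor stays in the result as an ordinary column, so it can be used for joins. The same pattern should extend to several variables and several factors by writing cbind(v1, v2) ~ f1 + f2 on the formula, though I have not timed it on large data.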