Re: [R] read.xport

2005-07-14 Thread bogdan romocea
How about avoiding SAS XPORT altogether and exporting everything in the simple, clean, non-proprietary, extremely reliable, platform-independent ... etc text format (CSV, tab delimited etc)? -Original Message- From: Nelson, Gary (FWE) [mailto:[EMAIL PROTECTED] Sent: Thursday, July

Re: [R] Is it possible to create highly customized report in *.xls format by using R/S+?

2005-07-21 Thread bogdan romocea
So your conclusion is that the only choice is to make mistakes and get in trouble. (That's what Excel excels at.) Two options I haven't seen mentioned are: 1. Create your deliverables in HTML format, and change the extension from .htm to .xls; Excel will import them automatically. The way the

Re: [R] Rprof fails in combination with RMySQL

2005-07-21 Thread bogdan romocea
I think you're barking up the wrong tree. Optimize the MySQL code separately from optimizing the R code. A very nice reference about the former is http://highperformancemysql.com/. Also, if possible, do everything in MySQL. hth, b. -Original Message- From: Thieme, Lutz [mailto:[EMAIL

Re: [R] Rprof fails in combination with RMySQL

2005-07-22 Thread bogdan romocea
never close the connection after a query.) hth, b. -Original Message- From: Thieme, Lutz [mailto:[EMAIL PROTECTED] Sent: Friday, July 22, 2005 2:04 AM To: bogdan romocea Cc: R-help@stat.math.ethz.ch Subject: Re: [R] Rprof fails in combination with RMySQL Hello Bogdan

Re: [R] choose between dates and times

2005-07-26 Thread bogdan romocea
If happenat is not a datetime value, convert it with strptime(). Then, one solution is to transform it in the following way: num.time <- as.numeric(format(happenat,"%Y%m%d%H%M%S")) This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset your data frame with dfr[which(num.time =
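
A minimal sketch of the conversion described above, assuming a hypothetical data frame dfr whose happenat column holds text in the 07/22/05 00:05:14 layout. The data, the cutoff values, and the name picked are made up for illustration; the comparison in the original message is cut off, so the one below is only a guess at the idea.

```r
# Hypothetical data; 'happenat' arrives as character in mm/dd/yy HH:MM:SS form.
dfr <- data.frame(
  happenat = c("07/22/05 00:05:14", "07/22/05 13:30:00", "07/23/05 08:00:00"),
  value    = 1:3,
  stringsAsFactors = FALSE
)

# Convert to a datetime, then to a sortable number such as 20050722000514.
happenat <- strptime(dfr$happenat, format = "%m/%d/%y %H:%M:%S")
num.time <- as.numeric(format(happenat, "%Y%m%d%H%M%S"))

# Keep rows from noon on 22 July up to the following midnight (illustrative cutoffs).
picked <- dfr[which(num.time >= 20050722120000 & num.time <= 20050723000000), ]
print(picked)
```

Because the number is built as year, month, day, hour, minute, second, ordinary numeric comparison agrees with chronological order.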

Re: [R] How to hiding code for a package

2005-08-01 Thread bogdan romocea
There's something else you could try - since you can't hide the code, obfuscate it. Hide the real thing in a large pile of useless, complicated, awfully formatted code that would stop anyone except the most desperate (including yourself, after a couple of weeks/months) from trying to understand

Re: [R] date format

2005-08-10 Thread bogdan romocea
You need the day to convert to a date format. Assuming day=15: x.date <- as.Date(paste(as.character(x),"-15",sep=""),format="%Y-%m-%d") -Original Message- From: alessandro carletti [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 10, 2005 9:37 AM To: rHELP Subject: [R] date format
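
For illustration, here is the same call applied to sample input. The assumption (not confirmed by the truncated thread) is that x holds year-month strings such as "2005-08".

```r
# Assumption: x holds year-month strings; the day "15" is appended as in the reply.
x <- c("2005-08", "2004-12")
x.date <- as.Date(paste(as.character(x), "-15", sep = ""), format = "%Y-%m-%d")
print(x.date)

# The year-month part can be recovered for labels or grouping.
print(format(x.date, "%Y-%m"))
```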

Re: [R] Concerning reading of SAS-files

2005-08-12 Thread bogdan romocea
The first one is an index, not a data set. Anyway, just use SAS to export the data sets in text format (CSV, tab-delimited etc). You can then easily read those in R. (By the way, the help for read.xport says that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT file? Hint: no.)

Re: [R] retrieving large columns using RODBC

2005-08-15 Thread bogdan romocea
This appears to be an SQL issue. Look for a way to speed up your queries in Postgresql. I presume you haven't created an index on 'index', which means that every time you run your SELECT, Postgresql is forced to do a full table scan (not good). If the index doesn't solve the problem, look for some

Re: [R] Regular expressions sub

2005-08-18 Thread bogdan romocea
One solution is test <- c("1.11","10.11","11.11","113.31","114.2","114.3") id <- unlist(lapply(strsplit(test,"[.]"),function(x) {x[2]})) -Original Message- From: Bernd Weiss [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 12:10 PM To: r-help@stat.math.ethz.ch Subject: [R] Regular

Re: [R] Linux Standalone Server Suggestions for R

2005-09-01 Thread bogdan romocea
Most powerful in what way? Quite a lot depends on the jobs you're going to run. - To run CPU-bound jobs, more CPUs is better. (Even though R doesn't do threading, you can manually split some CPU-bound jobs in several parts and run them simultaneously.) Apart from multiple CPUs and

[R] RMySQL installation problem on FC4 x86_64

2005-09-07 Thread bogdan romocea
Dear useRs, I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0 and MySQL 4.1.11-2 installed through yum). After an initial configuration error (could not find the MySQL installation include and/or library directories) I managed to install RMySQL with # export

Re: [R] boxplot statistics

2005-10-06 Thread bogdan romocea
A related comment - don't rely (too much) on boxplots. They show only a few things, which may be limiting in many cases and completely misleading in others. Here are a couple of suggestions for plots which you may find more useful than the standard box plots: - figure 3.27 from

[R] add leading 0s to %d from png() {was Automatic creation of file names}

2005-10-08 Thread bogdan romocea
Dear useRs, Is there a way to 'properly' format %d when plotting more than one page on png()? 'Properly' means to me with leading 0s, so that the PNGs become easy to navigate in a file/image browser. Lacking a better solution I ended up using the code below, but would much prefer something like
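
One way to get zero-padded names is to build them with sprintf(). The sketch below only constructs the names and opens no graphics device; the file stem plot_ and the page numbers are hypothetical. In current versions of R, png() also accepts a C-style pattern in its filename argument (its default is "Rplot%03d.png"), which may cover the multi-page case directly.

```r
# Hypothetical page numbers and file stem; no device is opened here.
pages <- c(1, 7, 42, 120)
files <- sprintf("plot_%04d.png", as.integer(pages))
print(files)

# Zero-padded names sort in page order even when sorted as plain text.
print(identical(sort(files), files))
```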

[R] decreasing performance of for() loop

2005-10-10 Thread bogdan romocea
Dear useRs, I'm wondering why the for() loop below runs slower as it progresses. On a Win XP box, the iterations at the beginning run much faster than those at the end: 1%, iteration 2000, 10:10:16 2%, iteration 4000, 10:10:17 3%, iteration 6000, 10:10:17 98%, iteration 196000, 10:24:04 99%,

Re: [R] decreasing performance of for() loop

2005-10-10 Thread bogdan romocea
Never mind, I found the fix. Declaring the length for out eliminates the performance decrease, out <- vector(mode="numeric",length=length(test)) On 10/10/05, bogdan romocea [EMAIL PROTECTED] wrote: Dear useRs, I'm wondering why the for() loop below runs slower as it progresses. On a Win XP
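
A small sketch of the difference, under the assumption that the original loop grew out one element at a time (the loop itself is not shown in this snippet). The data and the function names grow and prealloc are hypothetical.

```r
test <- runif(20000)   # stand-in for the poster's data, which is not shown

# Growing the result one element at a time forces repeated copying.
grow <- function(v) {
  out <- vector(mode = "numeric")
  for (i in seq_along(v)) out <- c(out, v[i] * 2)
  out
}

# Declaring the length up front, as in the fix above.
prealloc <- function(v) {
  out <- vector(mode = "numeric", length = length(v))
  for (i in seq_along(v)) out[i] <- v[i] * 2
  out
}

print(system.time(a <- grow(test)))
print(system.time(b <- prealloc(test)))
```

Both versions return the same values; only the time spent copying differs, and the gap widens as the vector gets longer.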

Re: [R] adding 1 month to a date

2005-10-12 Thread bogdan romocea
Simple addition and subtraction works as well: as.Date("1995/12/01",format="%Y/%m/%d") + 30 If you have datetime values you can use strptime("1995-12-01 08:00:00",format="%Y-%m-%d %H:%M:%S") + 30*24*3600 where 30*24*3600 = 30 days expressed in seconds. -Original Message- From: Marc
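
Note that adding 30 days is a fixed step, not a calendar month. The sketch below compares it with seq() using by = "month", a base R alternative that is not part of the reply above.

```r
d <- as.Date("1995/12/01", format = "%Y/%m/%d")

plus30 <- d + 30                                   # fixed 30-day step
plus1m <- seq(d, by = "month", length.out = 2)[2]  # calendar-month step

print(plus30)
print(plus1m)
```

For a start date of 1 December the two results differ by one day, since December has 31 days.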

Re: [R] how to use large data set ?

2006-07-20 Thread bogdan romocea
By far, the cheapest and easiest solution (and the very first to try) is to add more memory. The cost depends on what kind you need, but here's for example 2 GB you can buy for only $150: http://www.newegg.com/Product/Product.asp?Item=N82E16820144157 Project constraints?! If they don't want to

[R] scatter plot with axes drawn on the same scale

2006-07-28 Thread bogdan romocea
Dear useRs, I'd like to produce some scatter plots where N units on the X axis are equal to N units on the Y axis (as measured with a ruler, on screen or paper). This approach x <- sample(10:200,40) ; y <- sample(20:100,40) windows(width=max(x),height=max(y)) plot(x,y) is better than

Re: [R] prefixing list names in print

2006-08-08 Thread bogdan romocea
A simple function will do what you want, customize this as needed: lprint <- function(lst,prefix) { for (i in 1:length(lst)) { cat(paste(prefix,"$",names(lst)[i],sep=""),"\n") print(lst[[i]]) cat("\n") } } P <- list(A="a",B="b") lprint(P,"Prefix") -Original Message- From: [EMAIL PROTECTED]

Re: [R] screen resolution effects on graphics

2006-08-28 Thread bogdan romocea
You forgot to mention your OS. This was asked before and if I recall correctly the answer for Windows was no. An acceptable solution (imho) is to edit the Rprofile.site files and add something like pngplotwidth <- 990 ; pngplotheight <- 700 pdfplotwidth <- 14 ; pdfplotheight <- 10 Then, use these

Re: [R] Alternatives to merge for large data sets?

2006-09-07 Thread bogdan romocea
One obvious alternative is an SQL join, which you could do directly in a DBMS, or from R via RMySQL / RSQLite /... Keep in mind that creating indexes on user/userid before the join may save a lot of time. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf

[R] unexpected behavior of boxplot(x, notch=TRUE, log=y)

2006-10-05 Thread bogdan romocea
A function I've been using for a while returned a surprising [to me, given the data] error recently: Error in plot.window(xlim, ylim, log, asp, ...) : Logarithmic axis must have positive limits After some digging I realized what was going on: x - c(10460.97, 10808.67, 29499.98, 1,

[R] read 4-jan-02 as date

2004-10-11 Thread bogdan romocea
Dear R users, I have a column with dates (character) in a data frame: 12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01 8-Jan-01 5-Jan-01 and I need to convert them to (Julian) dates so that I can sort the whole data frame by date. I thought it would be very simple, but after checking the documentation

Re: [R] read 4-jan-02 as date

2004-10-12 Thread bogdan romocea
Thank you everyone. Indeed, I had read the data via read.csv and the date column was a factor. Everything works fine if I convert to character first. Regards, b. --- Sundar Dorai-Raj [EMAIL PROTECTED] wrote: bogdan romocea wrote: Dear R users, I have a column with dates

[R] incomplete function output

2004-10-13 Thread bogdan romocea
Dear R users, I have a function (below) which encompasses several tests. However, when I run it, only the output of the last test is displayed. How can I ensure that the function root(var) will run and display the output from all tests, and not just the last one? Thank you, b. root <-
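
Inside a function only the value returned last is auto-printed, so intermediate results need an explicit print(). The sketch below illustrates this with stand-in tests: the body of root() is truncated above, so run_tests, shapiro.test and t.test are placeholders, not the poster's code.

```r
# Hypothetical stand-in for root(); the real tests are not shown in the snippet.
run_tests <- function(x) {
  r1 <- shapiro.test(x)
  r2 <- t.test(x)
  print(r1)   # explicit print: displayed even though it is not the last value
  print(r2)
  invisible(list(shapiro = r1, t = r2))   # return everything for later use
}

set.seed(1)
res <- run_tests(rnorm(50))
```

Returning the results in a list also makes it possible to inspect them later, for example res$t$p.value.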

[R] output processing / ARMA order identification

2004-10-25 Thread bogdan romocea
Dear R users, I need to fit an ARMA model. As far as I've seen, EACF (extended ACF) is not available in R. 1. Let's say I fit a series of ARMA models in a loop. Given the code/output included below, how do I pull 'Model' and 'Fit' (AIC) from each summary() so that I can combine them into an

[R] plot time series / dates (basic)

2004-11-01 Thread bogdan romocea
Dear R users, I'm having a hard time with some very simple things. I have a time series where the dates are in the format 7-Oct-04. I imported the file with read.csv so the date column is a factor. The series is rather long and I want to plot it piece by piece. The function below works fine,

Re: [R] plot time series / dates (basic)

2004-11-02 Thread bogdan romocea
=deparse(substitute(varb)), type="o") } } --- Prof Brian Ripley [EMAIL PROTECTED] wrote: On Mon, 1 Nov 2004, bogdan romocea wrote: Dear R users, I'm having a hard time with some very simple things. I have a time series where the dates are in the format 7-Oct

[R] misleading output after ordering data frame

2004-11-08 Thread bogdan romocea
Dear R users, I have a data frame which I create with read.csv and then order by date: d <- na.omit(read.csv(...)) d <- d[order(as.Date(as.character(d$Date), format="%d-%b-%y"), decreasing=F, na.last=F),] My problem is that even though the data frame is ordered as requested, the old row
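
Assuming the truncated sentence refers to row names, the behavior can be reproduced with a small example: row labels travel with their rows when a data frame is reordered, and can be reset afterwards. The data below is made up, and ISO dates are used instead of the poster's 7-Oct-04 layout to keep the sketch independent of locale.

```r
d <- data.frame(Date = as.Date(c("2004-11-05", "2004-11-01", "2004-11-03")),
                x    = c(10, 20, 30))

d <- d[order(d$Date), ]
before <- rownames(d)    # original labels, now out of sequence
print(before)

rownames(d) <- NULL      # renumber rows 1..n in the new order
print(rownames(d))
```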

RE: [R] an off-topic question - model validation

2004-11-12 Thread bogdan romocea
Assuming you have enough data, usually 1/4 to 1/2 is used for validation. One reference would be Picard, R.R. and Berk, K.N. (1990) Data Splitting, The American Statistician, 44;140-147. hth, b. -Original Message- From: Wensui Liu [mailto:[EMAIL PROTECTED] Sent: Thursday, November 11,

[R] density estimation: compute sum(value * probability) for given distribution

2004-11-12 Thread bogdan romocea
Dear R users, This is a KDE beginner's question. I have this distribution: length(cap) [1] 200 summary(cap) Min. 1st Qu. Median Mean 3rd Qu. Max. 459.9 802.3 991.6 1066.0 1242.0 2382.0 I need to compute the sum of the values times their probability of occurrence. The graph

RE: [R] density estimation: compute sum(value * probability) for given distribution

2004-11-13 Thread bogdan romocea
. Could you tell us exactly what you are trying to compute, or why you're computing it? HTH, Andy From: bogdan romocea Dear R users, This is a KDE beginner's question. I have this distribution: length(cap) [1] 200 summary(cap) Min. 1st Qu. Median Mean 3rd Qu

Re: [R] Running R from CD?

2004-11-21 Thread bogdan romocea
Better install and run R from a USB flash drive. This will save you the trouble of re-writing the CD as you upgrade and install new packages. Also, you can simply copy the R installation on your work computer (no install rights needed); R will run. HTH, b. From: Hans van Walen

RE: [R] SAS or R software

2004-11-24 Thread bogdan romocea
neela v writes: Hi all there Can some one clarify me on this issue, features wise which is better R or SAS, leaving the commerical aspect associated with it. I suppose there are few people who have worked on both R and SAS and wish they would be able to help me in deciding on this. THank

RE: [R] [BASIC] Solution of creating a sequence of object names

2004-11-29 Thread bogdan romocea
You may be missing something. After you create all those objects, you'll want to use them. Use get(): for (i in 1:10) ... get(paste("object",i,sep="")) ... It took me about a week to find out how to do this. I waited for a few days, but before I got to ask this basic/rtfm question, someone else -

RE: [R] Protocol for answering basic questions

2004-12-01 Thread bogdan romocea
I'm also an R beginner. I have asked stupid questions, and received RTFM replies. I believe such replies are _GREAT_, as long as they include a brief reference to what to read, and where. (In some cases searches don't work unless you happen to use the 'right' keywords, and in other cases it may be

RE: [R] finding the most frequent row

2004-12-10 Thread bogdan romocea
Here's something that works. I'm sure there are better solutions (in particular the paste part - I couldn't figure out how to avoid typing a[i,1], ..., a[i,10]). a <- matrix(nrow=1000,ncol=10) for (i in 1:1000) for (j in 1:10) a[i,j] <- sample(1:0,1) b <-

[R] errors when trying to rename data frame columns

2004-12-12 Thread bogdan romocea
Dear R users, I need to rename the columns of a series of data frames. The names of the data frames and those of the columns need to be pulled from some vectors. I tried a couple of things but only got errors. What am I missing? #---create data frame dframes <- c("a","b","c")

RE: [R] switching to Linux, suggestions?

2004-12-13 Thread bogdan romocea
Before choosing a GNU/Linux distribution look into the package management issue. http://distrowatch.com/ I would suggest that you avoid all RPM-based distributions (Mandrake, Fedora, SuSE), and consider Debian (+ those based on it) and the source-based distributions (such as Gentoo). I've been using

RE: [R] Moving standard deviation?

2004-12-13 Thread bogdan romocea
A simple for loop does the job. Why not write your own function? movsd <- function(series,lag) { movingsd <- vector(mode="numeric") for (i in lag:length(series)) { movingsd[i] <- sd(series[(i-lag+1):i]) } assign("movingsd",movingsd,.GlobalEnv) } This is very efficient: it takes
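
A variant of the same loop, sketched here as an alternative rather than as the poster's code: it preallocates the result and returns it instead of writing to the global environment with assign(), so the caller decides where the result is stored.

```r
# Moving standard deviation over a trailing window of 'lag' observations.
movsd <- function(series, lag) {
  n   <- length(series)
  out <- rep(NA_real_, n)          # first lag-1 positions stay NA
  if (n >= lag) {
    for (i in lag:n) out[i] <- sd(series[(i - lag + 1):i])
  }
  out
}

print(movsd(c(1, 2, 3, 4, 5), 3))
```

The guard on n >= lag avoids the backwards sequence that lag:length(series) would produce for a series shorter than the window.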

RE: [R] sort() leaves row names unaffected

2004-12-13 Thread bogdan romocea
I asked the same question a few weeks ago. See http://tolstoy.newcastle.edu.au/R/help/04/11/6775.html -Original Message- From: Martin Wegmann Sent: Tuesday, December 14, 2004 6:23 AM To: [EMAIL PROTECTED] Subject: [R] sort() leaves row names unaffected Hello, I wonder if I ran into a

RE: [R] Re : Save result in a For Loop

2004-12-14 Thread bogdan romocea
Not sure if it's the best way, but you could do it this way: all.results <- vector(mode="numeric") for (i in 1:100) { ... this.run <- ... all.results <- c(all.results,this.run) } At this point all.results contains the values of this.run from the whole loop. If

[R] faster row by row data frame processing

2004-12-20 Thread bogdan romocea
Dear R users, I have a data frame with a few thousand rows and several hundred numeric columns (plus a date column). For each row (day), I want to assign +/- 1 to the highest X absolute values, 0 to the other values, and save all that in a separate data frame. I have a working solution (below),

RE: [R] scheduling R tasks under windows

2004-12-21 Thread bogdan romocea
Save the command(s) in a batch (.bat) file, and then run the .bat file from the task scheduler. -Original Message- From: Mikkel Grum [mailto:[EMAIL PROTECTED] Sent: Tuesday, December 21, 2004 3:18 PM To: RHelp Subject: [R] scheduling R tasks under windows I'm trying to schedule R tasks

RE: [R] how to fit in R

2004-12-22 Thread bogdan romocea
See http://www.statsoft.com/textbook/stdisfit.html There are several approaches you can use - Chi-square, Q-Q plots, P-P plots, various tests (Kolmogorov-Smirnov, Shapiro-Wilks' W) etc. HTH, b. -Original Message- From: Angela Re Sent: Wednesday, December 22, 2004 9:13 AM To: [EMAIL

[R] combination of scatterplot and image graph

2004-12-22 Thread bogdan romocea
Dear R users, I'm interested in a combination of a scatterplot and an image graph. I have two large vectors. Because in the scatterplot some areas are sparsely and others densely populated, I want to see the points, and I also want their color to be changed based on their density (similar to a

[R] coplot with png: disappearing grid lines

2004-12-29 Thread bogdan romocea
Dear useRs, When I use coplot() and output to png/jpeg/bmp, the grid lines from the scatter plots disappear. If I output to pdf() the grid lines are there, however I can't use it - I have many points, and the resulting PDF file is large and very slow to open and scroll through. (By the way, if I

RE: [R] Tuning string matching

2005-01-05 Thread bogdan romocea
This is a rather complex problem. I'm not aware of an R function / package that can do something like this, but in case you need to build it from scratch read http://support.sas.com/documentation/periodicals/obs/obswww15/index.html If you're familiar with SAS you could translate the code to R.

[R] global objects not overwritten within function

2005-01-11 Thread bogdan romocea
Dear useRs, I have a function that creates several global objects with assign("obj",obj,.GlobalEnv), and which I need to run iteratively in another function. The code is similar to f <- function(...) { assign("obj",obj,.GlobalEnv) } fct <- function(...) { for (i in 1:1000) { ...

Re: [R] global objects not overwritten within function

2005-01-12 Thread bogdan romocea
Apparently the message below wasn't posted on R-help, so I'm sending it again. Sorry if you received it twice. --- bogdan romocea [EMAIL PROTECTED] wrote: Date: Tue, 11 Jan 2005 17:31:42 -0800 (PST) From: bogdan romocea [EMAIL PROTECTED] Subject: Re: [R] global objects not overwritten within

RE: [R] help wanted using R in a classroom

2005-01-18 Thread bogdan romocea
It appears you wouldn't get much improvement at all even if the 2nd CPU were used at 100%. Five R sessions can easily overwhelm one CPU. I think you need (a lot) more CPUs than 2 to solve your problem. Possible solutions: 1. Install R on each eMac. Since you have 40 of them, you might want to put

RE: [R] animation without intermediate files?

2005-01-26 Thread bogdan romocea
Here's a different suggestion. Create a bunch of image files, and then use an image browser (GQview is one of the best; if you're on Win look at ACDSee) to view them as a slide show. Good image browsers read images in advance and should not produce flickering. I haven't experimented though with

[R] have R informed of MySQL table updates

2005-02-08 Thread bogdan romocea
Dear useRs, I have a script (Python) that every once in a while appends data to a MySQL table. Meanwhile, I have a running R session, and I want it to be aware of such table updates. I could write a loop in R to periodically check whether new data has become available; however, are you aware of a

[R] question about sorting POSIXt vector

2005-02-09 Thread bogdan romocea
Dear useRs, How come the first attempt to sort a POSIXt vector fails (Error: non-atomic type in greater), while the second succeeds? (Code inserted below.) The documentation says that POSIXt is used to allow operations such as subtraction, so I'd expect sorting to work. Is this perhaps an OS

[R] download files through secure http (HTTPS)

2005-02-27 Thread bogdan romocea
Dear useRs, I'm trying to download some data through the HTTPS protocol. However, download.file() does not support HTTPS (R 2.0.1 on WinXP): Error in download.file(https.url, destfile = "test.txt") : unsupported URL scheme 1. Is there any other function/package in R that can work with

[R] draw random samples from empirical distribution

2005-02-28 Thread bogdan romocea
Dear useRs, I have an empirical distribution (not normal etc) and I want to draw random samples from it. One solution I can think of is to compute let's say 100 quantiles, then use runif() to draw a random number Q between 1 and 100, and finally run runif() again to pull a random value from the
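
Two base R ways to draw from an empirical distribution are sketched below. They are offered as possibilities, not as the answer given in the thread (which is not shown here); x is made-up stand-in data.

```r
set.seed(42)
x <- rexp(200)   # stand-in for the empirical data

# Option 1: resample the observed values directly (bootstrap-style).
s1 <- sample(x, size = 1000, replace = TRUE)

# Option 2: feed uniform draws into the empirical quantile function, which
# interpolates between observed values instead of repeating them.
s2 <- quantile(x, probs = runif(1000), names = FALSE)

print(summary(s1))
print(summary(s2))
```

Option 2 is close in spirit to the two-step quantile idea in the question, but does the interpolation in a single call.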

RE: [R] Temporal Analysis of variable x; How to select the outlier threshold in R?

2005-03-01 Thread bogdan romocea
I'm not sure I understand. You have financial data and want to throw away some outliers?? Why would you ever do this? First of all, I'd suggest you pay close attention to what the data is trying to say. Maybe your distribution is not normal after all (see tests for normality etc). Maybe you

[R] XML to data frame or list

2005-03-10 Thread bogdan romocea
Dear useRs, I have a simple/RTFM question about XML parsing. Given an XML file, such as (fragment) <A>100</A> <B>23</B> <C>true</C> how do I import it in a data frame or list, so that the values (100, 23, true) can be accessed through the names A, B and C? I installed the XML package and looked over the

Re: [R] XML to data frame or list

2005-03-13 Thread bogdan romocea
I managed to parse more complex XML files as well. The trick was to manually determine the position of the child nodes of interest, after which they can be parsed in a loop. For example: require(XML) doc <- xmlTreeParse("file.xml",getDTD=T,addAttributeNamespaces=T) r <- xmlRoot(doc) #find the nodes

RE: [R] Mandrake 10.1

2005-03-15 Thread bogdan romocea
I managed to install R 2.0.1 on Mandrake 10.1 a couple of weeks ago. It wasn't that easy, first I had to manually track, download and install 3-4 dependencies. I would suggest that you consider another GNU/Linux distribution, Mepis. Mepis combines the best features of several distributions:

RE: [R] Mandrake 10.1

2005-03-16 Thread bogdan romocea
--- Rau, Roland [EMAIL PROTECTED] wrote: -Original Message- From: r-help On Behalf Of bogdan romocea Sent: Tuesday, March 15, 2005 2:49 PM I would suggest that you consider another GNU/Linux distribution, I don't think it is necessary. Mandrake 10.1 is fine for running R

RE: [R] Basic questions about RMySQL

2005-03-18 Thread bogdan romocea
1. No way. You must have MySQL installed on your computer. In fact this is not true. You can use a MySQL server installed somewhere else on the network. --- bogdan romocea [EMAIL PROTECTED] wrote: 1. No way. You must have MySQL installed on your computer. 2. You must install the server

Re: [R] Basic questions about RMySQL

2005-03-18 Thread bogdan romocea
(max.con = 16, fetch.default.rec = 5000, force.reload = F) drv <- dbDriver("MySQL") con <- dbConnect(drv,username=userid,password=pswd,dbname=db) dbListTables(con) --- Uwe Ligges [EMAIL PROTECTED] wrote: bogdan romocea wrote: 1. No way. You must have MySQL installed on your computer. 2. You

RE: [R] Graphics (for goodness of fit) Question

2005-03-21 Thread bogdan romocea
In regards to your plot question, you could use points() or lines(): a <- sample(1:50,10) b <- sample(20:40,10) plot(1:10,a,pch=20,col="red") points(1:10,b,pch=20,col="blue") #or #lines(1:10,b,pch=20,col="blue",type="o") -Original Message- From: Mohammad Ehsanul Karim [mailto:[EMAIL PROTECTED]

RE: [R] Gmail invitation

2005-03-25 Thread bogdan romocea
You can also buy these things on Ebay. I noticed the supply about 2 months ago when I guess you would have made about $1-2 per invitation. The profit opportunity is much diminished now that the supply has greatly increased (it appears every gmail account was allocated 50 invitations instead of 5 a

[R] how to simulate a time series

2005-03-31 Thread bogdan romocea
Dear useRs, I want to simulate a time series (stationary; the distribution of values is skewed to the right; quite a few ARMA absolute standardized residuals above 2 - about 8% of them). Is this the right way to do it? # load(rdtb)#the time series

RE: [R] a R function for sort a data frame.

2005-04-01 Thread bogdan romocea
dfr <- data.frame(sample(1:50,10),sample(1:50,10)) colnames(dfr) <- c("a","b") dfr <- dfr[order(dfr$a),] dfr <- dfr[order(-dfr$a),] -Original Message- From: Mario Morales [mailto:[EMAIL PROTECTED] Sent: Thursday, March 31, 2005 10:23 PM To: r-help@stat.math.ethz.ch Subject: [R] a R function for

RE: [R] Amount of memory under different OS

2005-04-04 Thread bogdan romocea
You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of RAM, the seller will probably warn you about Windows and recommend a suitable OS. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL

[R] looking for a plot function

2005-04-06 Thread bogdan romocea
Dear useRs, I have a data frame and I want to plot all rows. Each row is represented as a line that links the values in each column. The plot looks like this: dfr <- data.frame(A=sample(1:50,10),B=sample(1:50,10), C=sample(1:50,10),D=sample(1:50,10)) xa <- 10*1:4 plot(c(10,40),c(0,50)) for

Re: [R] Considering port of SAS application to R

2006-04-21 Thread bogdan romocea
Forget about R for now and port the application to MySQL/PostgreSQL etc, it is possible and worthwhile. In case you happen to use (and really need) some SAS DATA STEP looping features you might be forced to look into SQL cursors, otherwise the port should be (very) straightforward.

Re: [R] Need R code

2006-04-21 Thread bogdan romocea
Here's an example. lst <- list() for (i in 1:5) { lst[[i]] <- data.frame(v=sample(1:20,10),sample(1:5,10,replace=TRUE)) colnames(lst[[i]])[2] <- paste("x",i,sep="") } dfr <- lst[[1]] for (i in 2:length(lst)) dfr <- merge(dfr,lst[[i]],all=TRUE) dfr <- dfr[order(dfr[,1]),] print(dfr)

Re: [R] regression modeling

2006-04-25 Thread bogdan romocea
There is an aspect, worthy of careful consideration, you don't seem to be aware of. I'll ask the question for you: How does the explanatory/predictive potential of a dataset vary as the dataset gets larger and larger? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL

Re: [R] www.r-project.org

2006-04-25 Thread bogdan romocea
I agree it would be worthwhile to make some cosmetic changes to r-project.org (nothing fancy though - no javascript, Flash etc). The general public may not be fully aware of how R compares to other statistical software, and I doubt that a web site which looks like it was put together 10 years ago

Re: [R] efficiency in merging two data frames

2006-05-01 Thread bogdan romocea
Another good option is SQL, the fastest and most scalable solution. If you decide to give it a try pay close attention to indexes. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller Sent: Monday, May 01, 2006 8:55 AM To: 'Guojun Zhu';

Re: [R] Axis labels

2006-05-02 Thread bogdan romocea
plot(1:10,axes=FALSE) axis(1,at=1:10,labels=10:1) axis(2,at=1:10,labels=5*10:1) box() -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christopher Brown Sent: Tuesday, May 02, 2006 12:13 PM To: r-help@stat.math.ethz.ch Subject: [R] Axis labels I

Re: [R] Listing Variables

2006-05-03 Thread bogdan romocea
Here's an example. dfr <- data.frame(A1=1:10,A2=21:30,B1=31:40,B2=41:50) vars <- colnames(dfr) for (v in vars[grep("B",vars)]) print(mean(dfr[,v])) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Farrel Buchinsky Sent: Wednesday, May 03, 2006 10:46 AM

Re: [R] SQL like manipulations on data frames

2006-05-05 Thread bogdan romocea
This goes the other way - all SQL manipulations are a subset of what can be done with R. Read up on indexing and see ?merge, ?aggregate, ?by, ?tapply, among others. (For the R equivalent to your query, check ?grep and ?order, and search the list if needed.) Also, this example might be a good

Re: [R] Using DBI and RMySQL

2006-05-11 Thread bogdan romocea
I'll see if I can reproduce the steps under Knoppix[1]. Then you can run Knoppix with a Persistent Disk Image (PDI)[2] that contains R, the DBI, and RMySQL on just about any machine that runs Knoppix. Don't bother, it's been done already. See http://dirk.eddelbuettel.com/quantian.html

Re: [R] Fast update of a lot of records in a database?

2006-05-19 Thread bogdan romocea
Your approach seems very inefficient - it looks like you're executing thousands of update statements. Try something like this instead: #---build a table 'updates' (id and value) ... #---do all updates via a single left join UPDATE bigtable a LEFT JOIN updates b ON a.id = b.id SET a.col1 = b.value;

Re: [R] win2k memory problem with merge()'ing repeatedly (long email)

2006-05-22 Thread bogdan romocea
Repeated merge()-ing does not always increase the space requirements linearly. Keep in mind that a join between two tables where the same value appears M and N times will produce M*N rows for that particular value. My guess is that the number of rows in atot explodes because you have some
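
The M*N effect is easy to reproduce on a toy example. The data frames a and b below are hypothetical and unrelated to the poster's atot.

```r
# Key 1 appears 3 times in 'a' and 2 times in 'b'; key 2 appears once in each.
a <- data.frame(id = c(1, 1, 1, 2), va = 1:4)
b <- data.frame(id = c(1, 1, 2),    vb = 1:3)

ab <- merge(a, b, by = "id")
print(nrow(ab))        # 3*2 rows for id 1, plus 1*1 row for id 2
print(table(ab$id))

# A quick check for repeated keys before merging.
print(any(duplicated(a$id)))
print(any(duplicated(b$id)))
```

Checking for duplicated keys before each merge() is a cheap way to spot where the row count is about to multiply.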

Re: [R] Manipulating code?

2006-05-23 Thread bogdan romocea
Macro stuff à la SAS is something that should be avoided whenever possible - it's messy, limited, and limiting. (I've done it occasionally and it works, but I think it's best not to go there.) Read the documentation on lists (in particular named lists), and keep everything in one or more lists. For

Re: [R] progressive slowdown during script execution?

2006-06-01 Thread bogdan romocea
Compare system.time({ v <- vector() for (i in 1:10^5) v <- c(v,1) }) with system.time({ v <- vector(length=10^5) for (i in 1:10^5) v[i] <- 1 }) If you don't know exactly how long v will be, use a value that's large enough, then throw away what's extra. -Original Message-

Re: [R] R usage for log analysis

2006-06-12 Thread bogdan romocea
I wouldn't use a DBMS at all -- it is not necessary and I don't see what you would get in return. Instead I would split very large log files into a number of pieces so that each piece fits in memory (see below for an example), then process them in a loop. See the list and the documentation if you

Re: [R] bubbleplot for matrix

2006-06-14 Thread bogdan romocea
Here's an example. By the way, I find that it's more convenient (where applicable) to keep the data in 3 vectors/factors rather than one matrix/data frame. a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10)) x <- y <- z <- vector() for (i in 1:nrow(a)) { x <-

Re: [R] bubbleplot for matrix

2006-06-15 Thread bogdan romocea
-14 at 16:47 -0400, bogdan romocea wrote: Here's an example. By the way, I find that it's more convenient (where applicable) to keep the data in 3 vectors/factors rather than one matrix/data frame. a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10)) x <- y <- z

Re: [R] modeling logit(y/n) using lrm

2006-06-16 Thread bogdan romocea
Not sure about your data set, but if you have some kind of (weighted/stratified) sample of hospitals you need to pay special attention. Survey data violates the assumptions of the classical linear models (infinite population, identically distributed errors etc) and needs to be analyzed

Re: [R] print color

2006-07-10 Thread bogdan romocea
One option is library(R2HTML) ?HTML.cormat The thing you're after is traffic highlighting (via CSS or HTML tags). If HTML.cormat() doesn't do exactly what you want, modify the source code. (By the way, I haven't used R2HTML so far so maybe there's a more appropriate function.) -Original

Re: [R] Is it possible to only read a subset by read.table ?

2006-07-12 Thread bogdan romocea
It's possible and straightforward (just don't use R). IMHO the GNU Core Utilities http://www.gnu.org/software/coreutils/ plus a few other tools such as sed, awk, grep etc are much more appropriate than R for processing massive text files. (Get a good book about UNIX shell scripting. On Windows you

Re: [R] 15-min mean values

2006-02-02 Thread bogdan romocea
Here's another approach which can be easily implemented in SQL. 1. Start with the dates as character vectors, dt <- as.character(Sys.time()) 2. Extract the minutes and round them down to 0,15,30,45: minutes <- floor(as.numeric(substr(dt,15,16))/15)*15 final.mins <- as.character(minutes)
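The remaining steps of the post are cut off, so the following is only a sketch of how the rounded minutes might be used to get 15-minute means. The sample timestamps, the `value` vector and the use of `tapply()` are assumptions made for illustration.

```r
# Made-up timestamps (as character strings) and measurements.
dt <- c("2006-02-02 10:03:00", "2006-02-02 10:14:59",
        "2006-02-02 10:15:00", "2006-02-02 10:44:10")
value <- c(1, 3, 10, 20)

# Characters 15-16 hold the minutes; round them down to 0, 15, 30 or 45.
minutes <- floor(as.numeric(substr(dt, 15, 16)) / 15) * 15

# Rebuild a label for the start of each 15-minute interval,
# e.g. "2006-02-02 10:00:00", and average the values within each interval.
bin <- paste0(substr(dt, 1, 14), sprintf("%02d", minutes), ":00")
means <- tapply(value, bin, mean)
```

Because everything is done with substrings and integer arithmetic, the same steps translate directly into SQL string functions and a GROUP BY.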

Re: [R] matching tables

2006-02-07 Thread bogdan romocea
t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A" t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B" t3 <- merge(t1,t2,all=TRUE) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eric Pante Sent: Tuesday, February 07, 2006 4:22 PM To:

Re: [R] dataframe subset

2006-02-08 Thread bogdan romocea
Here's one way, x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10)) y <- data.frame(V=c(2,9,10)) xy <- merge(x,y,all=FALSE) Pay close attention to what happens if you have duplicate values in y, say y <- data.frame(V=c(2,9,10,10)) -Original Message- From: [EMAIL PROTECTED]
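The warning about duplicates can be made concrete with a short sketch. The variable names `y_dup`, `n_unique`, `n_dup` and `x_sub` are introduced here for illustration, and the `%in%` alternative is a suggestion of this note rather than part of the original reply.

```r
x <- data.frame(V = c(1, 1, 1, 1, 2, 2, 4, 4, 4, 9, 10, 10, 10, 10, 10))
y <- data.frame(V = c(2, 9, 10))
y_dup <- data.frame(V = c(2, 9, 10, 10))   # 10 appears twice

# merge() pairs every matching row of x with every matching row of y,
# so a duplicated key in y multiplies the matching rows of x.
n_unique <- nrow(merge(x, y, all = FALSE))      # 2 + 1 + 5 rows
n_dup    <- nrow(merge(x, y_dup, all = FALSE))  # 2 + 1 + 5 * 2 rows

# A plain subset with %in% is not affected by duplicates in y.
x_sub <- x[x$V %in% y_dup$V, , drop = FALSE]
```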

Re: [R] Interleaving elements of two vectors?

2006-03-07 Thread bogdan romocea
For a general solution without warnings try interleave <- function(v1,v2) { ord1 <- 2*(1:length(v1))-1; ord2 <- 2*(1:length(v2)); c(v1,v2)[order(c(ord1,ord2))] } interleave(rep(1,5),rep(3,8)) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor

Re: [R] \r with RSQLite

2006-03-15 Thread bogdan romocea
\r is a carriage return character which some editors may use as a line terminator when writing files. My guess is that RSQLite writes your data frame to a temp file using \r as a line terminator and then runs a script to have SQLite import the data (together with \r - this would be the problem),

Re: [R] renaming dataframe1 using column names from dataframe2?

2006-03-17 Thread bogdan romocea
?assign, but _don't_ use it; lists are better. dfr <- list() for(j in 1:9) { dfr[[as.character(j)]] <- ... } Don't try to imitate the limited macro approach of other software (e.g. SAS). You can do all that in R, but it's much simpler and much safer to rely on list indexing and functions that
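A minimal sketch of the list-based approach, with the elided `...` replaced by a made-up data frame purely for illustration (the columns `id` and `value` and the summaries below are assumptions, not part of the original reply):

```r
# Build a named list of data frames instead of creating
# separate variables dfr1, dfr2, ... with assign().
dfr <- list()
for (j in 1:3) {
  dfr[[as.character(j)]] <- data.frame(id = 1:4, value = (1:4) * j)
}

# Individual elements are retrieved by name ...
second <- dfr[["2"]]

# ... and the whole collection can be processed in one call.
row_counts <- sapply(dfr, nrow)
totals <- sapply(dfr, function(d) sum(d$value))
```

Keeping the data frames in one list means that later steps never need to construct variable names from strings.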

Re: [R] create a gui with a button to change graphic?

2006-03-20 Thread bogdan romocea
Adapt the function below to suit your needs. If you really want to plot 5 minutes at a time, round the time series to the last MM:00 times (where MM is in 5*0:11) and have idx below loop over them. splitplot <- function(x,points) { boundaries <- c(1,points*1:floor(length(x)/points),length(x)) for

Re: [R] Multivariate linear regression

2006-04-06 Thread bogdan romocea
Apparently you do not understand the point, and seem to (want to) see patterns all over the place. A good start for the treatment of this interesting disease is 'Fooled by Randomness' by Nassim Nicholas Taleb. The main point of the book is that many things may be a lot more random than one might

Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

2006-04-06 Thread bogdan romocea
There are several kinds of standardization, and 'normalization' is only one of them. For some details you could check http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm (see Details for standardization methods). Standardization is required prior to clustering to control for the

Re: [R] I am surprised (and a little irritated)

2006-04-19 Thread bogdan romocea
Installing R on SuSE 10.0 may be less than trivial for a beginner (I ended up compiling GCC plus 3-4 other things). In case you lose your patience I'd suggest trying Mepis Linux: it's very easy to install and the package management GUI (Synaptic) is great. Installing R together with a bunch of R

RE: [R] Aggregating data (with more than one function)

2005-04-21 Thread bogdan romocea
I am looking for an answer to a similar question - a generalized solution that would be able to apply (1) any number of functions (2) to any number of vectors (3) by any number of factors (just like SQL's group by). The output data frame must contain the values of the by factors, to be
