RE: [R] if(foo == TRUE) .. etc

2005-04-22 Thread bogdan romocea
Great suggestion; it made me change all my Ts/Fs to TRUE/FALSE. Given F <- TRUE; T <- FALSE, is it possible to forbid T to stand for TRUE, and F for FALSE, in function(...,something=T)? Or, alternatively, never allow F <- whatever and T <- whatever? I don't know what the technical side is,
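To make the concern concrete, here is a minimal sketch (not part of the original post) showing that T is an ordinary variable that can be masked, while TRUE is a reserved word. The local() wrapper and the names masked and res are only there for illustration, so the global T is left untouched.

```r
# T and F are plain variables bound to TRUE and FALSE in base R,
# so they can be masked; TRUE and FALSE are reserved words.
masked <- local({
  T <- FALSE   # legal: T is just a variable name
  T            # evaluates to FALSE inside this scope
})
print(masked)

# The same assignment with TRUE on the left-hand side raises an error.
res <- try(eval(parse(text = "TRUE <- FALSE")), silent = TRUE)
print(inherits(res, "try-error"))
```

Spelling out TRUE/FALSE in function arguments sidesteps the problem regardless of what T or F happen to hold.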

RE: [R] have to point it out again: a distribution question

2005-04-29 Thread bogdan romocea
Then, Reid, or other r-gurus, is there a good way to discretize the sample into 3 categories: 2 tails and the body? Out of curiosity, how do you plan to use that information? What would you do if you knew that the 'body' starts here and ends there? -Original Message- From: WeiWei Shi

RE: [R] Change the result data

2005-05-11 Thread bogdan romocea
hec.data <- array(c(5,15,20,68,29,54,84,119,14,14,17,26,16,10,94,7), dim=c(4,4), dimnames=list(eye=c("Green","Hazel","Blue","Brown"), hair=c("Black","Brown","Red","Blond")))
#--
dfr <-

RE: [R] aggregate

2005-05-11 Thread bogdan romocea
Assuming dfr[day,o,h,l,c] and day like 2004-12-28:
  dt <- strptime(as.character(dfr$day),format="%Y-%m-%d") + 0
  wk <- format(dt,"%Yw%U")
  aggr <- aggregate(list(dfr$o,dfr$h,dfr$l,dfr$c),list(wk),mean)
  colnames(aggr) <- etc
-Original Message- From: Omar Lakkis [mailto:[EMAIL PROTECTED] Sent:

Re: [R] aggregate

2005-05-11 Thread bogdan romocea
In fact, since you have dates and not datetimes, use as.Date() instead of strptime(). On 5/11/05, bogdan romocea wrote: Assuming dfr[day,o,h,l,c] and day like 2004-12-28:
  dt <- strptime(as.character(dfr$day),format="%Y-%m-%d") + 0
  wk <- format(dt,"%Yw%U")
  aggr <- aggregate(list(dfr$o,dfr$h,dfr$l,dfr
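A minimal sketch of the as.Date() variant, under the assumption that the data frame has the columns day, o, h, l, c described above. The sample values are made up for illustration.

```r
# Hypothetical daily data; column names follow the dfr[day,o,h,l,c] layout.
dfr <- data.frame(
  day = c("2004-12-27", "2004-12-28", "2004-12-29", "2005-01-03", "2005-01-04"),
  o = c(10, 11, 12, 20, 22),
  h = c(11, 12, 13, 21, 23),
  l = c(9, 10, 11, 19, 21),
  c = c(10.5, 11.5, 12.5, 20.5, 22.5),
  stringsAsFactors = FALSE
)

# Year plus week-of-year label (weeks starting on Sunday), one per row.
wk <- format(as.Date(dfr$day), "%Yw%U")

# Weekly means of the four price columns.
aggr <- aggregate(dfr[c("o", "h", "l", "c")], by = list(week = wk), FUN = mean)
print(aggr)
```

Passing the columns as a data frame rather than an unnamed list keeps the column names, so no colnames() fix-up is needed afterwards.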

[R] get plot in a window when running R in the shell

2005-05-16 Thread bogdan romocea
Dear useRs, On a GNU/Linux box I want to run some code from the command line. This works:
  #!/bin/sh
  R --vanilla -q --gui=X11 < code.r
however I want the plots to appear in a window (as it happens when the code is run interactively) instead of being saved in 'Rplots.ps'. Is that doable? Thank

RE: [R] standardization

2005-05-18 Thread bogdan romocea
You asked another question about clustering, so I presume you want to standardize some variables before clustering. In SAS, PROC STDIZE offers 18 standardization methods. See http://support.sas.com/91doc/getDoc/statug.hlp/stdize_sect12.htm#stat_stdize_stdizesm for details. If you're really

Re: [R] R annoyances

2005-05-20 Thread bogdan romocea
, 2005 9:39 AM To: bogdan romocea Cc: R-help@stat.math.ethz.ch Subject: RE: [R] R annoyances On Fri, 20 May 2005, bogdan romocea wrote: On 20-May-05 Uwe Ligges wrote: All possible changes to T/F (both removing the meaning of TRUE/FALSE in a clean session and making them reserved words) would

RE: [R] colors and palettes and things...

2005-05-23 Thread bogdan romocea
1. I faced the same issue and came up with the code below. 2. See rainbow().
  allcol <- colors()
  png("Rcolors.png",width=1100,height=3000)
  par(mai=c(0.4,0.5,0.3,0.2),omi=c(0.2,0,0,0),cex.axis=0.1,pch=15,bg="white")
  plot(1,1,xlim=c(1,10),ylim=c(1,66),col=allcol[1],cex=4)

RE: [R] reading multiple files

2005-05-24 Thread bogdan romocea
You're almost there, use a list:
  myfiles <- list()
  for (i in 1:n) myfiles[[i]] <- etc
You can then get at your data frames with myfiles[[1]], myfiles[[2]]... Or, if you prefer to combine them into a single data frame (assuming they're similar),
  allmyfiles <- do.call(rbind,myfiles)
-Original
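The list approach can be sketched end to end as follows. The temporary CSV files, their names and their contents are stand-ins invented for the example; only the list and do.call(rbind, ...) pattern comes from the reply.

```r
# Write three small CSV files to a temporary directory (stand-ins for real data).
dir <- tempfile("multi")
dir.create(dir)
for (i in 1:3) {
  write.csv(data.frame(id = i, x = c(1, 2)),
            file.path(dir, paste0("part", i, ".csv")),
            row.names = FALSE)
}

# Read every file into one element of a list, named after the file.
paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
myfiles <- lapply(paths, read.csv)
names(myfiles) <- basename(paths)

# Stack the similar data frames into a single one.
allmyfiles <- do.call(rbind, myfiles)
print(dim(allmyfiles))
```

Individual files remain available as myfiles[[1]] or myfiles[["part1.csv"]].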

RE: [R] precision problem

2005-05-25 Thread bogdan romocea
This is a FAQ, 7.31. -Original Message- From: Omar Lakkis [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 25, 2005 10:09 AM To: r-help@stat.math.ethz.ch Subject: [R] precision problem I have prices that I am finding difficult to compare with ==, and , due to precision. For example: the
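For readers who do not have the FAQ at hand, a short sketch of the issue and one common workaround. The helper leq and its tolerance are assumptions made for this example, not something from the thread.

```r
# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
a <- 0.1 + 0.2
print(a == 0.3)                    # exact comparison fails
print(isTRUE(all.equal(a, 0.3)))   # comparison within a tolerance succeeds

# Hypothetical helper: "less than or approximately equal".
leq <- function(x, y, tol = 1e-8) x < y + tol
print(leq(a, 0.3))
```

For prices, another option is to store values as integer cents so that comparisons are exact.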

RE: [R] Rounding fractional numbers to nearest fraction

2005-05-25 Thread bogdan romocea
Multiply by 4, round and divide by 4.
  a <- c(1.15,5.82)
  round(a*4,digits=0)/4
-Original Message- From: Ken Termiso [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 25, 2005 1:27 PM To: r-help@stat.math.ethz.ch Subject: [R] Rounding fractional numbers to nearest fraction Hi all, I've got
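The same idea generalizes to any step size. The function name round_to below is hypothetical, chosen for this sketch.

```r
# Round x to the nearest multiple of `step` (0.25 gives quarters).
round_to <- function(x, step) round(x / step) * step

a <- c(1.15, 5.82)
print(round_to(a, 0.25))   # 1.25 5.75
```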

RE: [R] Using R for classifying new samples

2005-05-27 Thread bogdan romocea
Read this book, Multivariate Statistical Analysis: A Conceptual Introduction by Sam Kash Kachigan. I think it's *great*, and perfect for someone without any statistical background. -Original Message- From: manav ram [mailto:[EMAIL PROTECTED] Sent: Friday, May 27, 2005 10:56 AM To:

RE: [R] Reading huge chunks of data from MySQL into Windows R

2005-06-06 Thread bogdan romocea
You don't say what you want to do with the data, how many columns you have etc. However, I would suggest proceeding in this order: 1. Avoid R; do everything in MySQL. 2. Use random samples. 3. If for some reason you need to process all 160 million rows in R, do it in a loop. Pull no more than,

RE: [R] Reading huge chunks of data from MySQL into Windows R

2005-06-06 Thread bogdan romocea
length(unique(userid)) will take (almost) no time... So I think the other way round will serve best: Do everything in R and avoid using SQL on the database... -Ursprüngliche Nachricht- Von: bogdan romocea [mailto:[EMAIL PROTECTED] Gesendet: Montag, 6. Juni 2005 16:27 An: Dubravko Dolic Cc

RE: [R] reading non-existing files

2005-06-07 Thread bogdan romocea
file.exists():
  if(!file.exists(your.file)) next
Or, try():
  your.data <- try(as.matrix(whatever))
  if (class(your.data) == "try-error") {something went wrong / the file doesn't exist - just for logging, the code will not fail}
-Original Message- From: Dave Evens [mailto:[EMAIL PROTECTED]
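Both checks can be combined in one small reader. This is a sketch under assumptions: the function name read_or_skip, the use of read.table and the temporary files are all invented for illustration. inherits() is used in place of class() == because it also works when an object has more than one class.

```r
# Return a matrix, or NULL when the file is absent or cannot be read.
read_or_skip <- function(path) {
  if (!file.exists(path)) return(NULL)
  out <- try(as.matrix(read.table(path)), silent = TRUE)
  if (inherits(out, "try-error")) return(NULL)
  out
}

good <- tempfile(fileext = ".txt")
writeLines(c("1 2", "3 4"), good)
absent <- tempfile()   # path that is never created

results <- lapply(c(good, absent), read_or_skip)
print(sapply(results, is.null))   # FALSE TRUE
```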

[R] get level combinations from by list

2005-06-08 Thread bogdan romocea
Dear useRs, Given this code I end up with a list of class by:
  a <- sample(1:5,200,replace=TRUE)
  b <- sample(c("v1","v2","v3"),200,replace=TRUE)
  c <- sample(c(11,22,33),200,replace=TRUE)
  data <- runif(200)
  grouped <- by(data,list(a,b,c),function(x) {c(min=min(x),max=max(x),

Re: [R] load ing and saving R objects

2005-06-14 Thread bogdan romocea
On Tue, 14 Jun 2005, Prof Brian Ripley wrote: If your file system does not like 15000 files you can always save in a DBMS. Or, switch to a better/more appropriate file system: http://en.wikipedia.org/wiki/Comparison_of_file_systems ReiserFS would allow you to store up to about 1.2 million

Re: [R] Excel files first row not being read

2005-06-16 Thread bogdan romocea
You could use a VB macro in Excel to automate the data export in CSV format, and it's not complex at all, for example: Private Sub CommandButton1_Click() Dim strB18 As String strB18 = Me.Cells(18, 2) 'MsgBox Export Folder = strB18 On Error GoTo ErrHandler Sheets(Inputs).SaveAs FileName:= _

[R] how to make R faster under GNU/Linux

2005-06-20 Thread bogdan romocea
Dear useRs, I timed the same code (simulation with for loops) on the same box (dual Xeon EM64T, 1.5 Gb RAM) under 3 OSs and was surprised by the results: Windows XP Pro (32-bit): Time difference of 5.97 mins 64-bit GNU/Linux (Fedora Core 4): Time difference of 6.97 mins 32-bit

Re: [R] Make matrix from SQL query result

2005-06-24 Thread bogdan romocea
It may be better to do this in SQL. The code below works for an arbitrary number of IDs and handles missing values.
  test <- data.frame(id=rep(c(1,2),10),date=sort(c(1:10,1:10)),ret=0.01*-9:10)
  idret <- list()
  ids <- sort(unique(test$id))
  for (i in ids) { idret[[as.character(i)]] <-

Re: [R] Trouble with Excel table connection

2005-06-30 Thread bogdan romocea
The best 3 things you can do in this situation are: 1. don't use Excel. 2. never use Excel. 3. never ever use Excel again. Spreadsheets are _not_ databases. In particular, Excel is a time bomb - use it long enough and you'll get burned (perhaps without even realizing it). See

Re: [R] how to call sas in R

2005-07-05 Thread bogdan romocea
Why don't you do the simulations in SAS? If you prefer otherwise, setup the SAS code for running in batch mode (output and log redirection), then call it from R with (on Windows, untested) system(start ' ' C:\etc\sas.exe -sysin garch.sas) To keep the parameters from the estimate, have the SAS job

Re: [R] Boxplot labels

2005-10-20 Thread bogdan romocea
Here's one approach.
  values <- c(rnorm(1000,-5,1),rnorm(1000,10,0.5))
  boxplot(values)
  text(1,0,labels="better use violin plots",col="red")
  #--
  require(vioplot)
  vioplot(values)
  text(1,0,labels="better than box plots",col="red",pos=4)
-Original Message- From: Keith Sabol [mailto:[EMAIL

Re: [R] data.frame-question

2005-10-25 Thread bogdan romocea
Welcome to R. See ?merge then ?aggregate or require(Hmisc) ?summarize or ?by You can probably find many examples in the archives, if needed. -Original Message- From: Michael Graber [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 25, 2005 3:45 PM To: R-Mailingliste

Re: [R] How to convert time to days

2005-10-26 Thread bogdan romocea
Those are obviously days, not seconds. A simple test would have answered your question:
  test <- strptime("20051026 15:26:19",format="%Y%m%d %H:%M:%S") - strptime("20051024 16:23:01",format="%Y%m%d %H:%M:%S")
  class(test)
  test
  cat(test,"\n")
If you prefer you can use difftime for conversion:
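A sketch of the difftime conversion using the two timestamps from the test above. The UTC time zone is an assumption added here so that the result does not depend on the local clock settings.

```r
t1 <- as.POSIXct("2005-10-26 15:26:19", tz = "UTC")
t0 <- as.POSIXct("2005-10-24 16:23:01", tz = "UTC")

# Plain subtraction picks the unit automatically (here: days).
d_auto <- t1 - t0
print(units(d_auto))

# difftime() lets you choose the unit explicitly.
d_secs <- difftime(t1, t0, units = "secs")
d_days <- difftime(t1, t0, units = "days")
print(as.numeric(d_secs))   # 169398
print(as.numeric(d_days))
```

Wrapping the result in as.numeric() with explicit units avoids guessing which unit the automatic choice produced.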

Re: [R] clustering

2005-10-28 Thread bogdan romocea
Assuming you don't end up with too many clusters, you could take the classification and use it as the target for a tree, random forest, discriminant analysis or multinomial logistic regression. The random forest may be the best option. -Original Message- From: alessandro carletti

Re: [R] Visualizing a Data Distribution -- Was: breaks in hist()

2005-11-02 Thread bogdan romocea
Leaf Sun wrote: The histogram is highly screwed to the right, say, the range of the vector is [0, 2], but 95% of the value is squeezed in the interval (0.01, 0.2). I guess the histogram is as you wrote. See http://web.maths.unsw.edu.au/~tduong/seminars/intro2kde/ for a short explanation.

Re: [R] newbie graphics question: Two density plots in same frame ?

2005-11-03 Thread bogdan romocea
Here's a function that you can customize to fit your needs. lst is a named list.
  multicomp <- function(lst) {
  clr <- c("darkgreen","red","blue","brown","magenta")
  alldens <- lapply(lst,function(x) {density(x,from=min(x),to=max(x))})
  allx <- sapply(alldens,function(d) {d$x})
  ally <- sapply(alldens,function(d)

Re: [R] assign() problem

2005-11-23 Thread bogdan romocea
Don't use assign(), named lists are much better (check the stuff on indexing lists). Here's an example:
  a <- list()
  a[["one"]] <- c(1,2,3)
  a[["two"]] <- c(4,5,6)
  a[["two"]]
  do.call(rbind,a)
  do.call(cbind,a)
  lapply(a,sum)
With regards to your question, did you try printing varname[i] in your loop to see

Re: [R] date/time arithmetic

2005-11-30 Thread bogdan romocea
What do you need a bunch of functions for? I'm not familiar with the details of difftime objects, however an easy way out of here is to get the time difference in seconds, which you can then add or subtract as you please from date-times.
  x <- Sys.time(); y <- Sys.time()+3600
  diff <-

Re: [R] OT: Statistics question

2005-11-30 Thread bogdan romocea
What if the distributions are not normal etc? You might want to try a simulation to get an answer. Draw random samples from each distribution (without assuming normality etc - one way to do this is to get the quantiles, then draw a sample of quantiles, then draw a value from each quantile), throw

Re: [R] export from R to MySQL

2005-12-12 Thread bogdan romocea
Sean Davis wrote: but you will have to create the table by hand There's no need for manual steps. To take advantage of MySQL's extremely fast 'load data infile' you could dump the data in CSV format, write a script for mysql (the command line tool), for example
  q <- function(table,infile) {

Re: [R] export from R to MySQL

2005-12-12 Thread bogdan romocea
That was just an example -- it's not difficult to write an R function to generate the mysql create table syntax for a data frame with 60 or 600 columns. (BTW, I would never type 67 columns.) On 12/12/05, Sean Davis [EMAIL PROTECTED] wrote: On 12/12/05 9:21 AM, bogdan romocea [EMAIL PROTECTED
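One possible shape for such a generator is sketched below. Everything in it is an assumption made for illustration: the function names sql_type and create_table_sql, the mapping from R classes to MySQL column types, and the VARCHAR(255) fallback. It is not the code from the original post.

```r
# Map one data frame column to a MySQL column type (simplified, hypothetical mapping).
sql_type <- function(x) {
  if (inherits(x, "Date")) return("DATE")
  if (is.factor(x) || is.character(x)) return("VARCHAR(255)")
  if (is.integer(x)) return("INT")
  if (is.numeric(x)) return("DOUBLE")
  "VARCHAR(255)"
}

# Build a CREATE TABLE statement from the column names and classes of df.
create_table_sql <- function(df, table) {
  cols <- paste0("`", names(df), "` ", vapply(df, sql_type, character(1)))
  paste0("CREATE TABLE `", table, "` (", paste(cols, collapse = ", "), ");")
}

df <- data.frame(id = 1:2, price = c(1.5, 2.5),
                 day = as.Date(c("2005-12-12", "2005-12-13")),
                 name = c("a", "b"), stringsAsFactors = FALSE)
cat(create_table_sql(df, "prices"), "\n")
```

Because the statement is derived from the data frame itself, the number of columns makes no difference to the amount of typing.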

Re: [R] Open a new script from R command prompt

2005-12-28 Thread bogdan romocea
Are you talking about Rgui on Windows? Use the shortcut, Alt-F-N. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ronnie Babigumira Sent: Wednesday, December 28, 2005 9:21 AM To: R Help Subject: [R] Open a new script from R command prompt Hi, (this is a

Re: [R] Count or summary data

2005-12-30 Thread bogdan romocea
Here's one approach,
  v1 <- sample(c(-1,0,1),30,replace=TRUE)
  v2 <- sample(c(0.05,0,0.1),30,replace=TRUE)
  lst <- split(v1,v2)
  counted <- lapply(lst,table)
  mat <- do.call(rbind,counted)
  print(counted)
  print(mat)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf

Re: [R] Q about RSQLite

2006-01-03 Thread bogdan romocea
Check the way you imported the data / the SQLite documentation. The \r\n that you see (you're on Windows, right?) is used to indicate the end of the data lines in the source file - \r is a carriage return, and \n is a new line character. -Original Message- From: [EMAIL PROTECTED]

Re: [R] bookmarking a page inside r-project.org

2006-01-03 Thread bogdan romocea
In fact it's just as easy in Internet Explorer: right-click + Open in New Window, or Shift-Click, followed by Ctrl+D. Or, right-click + Add to Favorites. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charles Annis, P.E. Sent: Monday, January 02,

Re: [R] For loop gets exponentially slower as dataset gets larger...

2006-01-03 Thread bogdan romocea
Your 2-million loop is overkill, because apparently in the (vast) majority of cases you don't need to loop at all. You could try something like this: 1. Split the price by id, e.g. price.list - split(price,id) For each id, 2a. When price is not NA, assign it to next price _without_ using a for

Re: [R] Suggestion for big files [was: Re: A comment about R:]

2006-01-05 Thread bogdan romocea
ronggui wrote: If i am familiar with database software, using database (and R) is the best choice,but convert the file into database format is not an easy job for me. Good working knowledge of a DBMS is almost invaluable when it comes to working with very large data sets. In addition, learning

Re: [R] Wald tests and Huberized variances (was: A comment about R:)

2006-01-05 Thread bogdan romocea
Peter Muhlberger wrote: But, there is a second point here, which is how difficult it was for me [...] to find what seem to me like standard key features I've taken for granted in other packages. There is another side to this. Don't consider only how difficult it was to find what you were

[R] need palette of topographic colors similar to topo.colors()

2006-01-07 Thread bogdan romocea
Dear useRs, I got stuck trying to generate a palette of topographic colors that would satisfy these two requirements: - the palette must be 'anchored' at 0 (just like on a map), with light blue/lawn green corresponding to data values close to 0 (dark blue to light blue for negative values,

Re: [R] matching country name tables from different sources

2006-01-10 Thread bogdan romocea
See http://en.wikipedia.org/wiki/Levenshtein_distance http://thread.gmane.org/gmane.comp.lang.r.general/31499 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Werner Wernersen Sent: Tuesday, January 10, 2006 2:00 PM To: Gabor Grothendieck Cc:

Re: [R] read.table problem

2006-01-25 Thread bogdan romocea
By the way, you might find this sed one-liner useful: sed -n '11981q;11970,11980p' filename.txt It will print the offending line and its neighbors. If you're on Windows you need to install Windows Services For Unix or Cygwin. -Original Message- From: [EMAIL PROTECTED]

Re: [R] Some questions on Rpart algorithm

2006-10-17 Thread bogdan romocea
With regards to your first question, here's a function I used a couple of times to get plots similar to those you're looking for. (Search the list for how to find the source code. Also, there's a reference other than MASS on the ?rpart page.) #bogdan romocea 2006-06 #adapted source code from

Re: [R] Automatic File Reading

2006-10-18 Thread bogdan romocea
Forget about assign() Co. Search R-help for 'assign', read the documentation on lists, and realize that it's quite a lot better to use lists for this kind of stuff. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scionforbai Sent: Wednesday, October

Re: [R] Book recommendation for newbie to stats and R?

2006-10-18 Thread bogdan romocea
I haven't seen the first book (DAAG) mentioned so far, I have it and think it's very good. Anyway, I recommend you buy all R books (and perhaps take some extra time off to study them): your employer can well afford that, given the cash you're saving by not using proprietary software.

Re: [R] memory management

2006-10-30 Thread bogdan romocea
This was asked before. Collapse the data frame into a vector, e.g.
  v <- apply(DF,1,function(x) {paste(x,collapse="_")})
then work with the values of that vector (table, unique etc). If your data frame is really large run this in a DBMS. -Original Message- From: [EMAIL PROTECTED]
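A small worked example of the collapse-then-count idea. It uses do.call(paste, ...) rather than apply(), which is a swapped-in variant that pastes whole columns at once. The data frame and the assumption that "_" never occurs in the data are made up for this sketch.

```r
DF <- data.frame(a = c(1, 1, 2, 1), b = c("x", "x", "y", "z"),
                 stringsAsFactors = FALSE)

# One key per row, built column-wise (assumes "_" does not appear in the data).
key <- do.call(paste, c(DF, sep = "_"))
print(key)

# Count identical rows and the number of distinct rows.
print(table(key))
print(length(unique(key)))
```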

Re: [R] match lists

2006-10-30 Thread bogdan romocea
What is it that you don't know how to do? Loop over the matrices from the 2 lists and merge them two by two, for example
  AB <- list() ; id <- 1
  for (i in 1:length(A)) for (j in 1:length(B)) {
    AB[[id]] <- merge(A[[i]],B[[j]],...)
    id <- id + 1
  }
To better keep track of who's who, you may want to

Re: [R] CPU or memory

2006-11-07 Thread bogdan romocea
Does any one know of comparisons of the Pentium 9x0, Pentium(r) Extreme/Core 2 Duo, AMD(r) Athlon(r) 64 , AMD(r) Athlon(r) 64 FX/Dual Core AM2 and similar chips when used for this kind of work. I think your best option, by far, is to answer the question on your own. Put R and your programs on

Re: [R] fit sine?

2006-12-19 Thread bogdan romocea
Read up on the discrete Fourier transform: http://en.wikipedia.org/wiki/Discrete_Fourier_transform http://en.wikipedia.org/wiki/Frequency_spectrum#Spectrum_analysis -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Randy Zelick Sent: Tuesday, December

Re: [R] Google Desktop Search and R script files

2006-12-27 Thread bogdan romocea
If you're on Windows switch to http://www.copernic.com/en/products/desktop-search/index.html , last time I looked it was quite a lot better than Google Desktop Search. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Farrel Buchinsky Sent: Wednesday,

Re: [R] loading data and executing queries with R and Mysql

2007-01-03 Thread bogdan romocea
Never mind the CPU usage; the likely problem is that your queries are inefficient in one or more ways (i.e., you don't use indexes when you really should - it's impossible to guess without knowing what the data and the queries look like, which somehow you've decided are not important enough to

[R] export many plots to one file

2007-01-04 Thread bogdan romocea
Dear useRs, I have a few hundred plots that I'd like to export to one document. pdf() isn't an option, because the file created is prohibitively huge (due to scatter plots with many points). So I have to use png() instead, but then I end up with a lot of files (would prefer just one). 1. Is

Re: [R] Access, Process and Read Information from Web Sites

2007-01-09 Thread bogdan romocea
Not sure about R, but for a Perl example check http://yosucker.sourceforge.net/ . -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tudor Bodea Sent: Monday, January 08, 2007 11:53 AM To: r-help@stat.math.ethz.ch Cc: Tudor Bodea Subject: [R] Access,

[R] hiccup in apply?

2007-01-19 Thread bogdan romocea
Hello, I don't understand the behavior of apply() on the data frame below. test - structure(list(Date = structure(c(13361, 13361, 13361, 13361, 13361, 13361, 13361, 13361, 13362, 13362, 13362, 13362, 13362, 13362, 13362, 13362, 13363, 13363, 13363, 13363, 13363, 13363, 13363, 13363, 13364, 13364,

Re: [R] sequential processing

2007-01-22 Thread bogdan romocea
One option for processing very large files with R is split:
  ## split a large file into pieces
  #--parameters: the folder, file and number of parts
  FLD=/home/user/data
  F=very_large_file.dat
  parts=50
  #---split
  cd $FLD
  fn=`echo $F | awk -F\. '{print $1}'` #file name without extension

Re: [R] How can I calculate conditional mean in a large dataset including date data

2007-02-01 Thread bogdan romocea
  days <- seq(as.Date("1970/1/1"), as.Date("2003/12/31"), "days")
  temp <- rnorm(length(days), mean=10, sd=8)
  tapply(temp, format(days,"%Y-%m"), mean)
  tapply(temp, format(days,"%b"), mean)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Majid Iravani Sent:

Re: [R] Fatigued R

2007-02-13 Thread bogdan romocea
The problem with your code is that it doesn't check for errors. See ?try, ?tryCatch. For example:
  my.download <- function(forloop) {
    notok <- vector()
    for (i in forloop) {
      cdaily <- try(blpGetData(...))
      if (class(cdaily) == "try-error") {
        notok <- c(notok, i)
      } else {

Re: [R] R and SAS proc format

2007-03-06 Thread bogdan romocea
See ?cut for continuous variables, and ?factor, ?levels for the others. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of lamack lamack Sent: Tuesday, March 06, 2007 12:49 PM To: R-help@stat.math.ethz.ch Subject: [R] R and SAS proc format Dear all,
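A brief sketch of both cases. The variables, break points and labels are invented for illustration; they play roughly the role that a value format would play in SAS.

```r
# Continuous variable: bin into labelled intervals [0,18), [18,65), [65,Inf).
age <- c(5, 17, 42, 70)
age_grp <- cut(age, breaks = c(0, 18, 65, Inf),
               labels = c("child", "adult", "senior"), right = FALSE)
print(as.character(age_grp))

# Categorical variable: attach display labels to coded values.
sex <- factor(c("M", "F", "F"), levels = c("F", "M"),
              labels = c("Female", "Male"))
print(as.character(sex))
print(levels(sex))
```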

Re: [R] How to create a list that grows automatically

2007-03-09 Thread bogdan romocea
This is a bad idea as it can greatly slow things down (the details were discussed several times on this list). What you want to do is define from the start the length of your vector/list, then grow it (by a large margin) only if it becomes full.
  lst <- vector(mode="list", length=10) #assuming
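The preallocate-then-grow pattern could look like the sketch below. The doubling rule, the counter n and the loop body are assumptions chosen for the example, not the code from the original message.

```r
lst <- vector(mode = "list", length = 10)   # initial capacity
n <- 0                                      # number of slots actually used

for (i in 1:25) {
  # Grow by a large margin (here: doubling) only when the list is full.
  if (n == length(lst)) length(lst) <- 2 * length(lst)
  n <- n + 1
  lst[[n]] <- i^2
}

# Drop the unused slots at the end.
lst <- lst[seq_len(n)]
print(length(lst))
```

Doubling means the list is resized only a handful of times, instead of once per element as with lst[[length(lst) + 1]] <- value.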

Re: [R] Reasons to Use R

2007-04-06 Thread bogdan romocea
(1)Institutions (not only academia) using R http://www.r-project.org/useR-2006/participants.html (2)Hardware requirements, possibly benchmarks Since you mention huge data sets, GNU/Linux running on 64-bit machines with as much RAM as your budget allows. (3)R clusters, R multiple CPU

Re: [R] upgrade to 2.5

2007-05-03 Thread bogdan romocea
I find it easier to install all the packages again:
  #---run in previous version
  packages <- installed.packages()[,"Package"]
  save(packages, file="Rpackages")
  #---run in new version
  load("Rpackages")
  for (p in setdiff(packages, installed.packages()[,"Package"])) install.packages(p)
-Original

Re: [R] RMySQL question, sql with R vector or list

2007-06-05 Thread bogdan romocea
With regards to your concern - export the R object to a MySQL table (the RMySQL documentation tells you how), then run an inner join. Or if the table to query isn't that big, pull it in R and subset it with %in%. You could use system.time() to see which runs faster. -Original Message-
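The in-R half of that advice can be sketched as follows. The table tbl and the vector wanted are made-up stand-ins for a query result and the R object to match against.

```r
# Stand-in for a table pulled from the database.
tbl <- data.frame(userid = c(1, 2, 3, 4, 5), value = c(10, 20, 30, 40, 50))
wanted <- c(2, 4, 9)

# Subset with %in% ...
hits <- tbl[tbl$userid %in% wanted, ]
print(hits)

# ... or the in-R analogue of an inner join.
joined <- merge(tbl, data.frame(userid = wanted), by = "userid")
print(joined)

# Time either approach to compare it with the database-side join.
timing <- system.time(tbl[tbl$userid %in% wanted, ])
print(timing[["elapsed"]])
```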

Re: [R] Speed up R

2007-06-21 Thread bogdan romocea
Don't rush to buy new hardware yet (other than perhaps more RAM for your existing desktop). First of all you should make sure that your R code can't be made any faster. (I've seen cases where careful re-writes increased speed by a factor of 10 or more.) There are some rules (such as pre-allocate

Re: [R] summing columns of data frame by group

2007-08-21 Thread bogdan romocea
Here's one way, lapply(split(DF, your.vector), function(x) {apply(x, 2, sum)}) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Daniel O'Shea Sent: Tuesday, August 21, 2007 3:53 PM To: r-help@stat.math.ethz.ch Subject: [R] summing columns of data

Re: [R] Excel

2007-08-28 Thread bogdan romocea
On a related note, there's one other amazingly stupid thing that Excel (2002 SP3) does - it exports to CSV the numbers as you see them displayed, and not as they were entered/imported in the first place. For example, 1.2345678 will be exported to CSV/tab delimited as 1.23 if that column is
