Great suggestion; it made me change all my Ts/Fs to TRUE/FALSE.
Given
F <- TRUE
T <- FALSE
is it possible to forbid T from standing for TRUE, and F for FALSE, in
function(..., something=T)?
Or, alternatively, to never allow F <- whatever and T <- whatever?
I don't know what the technical side is,
Then, Reid, or other R gurus, is there a good way to discretize
the sample into 3 categories: the 2 tails and the body?
Out of curiosity, how do you plan to use that information? What would
you do if you knew that the 'body' starts here and ends there?
-Original Message-
From: WeiWei Shi
hec.data <- array(c(5,15,20,68,29,54,84,119,14,14,17,26,16,10,94,7),
  dim=c(4,4),
  dimnames=list(eye=c("Green","Hazel","Blue","Brown"),
    hair=c("Black","Brown","Red","Blond")))
#--
dfr <-
Assuming dfr[day,o,h,l,c] and day like 2004-12-28:
dt <- strptime(as.character(dfr$day), format="%Y-%m-%d") + 0
wk <- format(dt, "%Yw%U")
aggr <- aggregate(list(dfr$o, dfr$h, dfr$l, dfr$c), list(wk), mean)
colnames(aggr) <- etc
-Original Message-
From: Omar Lakkis [mailto:[EMAIL PROTECTED]
Sent:
In fact since you have dates and not datetimes use as.Date() instead
of strptime().
On 5/11/05, bogdan romocea wrote:
Assuming dfr[day,o,h,l,c] and day like 2004-12-28:
dt <- strptime(as.character(dfr$day), format="%Y-%m-%d") + 0
wk <- format(dt, "%Yw%U")
aggr <- aggregate(list(dfr$o, dfr$h, dfr$l, dfr
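For what it's worth, here is a minimal sketch of the as.Date() variant; the small dfr data frame below is invented for illustration:

```r
# hypothetical daily OHLC data, mimicking dfr[day,o,h,l,c]
dfr <- data.frame(day = c("2004-12-27", "2004-12-28", "2005-01-03"),
                  o = c(1, 2, 3), h = c(2, 3, 4),
                  l = c(0, 1, 2), c = c(1.5, 2.5, 3.5))
dt <- as.Date(dfr$day)        # ISO dates need no explicit format string
wk <- format(dt, "%Yw%U")     # week-of-year labels, e.g. "2005w01"
aggr <- aggregate(dfr[c("o", "h", "l", "c")], list(week = wk), mean)
aggr                          # one row per week, columns averaged
```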
Dear useRs,
On a GNU/Linux box I want to run some code from the command line. This works:
#!/bin/sh
R --vanilla -q --gui=X11 code.r
however, I want the plots to appear in a window (as happens when the
code is run interactively) instead of being saved to 'Rplots.ps'. Is
that doable?
Thank
You asked another question about clustering, so I presume you want to
standardize some variables before clustering. In SAS, PROC STDIZE
offers 18 standardization methods. See
http://support.sas.com/91doc/getDoc/statug.hlp/stdize_sect12.htm#stat_stdize_stdizesm
for details. If you're really
, 2005 9:39 AM
To: bogdan romocea
Cc: R-help@stat.math.ethz.ch
Subject: RE: [R] R annoyances
On Fri, 20 May 2005, bogdan romocea wrote:
On 20-May-05 Uwe Ligges wrote:
All possible changes to T/F (both removing the meaning of
TRUE/FALSE in a clean session and making them reserved words)
would
1. I faced the same issue and came up with the code below.
2. See rainbow().
allcol <- colors()
png("Rcolors.png", width=1100, height=3000)
par(mai=c(0.4,0.5,0.3,0.2), omi=c(0.2,0,0,0), cex.axis=0.1, pch=15, bg="white")
plot(1, 1, xlim=c(1,10), ylim=c(1,66), col=allcol[1], cex=4)
You're almost there, use a list:
myfiles <- list()
for (i in 1:n) myfiles[[i]] <- etc
You can then get at your data frames with myfiles[[1]],
myfiles[[2]]... Or, if you prefer to combine them into a single data
frame (assuming they're similar),
allmyfiles <- do.call(rbind, myfiles)
-Original
This is a FAQ, 7.31.
-Original Message-
From: Omar Lakkis [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 25, 2005 10:09 AM
To: r-help@stat.math.ethz.ch
Subject: [R] precision problem
I have prices that I am finding difficult to compare with ==, <, and >,
due to precision. For example: the
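A common remedy (this is what FAQ 7.31 is about) is to compare within a tolerance instead of exactly; a small sketch:

```r
# floating point: 0.1 + 0.2 is not stored exactly, so == fails
a <- 0.1 + 0.2
b <- 0.3
a == b                    # FALSE
isTRUE(all.equal(a, b))   # TRUE: equal within a numeric tolerance
abs(a - b) < 1e-8         # an explicit tolerance works too
```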
Multiply by 4, round, and divide by 4.
a <- c(1.15, 5.82)
round(a*4, digits=0)/4
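The same trick generalizes to rounding to the nearest multiple of any fraction 1/n; a sketch (the function name is made up):

```r
# round x to the nearest multiple of 1/n
round_to_fraction <- function(x, n) round(x * n) / n
round_to_fraction(c(1.15, 5.82), 4)   # nearest quarter: 1.25 5.75
```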
-Original Message-
From: Ken Termiso [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 25, 2005 1:27 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Rounding fractional numbers to nearest fraction
Hi all,
I've got
Read this book, Multivariate Statistical Analysis: A Conceptual
Introduction by Sam Kash Kachigan. I think it's *great*, and perfect
for someone without any statistical background.
-Original Message-
From: manav ram [mailto:[EMAIL PROTECTED]
Sent: Friday, May 27, 2005 10:56 AM
To:
You don't say what you want to do with the data, how many columns you
have etc. However, I would suggest proceeding in this order:
1. Avoid R; do everything in MySQL.
2. Use random samples.
3. If for some reason you need to process all 160 million rows in R, do
it in a loop. Pull no more than,
length(unique(userid)) will take
(almost) no time...
So I think the other way round will serve best: Do everything in R and
avoid using SQL on the database...
-Original Message-
From: bogdan romocea [mailto:[EMAIL PROTECTED]
Sent: Monday, June 6, 2005 4:27 PM
To: Dubravko Dolic
Cc
file.exists():
if (!file.exists(your.file)) next
Or, try():
your.data <- try(as.matrix(whatever))
if (class(your.data) == "try-error") {
  # something went wrong / the file doesn't exist - just for logging, the code will not fail
}
-Original Message-
From: Dave Evens [mailto:[EMAIL PROTECTED]
Dear useRs,
Given this code I end up with a list of class by:
a <- sample(1:5, 200, replace=TRUE)
b <- sample(c("v1","v2","v3"), 200, replace=TRUE)
c <- sample(c(11,22,33), 200, replace=TRUE)
data <- runif(200)
grouped <- by(data, list(a,b,c), function(x) {c(min=min(x), max=max(x),
On Tue, 14 Jun 2005, Prof Brian Ripley wrote:
If your file system does not like 15000 files you can always
save in a DBMS.
Or, switch to a better/more appropriate file system:
http://en.wikipedia.org/wiki/Comparison_of_file_systems
ReiserFS would allow you to store up to about 1.2 million
You could use a VB macro in Excel to automate the data export in CSV
format, and it's not complex at all, for example:
Private Sub CommandButton1_Click()
Dim strB18 As String
strB18 = Me.Cells(18, 2)
'MsgBox "Export Folder = " & strB18
On Error GoTo ErrHandler
Sheets("Inputs").SaveAs FileName:= _
Dear useRs,
I timed the same code (simulation with for loops) on the same box
(dual Xeon EM64T, 1.5 Gb RAM) under 3 OSs and was surprised by the
results:
Windows XP Pro (32-bit): Time difference of 5.97 mins
64-bit GNU/Linux (Fedora Core 4): Time difference of 6.97 mins
32-bit
It may be better to do this in SQL. The code below works for an
arbitrary number of IDs and handles missing values.
test <- data.frame(id=rep(c(1,2),10), date=sort(c(1:10,1:10)), ret=0.01*-9:10)
idret <- list()
ids <- sort(unique(test$id))
for (i in ids) {
idret[[as.character(i)]] <-
The best 3 things you can do in this situation are:
1. don't use Excel.
2. never use Excel.
3. never ever use Excel again.
Spreadsheets are _not_ databases. In particular, Excel is a time bomb
- use it long enough and you'll get burned (perhaps without even
realizing it). See
Why don't you do the simulations in SAS? If you prefer otherwise,
setup the SAS code for running in batch mode (output and log
redirection), then call it from R with (on Windows, untested)
system("start ' ' C:\\etc\\sas.exe -sysin garch.sas")
To keep the parameters from the estimate, have the SAS job
Here's one approach.
values <- c(rnorm(1000,-5,1), rnorm(1000,10,0.5))
boxplot(values)
text(1, 0, labels="better use violin plots", col="red")
#--
require(vioplot)
vioplot(values)
text(1, 0, labels="better than box plots", col="red", pos=4)
-Original Message-
From: Keith Sabol [mailto:[EMAIL
Welcome to R. See
?merge
then
?aggregate
or
require(Hmisc)
?summarize
or
?by
You can probably find many examples in the archives, if needed.
-Original Message-
From: Michael Graber [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 25, 2005 3:45 PM
To: R-Mailingliste
Those are obviously days, not seconds. A simple test would have
answered your question:
test <- strptime("20051026 15:26:19", format="%Y%m%d %H:%M:%S") -
  strptime("20051024 16:23:01", format="%Y%m%d %H:%M:%S")
class(test)
test
cat(test, "\n")
If you prefer you can use difftime for conversion:
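For example (a sketch; difftime() lets you force the units instead of relying on the automatically chosen ones):

```r
t1 <- strptime("20051024 16:23:01", format="%Y%m%d %H:%M:%S")
t2 <- strptime("20051026 15:26:19", format="%Y%m%d %H:%M:%S")
difftime(t2, t1, units="secs")               # the difference, in seconds
as.numeric(difftime(t2, t1, units="days"))   # or as a plain number of days
```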
Assuming you don't end up with too many clusters, you could take the
classification and use it as the target for a tree, random forest,
discriminant analysis or multinomial logistic regression. The random
forest may be the best option.
-Original Message-
From: alessandro carletti
Leaf Sun wrote:
The histogram is highly skewed to the right: say the range
of the vector is [0, 2], but 95% of the values are squeezed into
the interval (0.01, 0.2).
I guess the histogram is as you wrote. See
http://web.maths.unsw.edu.au/~tduong/seminars/intro2kde/
for a short explanation.
Here's a function that you can customize to fit your needs. lst is a named list.
multicomp <- function(lst)
{
clr <- c("darkgreen","red","blue","brown","magenta")
alldens <- lapply(lst, function(x) {density(x, from=min(x), to=max(x))})
allx <- sapply(alldens, function(d) {d$x})
ally <- sapply(alldens, function(d)
Don't use assign(), named lists are much better (check the stuff on
indexing lists). Here's an example:
a <- list()
a[["one"]] <- c(1,2,3)
a[["two"]] <- c(4,5,6)
a[["two"]]
do.call(rbind, a)
do.call(cbind, a)
lapply(a, sum)
With regards to your question, did you try printing varname[i] in your
loop to see
What do you need a bunch of functions for? I'm not familiar with the
details of difftime objects, however an easy way out of here is to get
the time difference in seconds, which you can then add or subtract as
you please from date-times.
x <- Sys.time(); y <- Sys.time() + 3600
diff <-
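Presumably the truncated line computed something along these lines (a sketch, assuming seconds were wanted):

```r
x <- Sys.time(); y <- Sys.time() + 3600
d <- as.numeric(difftime(y, x, units = "secs"))   # roughly 3600
x + d   # adding a number of seconds shifts a POSIXct date-time forward
```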
What if the distributions are not normal etc? You might want to try a
simulation to get an answer. Draw random samples from each
distribution (without assuming normality etc - one way to do this is
to get the quantiles, then draw a sample of quantiles, then draw a
value from each quantile), throw
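One reading of that recipe, sketched without any distributional assumptions (the 1% quantile grid and the function name are arbitrary choices):

```r
# draw new values by resampling the empirical quantiles of the data
draw_from <- function(x, n) {
  qs <- quantile(x, probs = seq(0, 1, by = 0.01))  # empirical quantiles
  sample(qs, n, replace = TRUE)                    # draw a sample of quantiles
}
set.seed(1)
obs <- rexp(500, rate = 2)    # some non-normal data, for illustration
sim <- draw_from(obs, 1000)   # simulated draws stay within the observed range
```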
Sean Davis wrote:
but you will have to create the table by hand
There's no need for manual steps. To take advantage of MySQL's
extremely fast 'load data infile' you could dump the data in CSV
format, write a script for mysql (the command line tool), for example
q <- function(table, infile)
{
That was just an example -- it's not difficult to write an R function
to generate the mysql create table syntax for a data frame with 60 or
600 columns. (BTW, I would never type 67 columns.)
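A sketch of such a generator (the type mapping is deliberately crude, and the function name is invented):

```r
# generate minimal MySQL CREATE TABLE syntax for a data frame
create_table_sql <- function(df, table) {
  coltype <- function(x) if (is.numeric(x)) "DOUBLE" else "VARCHAR(255)"
  cols <- paste(names(df), sapply(df, coltype), collapse = ",\n  ")
  paste0("CREATE TABLE ", table, " (\n  ", cols, "\n);")
}
cat(create_table_sql(data.frame(id = 1:3, name = c("a", "b", "c")), "test"))
```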
On 12/12/05, Sean Davis [EMAIL PROTECTED] wrote:
On 12/12/05 9:21 AM, bogdan romocea [EMAIL PROTECTED
Are you talking about Rgui on Windows? Use the shortcut, Alt-F-N.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ronnie
Babigumira
Sent: Wednesday, December 28, 2005 9:21 AM
To: R Help
Subject: [R] Open a new script from R command prompt
Hi, (this is a
Here's one approach,
v1 <- sample(c(-1,0,1), 30, replace=TRUE)
v2 <- sample(c(0.05,0,0.1), 30, replace=TRUE)
lst <- split(v1, v2)
counted <- lapply(lst, table)
mat <- do.call(rbind, counted)
print(counted)
print(mat)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf
Check the way you imported the data / the SQLite documentation. The
\r\n that you see (you're on Windows, right?) is used to indicate the
end of the data lines in the source file - \r is a carriage return,
and \n is a new line character.
-Original Message-
From: [EMAIL PROTECTED]
In fact it's just as easy in Internet Explorer: right-click + Open in
New Window, or Shift-Click, followed by Ctrl+D. Or, right-click + Add
to Favorites.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Charles Annis, P.E.
Sent: Monday, January 02,
Your 2-million loop is overkill, because apparently in the (vast)
majority of cases you don't need to loop at all. You could try
something like this:
1. Split the price by id, e.g.
price.list <- split(price, id)
For each id,
2a. When price is not NA, assign it to next price _without_ using a
for
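The vectorized assignment can be sketched like this (same idea as zoo::na.locf, but in base R; the function name is made up):

```r
# carry the last non-NA price forward, without a for loop
carry_forward <- function(p) {
  ok <- !is.na(p)
  idx <- cumsum(ok)      # index of the last non-NA value seen so far
  c(NA, p[ok])[idx + 1]  # idx == 0 means nothing seen yet -> NA
}
carry_forward(c(NA, 1, NA, NA, 2, NA))   # NA 1 1 1 2 2
```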
ronggui wrote:
If I were familiar with
database software, using a database (and R) would be the best choice, but
converting the file into database format is not an easy job for me.
Good working knowledge of a DBMS is almost invaluable when it comes to
working with very large data sets. In addition, learning
Peter Muhlberger wrote:
But, there is a second point here, which is how difficult it
was for me [...] to find what seem to me like standard key
features I've taken for granted in other packages.
There is another side to this. Don't consider only how difficult it
was to find what you were
Dear useRs,
I got stuck trying to generate a palette of topographic colors that
would satisfy these two requirements:
- the palette must be 'anchored' at 0 (just like on a map), with
light blue/lawn green corresponding to data values close to 0 (dark
blue to light blue for negative values,
See
http://en.wikipedia.org/wiki/Levenshtein_distance
http://thread.gmane.org/gmane.comp.lang.r.general/31499
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Werner
Wernersen
Sent: Tuesday, January 10, 2006 2:00 PM
To: Gabor Grothendieck
Cc:
By the way, you might find this sed one-liner useful:
sed -n '11981q;11970,11980p' filename.txt
It will print the offending line and its neighbors. If you're on
Windows you need to install Windows Services For Unix or Cygwin.
-Original Message-
From: [EMAIL PROTECTED]
With regards to your first question, here's a function I used a couple
of times to get plots similar to those you're looking for. (Search the
list for how to find the source code. Also, there's a reference other
than MASS on the ?rpart page.)
#bogdan romocea 2006-06
#adapted source code from
Forget about assign() & Co. Search R-help for 'assign', read the
documentation on lists, and realize that it's quite a lot better to
use lists for this kind of stuff.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Scionforbai
Sent: Wednesday, October
I haven't seen the first book (DAAG) mentioned so far, I have it and
think it's very good. Anyway, I recommend you buy all R books (and
perhaps take some extra time off to study them): your employer can
well afford that, given the cash you're saving by not using
proprietary software.
This was asked before. Collapse the data frame into a vector, e.g.
v <- apply(DF, 1, function(x) {paste(x, collapse="_")})
then work with the values of that vector (table, unique etc). If your
data frame is really large run this in a DBMS.
-Original Message-
From: [EMAIL PROTECTED]
What is it that you don't know how to do? Loop over the matrices from
the 2 lists and merge them two by two, for example
AB <- list(); id <- 1
for (i in 1:length(A)) for (j in 1:length(B)) {
  AB[[id]] <- merge(A[[i]], B[[j]], ...)
  id <- id + 1
}
To better keep track of who's who, you may want to
Does anyone know of comparisons of the Pentium 9x0, Pentium(R)
Extreme/Core 2 Duo, AMD(R) Athlon(R) 64, AMD(R) Athlon(R) 64
FX/Dual Core AM2 and similar chips when used for this kind of work?
I think your best option, by far, is to answer the question on your
own. Put R and your programs on
Read up on the discrete Fourier transform:
http://en.wikipedia.org/wiki/Discrete_Fourier_transform
http://en.wikipedia.org/wiki/Frequency_spectrum#Spectrum_analysis
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Randy Zelick
Sent: Tuesday, December
If you're on Windows switch to
http://www.copernic.com/en/products/desktop-search/index.html ,
last time I looked it was quite a lot better than Google Desktop Search.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Farrel
Buchinsky
Sent: Wednesday,
Never mind the CPU usage; the likely problem is that your queries are
inefficient in one or more ways (i.e., you don't use indexes when you
really should - it's impossible to guess without knowing what the data
and the queries look like, which somehow you've decided are not
important enough to
Dear useRs,
I have a few hundred plots that I'd like to export to one document.
pdf() isn't an option, because the file created is prohibitively huge
(due to scatter plots with many points). So I have to use png()
instead, but then I end up with a lot of files (would prefer just
one).
1. Is
Not sure about R, but for a Perl example check
http://yosucker.sourceforge.net/ .
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tudor Bodea
Sent: Monday, January 08, 2007 11:53 AM
To: r-help@stat.math.ethz.ch
Cc: Tudor Bodea
Subject: [R] Access,
Hello, I don't understand the behavior of apply() on the data frame below.
test <-
structure(list(Date = structure(c(13361, 13361, 13361, 13361,
13361, 13361, 13361, 13361, 13362, 13362, 13362, 13362, 13362,
13362, 13362, 13362, 13363, 13363, 13363, 13363, 13363, 13363,
13363, 13363, 13364, 13364,
One option for processing very large files with R is split:
## split a large file into pieces
#--parameters: the folder, file and number of parts
FLD=/home/user/data
F=very_large_file.dat
parts=50
#---split
cd $FLD
fn=`echo $F | awk -F\. '{print $1}'` #file name without extension
days <- seq(as.Date("1970/1/1"), as.Date("2003/12/31"), "days")
temp <- rnorm(length(days), mean=10, sd=8)
tapply(temp, format(days, "%Y-%m"), mean)
tapply(temp, format(days, "%b"), mean)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Majid Iravani
Sent:
The problem with your code is that it doesn't check for errors. See
?try, ?tryCatch. For example:
my.download <- function(forloop) {
  notok <- vector()
  for (i in forloop) {
    cdaily <- try(blpGetData(...))
    if (class(cdaily) == "try-error") {
      notok <- c(notok, i)
    } else {
See ?cut for continuous variables, and ?factor, ?levels for the others.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of lamack lamack
Sent: Tuesday, March 06, 2007 12:49 PM
To: R-help@stat.math.ethz.ch
Subject: [R] R and SAS proc format
Dear all,
This is a bad idea as it can greatly slow things down (the details
were discussed several times on this list). What you want to do is
define from the start the length of your vector/list, then grow it (by
a large margin) only if it becomes full.
lst <- vector(mode="list", length=10) #assuming
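A sketch of the grow-only-when-full pattern described above (the sizes are arbitrary):

```r
# pre-allocate, then double the length only when the list fills up
n <- 0
lst <- vector(mode = "list", length = 10)
for (x in 1:100) {
  n <- n + 1
  if (n > length(lst)) length(lst) <- 2 * length(lst)  # grow by a large margin
  lst[[n]] <- x^2
}
lst <- lst[seq_len(n)]   # trim the unused tail
```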
(1)Institutions (not only academia) using R
http://www.r-project.org/useR-2006/participants.html
(2)Hardware requirements, possibly benchmarks
Since you mention huge data sets, GNU/Linux running on 64-bit machines
with as much RAM as your budget allows.
(3)R clusters, R multiple CPU
I find it easier to install all the packages again:
#---run in the previous version
packages <- installed.packages()[, "Package"]
save(packages, file="Rpackages")
#---run in the new version
load("Rpackages")
for (p in setdiff(packages, installed.packages()[, "Package"]))
  install.packages(p)
-Original
With regards to your concern - export the R object to a MySQL table
(the RMySQL documentation tells you how), then run an inner join. Or
if the table to query isn't that big, pull it in R and subset it with
%in%. You could use system.time() to see which runs faster.
-Original Message-
Don't rush to buy new hardware yet (other than perhaps more RAM for
your existing desktop). First of all you should make sure that your R
code can't be made any faster. (I've seen cases where careful
re-writes increased speed by a factor of 10 or more.) There are some
rules (such as pre-allocate
Here's one way,
lapply(split(DF, your.vector), function(x) {apply(x, 2, sum)})
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Daniel O'Shea
Sent: Tuesday, August 21, 2007 3:53 PM
To: r-help@stat.math.ethz.ch
Subject: [R] summing columns of data
On a related note, there's one other amazingly stupid thing that Excel
(2002 SP3) does - it exports to CSV the numbers as you see them
displayed, and not as they were entered/imported in the first place.
For example, 1.2345678 will be exported to CSV/tab delimited as 1.23
if that column is