On a related note, there's one other amazingly stupid thing that Excel
(2002 SP3) does - it exports numbers to CSV as you see them displayed,
not as they were entered/imported in the first place.
For example, 1.2345678 will be exported to CSV/tab-delimited as 1.23
if that column is formatted to display only two decimal places.
Here's one way,
lapply(split(DF, your.vector), function(x) {apply(x, 2, sum)})
Don't rush to buy new hardware yet (other than perhaps more RAM for
your existing desktop). First of all you should make sure that your R
code can't be made any faster. (I've seen cases where careful
re-writes increased speed by a factor of 10 or more.) There are some
rules (such as pre-allocating enough space for results instead of
growing them inside loops, and vectorizing instead of looping) that
can make a big difference.
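For instance (an illustrative sketch, not from the original post):
x <- rnorm(10^6)
system.time({s <- 0; for (v in x) s <- s + v^2})  # element-by-element loop
system.time(sum(x^2))  # vectorized equivalent, typically far faster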
With regards to your concern - export the R object to a MySQL table
(the RMySQL documentation tells you how), then run an inner join. Or
if the table to query isn't that big, pull it into R and subset it with
%in%. You could use system.time() to see which runs faster.
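A minimal sketch of the %in% route (big.dfr and ids.to.query are
hypothetical names):
hits <- big.dfr[big.dfr$id %in% ids.to.query, ]  # keep only matching rows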
I find it easier to install all the packages again:
#---run in previous version
packages <- installed.packages()[,"Package"]
save(packages, file="Rpackages")
#---run in new version
load("Rpackages")
for (p in setdiff(packages, installed.packages()[,"Package"]))
install.packages(p)
> (1)Institutions (not only academia) using R
http://www.r-project.org/useR-2006/participants.html
> (2)Hardware requirements, possibly benchmarks
Since you mention huge data sets, GNU/Linux running on 64-bit machines
with as much RAM as your budget allows.
> (3)R & clusters, R & multiple CPU machines
This is a bad idea as it can greatly slow things down (the details
were discussed several times on this list). What you want to do is
define from the start the length of your vector/list, then grow it (by
a large margin) only if it becomes full.
lst <- vector(mode="list", length=10) #assuming you expect about 10 elements
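A sketch of the grow-only-when-full idea (illustrative; 'input' is a
hypothetical source of items):
n <- 0
for (item in input) {
  n <- n + 1
  if (n > length(lst)) length(lst) <- 2 * length(lst)  # double when full
  lst[[n]] <- item
}
lst <- lst[1:n]  # drop the unused tail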
See ?cut for continuous variables, and ?factor, ?levels for the others.
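For example (a minimal sketch):
x <- rnorm(100)
grp <- cut(x, breaks=c(-Inf,-1,1,Inf), labels=c("low","mid","high"))
table(grp)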
The problem with your code is that it doesn't check for errors. See
?try, ?tryCatch. For example:
my.download <- function(forloop) {
  notok <- vector()
  for (i in forloop) {
    cdaily <- try(blpGetData(...))
    if (inherits(cdaily, "try-error")) {  # more robust than class(x) == "try-error"
      notok <- c(notok, i)
    } else {
      # process cdaily as usual
    }
  }
  notok  # return the iterations that failed
}
days <- seq(as.Date("1970/1/1"), as.Date("2003/12/31"), "days")
temp <- rnorm(length(days), mean=10, sd=8)
tapply(temp, format(days,"%Y-%m"), mean)
tapply(temp, format(days,"%b"), mean)
One option for processing very large files with R is to break them up
first with the Unix split utility:
## split a large file into pieces
#--parameters: the folder, file and number of parts
FLD=/home/user/data
F=very_large_file.dat
parts=50
#---split
cd $FLD
fn=`echo $F | awk -F\. '{print $1}'` #file name without extension
# (completion sketch) lines per piece, rounded up, then split
lines=`wc -l < $F`
lpp=$(( (lines + parts - 1) / parts ))
split -l $lpp $F ${fn}_part_
Hello, I don't understand the behavior of apply() on the data frame below.
test <-
structure(list(Date = structure(c(13361, 13361, 13361, 13361,
13361, 13361, 13361, 13361, 13362, 13362, 13362, 13362, 13362,
13362, 13362, 13362, 13363, 13363, 13363, 13363, 13363, 13363,
13363, 13363, 13364, 13364,
Not sure about R, but for a Perl example check
http://yosucker.sourceforge.net/ .
Dear useRs,
I have a few hundred plots that I'd like to export to one document.
pdf() isn't an option, because the file created is prohibitively huge
(due to scatter plots with many points). So I have to use png()
instead, but then I end up with a lot of files (would prefer just
one).
1. Is there a way to combine the many PNGs into a single document?
Never mind the CPU usage, the likely problem is that your queries are
inefficient in one or more ways (e.g., you don't use indexes when you
really should - it's impossible to guess without knowing what the data
and the queries look like, which somehow you've decided are not
important enough to describe).
If you're on Windows switch to
http://www.copernic.com/en/products/desktop-search/index.html ,
last time I looked it was quite a lot better than Google Desktop Search.
Read up on the discrete Fourier transform:
http://en.wikipedia.org/wiki/Discrete_Fourier_transform
http://en.wikipedia.org/wiki/Frequency_spectrum#Spectrum_analysis
> Does any one know of comparisons of the Pentium 9x0, Pentium(r)
> Extreme/Core 2 Duo, AMD(r) Athlon(r) 64 , AMD(r) Athlon(r) 64
> FX/Dual Core AM2 and similar chips when used for this kind of work.
I think your best option, by far, is to answer the question on your
own. Put R and your programs on the machines you're considering and
benchmark them yourself.
What is it that you don't know how to do? Loop over the matrices from
the 2 lists and merge them two by two, for example
AB <- list() ; id <- 1
for (i in 1:length(A)) for (j in 1:length(B)) {
AB[[id]] <- merge(A[[i]],B[[j]],...)
id <- id + 1
}
To better keep track of who's who, you may want to store i and j
alongside each merged result.
This was asked before. Collapse the data frame into a vector, e.g.
v <- apply(DF,1,function(x) {paste(x,collapse="_")})
then work with the values of that vector (table, unique etc). If your
data frame is really large run this in a DBMS.
I haven't seen the first book (DAAG) mentioned so far; I have it and
think it's very good. Anyway, I recommend you buy all R books (and
perhaps take some extra time off to study them): your employer can
well afford that, given the cash you're saving by not using
proprietary software.
Forget about assign() & Co. Search R-help for 'assign', read the
documentation on lists, and realize that it's quite a lot better to
use lists for this kind of stuff.
With regards to your first question, here's a function I used a couple
of times to get plots similar to those you're looking for. (Search the
list for how to find the source code. Also, there's a reference other
than MASS on the ?rpart page.)
#bogdan romocea 2006-06
#adapted s
A function I've been using for a while returned a surprising [to me,
given the data] error recently:
Error in plot.window(xlim, ylim, log, asp, ...) :
Logarithmic axis must have positive limits
After some digging I realized what was going on:
x <- c(10460.97, 10808.67, 29499.98, 1, 35818
One obvious alternative is an SQL join, which you could do directly in
a DBMS, or from R via RMySQL / RSQLite /... Keep in mind that creating
indexes on user/userid before the join may save a lot of time.
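In MySQL that might look like this (illustrative sketch; 'users',
'logins' and the column names are hypothetical):
CREATE INDEX idx_users_userid ON users (userid);
CREATE INDEX idx_logins_userid ON logins (userid);
SELECT u.userid, l.login_time
FROM users u INNER JOIN logins l ON u.userid = l.userid;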
You forgot to mention your OS. This was asked before and if I recall
correctly the answer for Windows was no. An acceptable solution (imho)
is to edit the Rprofile.site file and add something like
pngplotwidth <- 990 ; pngplotheight <- 700
pdfplotwidth <- 14 ; pdfplotheight <- 10
Then, use these variables whenever you open a graphics device.
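For example (a hypothetical usage sketch):
png("myplot.png", width=pngplotwidth, height=pngplotheight)
plot(1:10)
dev.off()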
A simple function will do what you want, customize this as needed:
lprint <- function(lst,prefix)
{
for (i in 1:length(lst)) {
cat(paste(prefix,"$",names(lst)[i],sep=""),"\n")
print(lst[[i]])
cat("\n")
}
}
P <- list(A="a",B="b")
lprint(P,"Prefix")
Dear useRs,
I'd like to produce some scatter plots where N units on the X axis are
equal to N units on the Y axis (as measured with a ruler, on screen or
paper). This approach
x <- sample(10:200,40) ; y <- sample(20:100,40)
windows(width=max(x),height=max(y))
plot(x,y)
is better than plot(x,y) in a default-sized window, but is there a
cleaner way to get equal axis scales?
By far, the cheapest and easiest solution (and the very first to try)
is to add more memory. The cost depends on what kind you need, but
here's for example 2 GB you can buy for only $150:
http://www.newegg.com/Product/Product.asp?Item=N82E16820144157
Project constraints?! If they don't want to spend even that much, the
constraints are the real problem.
It's possible and straightforward (just don't use R). IMHO the GNU
Core Utilities
http://www.gnu.org/software/coreutils/
plus a few other tools such as sed, awk, grep etc are much more
appropriate than R for processing massive text files. (Get a good book
about UNIX shell scripting. On Windows you can get these tools via
Cygwin or Windows Services for UNIX.)
One option is
library(R2HTML)
?HTML.cormat
The thing you're after is traffic highlighting (via CSS or HTML tags).
If HTML.cormat() doesn't do exactly what you want, modify the source
code. (By the way, I haven't used R2HTML so far so maybe there's a
more appropriate function.)
Not sure about your data set, but if you have some kind of
(weighted/stratified) sample of hospitals you need to pay special
attention. Survey data violates the assumptions of the classical
linear models (infinite population, identically distributed errors
etc) and needs to be analyzed differently.
> y <- c(y,colnames(a))
> z <- c(z,a[i,])
> }
> symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
> text(as.numeric(x),as.numeric(y),labels=z)
>
> > symbols(as.numeric(x),as.numeric(y),z,inches=0.2,bg="khaki")
> Error in plot.window(xlim, ylim
Here's an example. By the way, I find that it's more convenient (where
applicable) to keep the data in 3 vectors/factors rather than one
matrix/data frame.
a <- matrix(sample(1:5,100,replace=TRUE),nrow=10,dimnames=list(1:10,5*1:10))
x <- y <- z <- vector()
for (i in 1:nrow(a)) {
  x <- c(x, rep(rownames(a)[i], ncol(a)))  # (completed) one row label per column
  y <- c(y, colnames(a))
  z <- c(z, a[i,])
}
symbols(as.numeric(x), as.numeric(y), circles=z, inches=0.2, bg="khaki")
I wouldn't use a DBMS at all -- it is not necessary and I don't see
what you would get in return. Instead I would split very large log
files into a number of pieces so that each piece fits in memory (see
below for an example), then process them in a loop. See the list and
the documentation if you need help with that.
Compare
system.time({
v <- vector()
for (i in 1:10^5) v <- c(v,1)
})
with
system.time({
v <- vector(length=10^5)
for (i in 1:10^5) v[i] <- 1
})
If you don't know exactly how long v will be, use a value that's large
enough, then throw away what's extra.
Macro stuff à la SAS is something that should be avoided whenever
possible - it's messy, limited, and limiting. (I've done it
occasionally and it works, but I think it's best not to go there.) Read
the documentation on lists (in particular named lists), and keep
everything in one or more lists, retrieving each piece by name instead
of constructing variable names.
Repeated merge()-ing does not always increase the space requirements
linearly. Keep in mind that a join between two tables where the same
value appears M and N times will produce M*N rows for that particular
value. My guess is that the number of rows in atot explodes because
you have duplicated values in the columns you merge on.
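A tiny illustration of the M*N effect (made-up data):
x <- data.frame(id=c(1,1), a=1:2)
y <- data.frame(id=c(1,1,1), b=1:3)
nrow(merge(x, y))  # 6: the id appearing 2 and 3 times yields 2*3 rows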
Your approach seems very inefficient - it looks like you're executing
thousands of update statements. Try something like this instead:
#---build a table 'updates' (id and value)
...
#---do all updates via a single left join
UPDATE bigtable a LEFT JOIN updates b
ON a.id = b.id
SET a.col1 = b.value;
> I'll see if I can reproduce the steps under Knoppix[1]. Then you can
> run Knoppix with a Persistent Disk Image (PDI)[2] that contains R,
> the DBI, and RMySQL on just about any machine that runs Knoppix.
Don't bother, it's been done already. See
http://dirk.eddelbuettel.com/quantian.html
This goes the other way - all SQL manipulations are a subset of what
can be done with R. Read up on indexing and see ?merge, ?aggregate,
?by, ?tapply, among others. (For the R equivalent to your query, check
?grep and ?order, and search the list if needed.) Also, this example
might be a good start:
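An illustrative sketch of grep() and order() standing in for
WHERE ... LIKE and ORDER BY (made-up data, not the original example):
dfr <- data.frame(name=c("AB","CD","BE"), value=c(3,1,2))
hits <- dfr[grep("B", dfr$name), ]  # WHERE name LIKE '%B%'
hits[order(hits$value), ]           # ORDER BY value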
Here's an example.
dfr <- data.frame(A1=1:10,A2=21:30,B1=31:40,B2=41:50)
vars <- colnames(dfr)
for (v in vars[grep("B",vars)]) print(mean(dfr[,v]))
plot(1:10,axes=FALSE)
axis(1,at=1:10,labels=10:1)
axis(2,at=1:10,labels=5*10:1)
box()
Another good option is SQL, the fastest and most scalable solution. If
you decide to give it a try pay close attention to indexes.
I agree it would be worthwhile to make some cosmetic changes to
r-project.org (nothing fancy though - no javascript, Flash etc). The
general public may not be fully aware of how R compares to other
statistical software, and I doubt that a web site which looks like it
was put together 10 years ago helps.
There is an aspect, worthy of careful consideration, you don't seem to
be aware of. I'll ask the question for you: How does the
explanatory/predictive potential of a dataset vary as the dataset gets
larger and larger?
Here's an example.
lst <- list()
for (i in 1:5) {
lst[[i]] <- data.frame(v=sample(1:20,10),sample(1:5,10,replace=TRUE))
colnames(lst[[i]])[2] <- paste("x",i,sep="")
}
dfr <- lst[[1]]
for (i in 2:length(lst)) dfr <- merge(dfr,lst[[i]],all=TRUE)
dfr <- dfr[order(dfr[,1]),]
print(dfr)
Forget about R for now and port the application to MySQL/PostgreSQL
etc, it is possible and worthwhile. In case you happen to use (and
really need) some SAS DATA STEP looping features you might be forced
to look into SQL cursors, otherwise the port should be (very)
straightforward.
Installing R on SuSE 10.0 may be less than trivial for a beginner (I
ended up compiling GCC plus 3-4 other things). In case you lose your
patience I'd suggest trying Mepis Linux: it's very easy to install and
the package management GUI (Synaptic) is great. Installing R together
with a bunch of R packages went smoothly there.
There are several kinds of standardization, and 'normalization' is
only one of them. For some details you could check
http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm
(see Details for standardization methods).
Standardization is required prior to clustering to control for the
impact of variables measured on very different scales; otherwise the
variables with the largest ranges dominate the distance calculations.
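For example, scale() (a minimal sketch):
m <- cbind(height=rnorm(10, 170, 10), income=rnorm(10, 50000, 10000))
scale(m)  # each column centered and divided by its standard deviation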
Apparently you do not understand the point, and seem to (want to) see
patterns all over the place. A good start for the treatment of this
interesting disease is 'Fooled by Randomness' by Nassim Nicholas
Taleb. The main point of the book is that many things may be a lot
more random than one might care to admit.
Adapt the function below to suit your needs. If you really want to
plot 5 minutes at a time, round the time series to the last MM:00
times (where MM is in 5*0:11) and have idx below loop over them.
splitplot <- function(x,points)
{
boundaries <- c(1,points*1:floor(length(x)/points),length(x))
# (completion sketch) plot each segment of the series in turn
for (i in 1:(length(boundaries)-1))
  plot(x[boundaries[i]:boundaries[i+1]], type="l")
}
?assign, but _don't_ use it; lists are better.
dfr <- list()
for(j in 1:9) {
dfr[[as.character(j)]] <- ...
}
Don't try to imitate the limited macro approach of other software
(e.g. SAS). You can do all that in R, but it's much simpler and much
safer to rely on list indexing and functions that return their results.
\r is a carriage return character which some editors may use as a line
terminator when writing files. My guess is that RSQLite writes your
data frame to a temp file using \r as a line terminator and then runs
a script to have SQLite import the data (together with \r - this would
be the problem), b
For a general solution without warnings try
interleave <- function(v1,v2)
{
ord1 <- 2*(1:length(v1))-1
ord2 <- 2*(1:length(v2))
c(v1,v2)[order(c(ord1,ord2))]
}
interleave(rep(1,5),rep(3,8))
Here's one way,
x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
y <- data.frame(V=c(2,9,10))
xy <- merge(x,y,all=FALSE)
Pay close attention to what happens if you have duplicate values in y, say
y <- data.frame(V=c(2,9,10,10))
t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A"
t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B"
t3 <- merge(t1,t2,all=TRUE)
Here's another approach which can be easily implemented in SQL.
1. Start with the dates as character vectors,
dt <- as.character(Sys.time())
2. Extract the minutes and round them to 0,15,30,45:
minutes <- floor(as.numeric(substr(dt,15,16))/15)*15
final.mins <- as.character(minutes)
final.mins[nchar(final.mins) == 1] <- paste("0", final.mins[nchar(final.mins) == 1], sep="")
# (completion sketch) 3. Reassemble the rounded timestamp:
paste(substr(dt,1,14), final.mins, ":00", sep="")
By the way, you might find this sed one-liner useful:
sed -n '11981q;11970,11980p' filename.txt
It will print the offending line and its neighbors. If you're on
Windows you need to install Windows Services For Unix or Cygwin.
See
http://en.wikipedia.org/wiki/Levenshtein_distance
http://thread.gmane.org/gmane.comp.lang.r.general/31499
Dear useRs,
I got stuck trying to generate a palette of topographic colors that
would satisfy these two requirements:
- the palette must be 'anchored' at 0 (just like on a map), with
light blue/lawn green corresponding to data values close to 0 (dark
blue to light blue for negative values, green through brown for
positive ones)
See this thread,
https://stat.ethz.ch/pipermail/r-sig-finance/2005q4/000568.html
You could also have R query a real-time database (via RMySQL, ROracle
etc) every few seconds.
Peter Muhlberger wrote:
> But, there is a second point here, which is how difficult it
> was for me [...] to find what seem to me like standard & key
> features I've taken for granted in other packages.
There is another side to this. Don't consider only how difficult it
was to find what you were looking for.
ronggui wrote:
> If i am familiar with
> database software, using database (and R) is the best choice,but
> convert the file into database format is not an easy job for me.
Good working knowledge of a DBMS is invaluable when it comes to
working with very large data sets. In addition, learning SQL pays off
well beyond any single project.
Your 2-million loop is overkill, because apparently in the (vast)
majority of cases you don't need to loop at all. You could try
something like this:
1. Split the price by id, e.g.
price.list <- split(price,id)
For each id,
2a. When price is not NA, assign it to next price _without_ using a
for loop.
In fact it's just as easy in Internet Explorer: right-click + Open in
New Window, or Shift-Click, followed by Ctrl+D. Or, right-click + Add
to Favorites.
Check the way you imported the data / the SQLite documentation. The
\r\n that you see (you're on Windows, right?) is used to indicate the
end of the data lines in the source file - \r is a carriage return,
and \n is a new line character.
Here's one approach,
v1 <- sample(c(-1,0,1),30,replace=TRUE)
v2 <- sample(c(0.05,0,0.1),30,replace=TRUE)
lst <- split(v1,v2)
counted <- lapply(lst,table)
mat <- do.call("rbind",counted)
print(counted)
print(mat)
Are you talking about Rgui on Windows? Use the shortcut, Alt-F-N.
That was just an example -- it's not difficult to write an R function
to generate the mysql create table syntax for a data frame with 60 or
600 columns. (BTW, I would never type 67 columns.)
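A minimal sketch of such a generator (illustrative; column types are
simplified to double/text):
make.create.table <- function(dfr, table)
{
  types <- ifelse(sapply(dfr, is.numeric), "double", "text")
  cols <- paste(colnames(dfr), types, collapse=", ")
  paste("create table ", table, " (", cols, ");", sep="")
}
make.create.table(data.frame(a=1:3, b=letters[1:3]), "mytable")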
> Sean Davis wrote:
> but you will have to create the table by hand
There's no need for manual steps. To take advantage of MySQL's
extremely fast 'load data infile' you could dump the data in CSV
format, write a script for mysql (the command line tool), for example
q <- function(table,infile)
{
  # (completion sketch) build the 'load data infile' statement for the mysql CLI
  paste("load data local infile '", infile, "' into table ", table,
        " fields terminated by ',';", sep="")
}
What if the distributions are not normal etc? You might want to try a
simulation to get an answer. Draw random samples from each
distribution (without assuming normality etc - one way to do this is
to get the quantiles, then draw a sample of quantiles, then draw a
value from each quantile), throw the simulated values together and
compute the statistic of interest; repeat many times to get its
distribution.
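A sketch of that idea (illustrative; here done as a simple bootstrap of
the difference in medians between two made-up samples):
d1 <- rexp(200) ; d2 <- rlnorm(150)
diffs <- replicate(10^4,
  median(sample(d1, replace=TRUE)) - median(sample(d2, replace=TRUE)))
quantile(diffs, c(0.025, 0.975))  # interval for the difference in medians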
What do you need a bunch of functions for? I'm not familiar with the
details of difftime objects, however an easy way out of here is to get
the time difference in seconds, which you can then add or subtract as
you please from date-times.
x<-Sys.time(); y<-Sys.time()+3600
diff <- as.numeric(difftime(y, x, units="secs"))  # 3600
Don't use assign(), named lists are much better (check the stuff on
indexing lists). Here's an example:
a <- list()
a[["one"]] <- c(1,2,3)
a[["two"]] <- c(4,5,6)
a[["two"]]
do.call("rbind",a)
do.call("cbind",a)
lapply(a,sum)
With regards to your question, did you try printing varname[i] in your
lo
Here's a function that you can customize to fit your needs. lst is a named list.
multicomp <- function(lst)
{
clr <- c("darkgreen","red","blue","brown","magenta")
alldens <- lapply(lst,function(x) {density(x,from=min(x),to=max(x))})
allx <- sapply(alldens,function(d) {d$x})
ally <- sapply(alldens,function(d) {d$y})
# (completion sketch) overlay the density curves and label them
matplot(allx,ally,type="l",lty=1,col=clr)
legend("topright",legend=names(lst),lty=1,col=clr)
}
> > Leaf Sun wrote:
> > The histogram is highly screwed to the right, say, the range
> > of the vector is [0, 2], but 95% of the value is squeezed in
> > the interval (0.01, 0.2).
I guess the histogram is as you wrote. See
http://web.maths.unsw.edu.au/~tduong/seminars/intro2kde/
for a short explanation of kernel density estimation, which may serve
you better than a histogram here.
Assuming you don't end up with too many clusters, you could take the
classification and use it as the target for a tree, random forest,
discriminant analysis or multinomial logistic regression. The random
forest may be the best option.
Those are obviously days, not seconds. A simple test would have
answered your question:
test <- strptime("20051026 15:26:19",format="%Y%m%d %H:%M:%S") -
strptime("20051024 16:23:01",format="%Y%m%d %H:%M:%S")
class(test)
test
cat(test,"\n")
If you prefer you can use difftime for conversion:
difftime(strptime("20051026 15:26:19",format="%Y%m%d %H:%M:%S"),
         strptime("20051024 16:23:01",format="%Y%m%d %H:%M:%S"), units="secs")
Welcome to R. See
?merge
then
?aggregate
or
require(Hmisc)
?summarize
or
?by
You can probably find many examples in the archives, if needed.
Here's one approach.
values <- c(rnorm(1000,-5,1),rnorm(1000,10,0.5))
boxplot(values)
text(1,0,labels="better use violin plots",col="red")
#--
require(vioplot)
vioplot(values)
text(1,0,labels="better than box plots",col="red",pos=4)
Simple addition and subtraction works as well:
as.Date("1995/12/01",format="%Y/%m/%d") + 30
If you have datetime values you can use
strptime("1995-12-01 08:00:00",format="%Y-%m-%d %H:%M:%S") + 30*24*3600
where 30*24*3600 = 30 days expressed in seconds.
Never mind, I found the fix. Declaring the length for out eliminates
the performance decrease,
out <- vector(mode="numeric",length=length(test))
Dear useRs,
I'm wondering why the for() loop below runs slower as it progresses.
On a Win XP box, the iterations at the beginning run much faster than
those at the end:
1%, iteration 2000, 10:10:16
2%, iteration 4000, 10:10:17
3%, iteration 6000, 10:10:17
98%, iteration 196000, 10:24:04
99%, itera
Dear useRs,
Is there a way to 'properly' format %d when plotting more than one
page on png()? 'Properly' means to me with leading 0s, so that the
PNGs become easy to navigate in a file/image browser. Lacking a better
solution I ended up using the code below, but would much prefer
something like a format specifier with leading zeros (e.g. %03d).
A related comment - don't rely (too much) on boxplots. They show only
a few things, which may be limiting in many cases and completely
misleading in others. Here are a couple of suggestions for plots which
you may find more useful than the standard box plots:
- figure 3.27 from
http://www.
Dear useRs,
I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0
and MySQL 4.1.11-2 installed through yum). After an initial
configuration error ("could not find the MySQL installation include
and/or library directories") I managed to install RMySQL with
# export PKG_LIBS="-L -
I don't understand why there's so much discussion on PowerPoint. IMHO,
that can only obscure the real thing:
- The Perils of Miscommunication
- The Perils of Not Taking Responsibility (if PowerPoint is to blame
for X, then who's to blame for choosing and using PowerPoint in the
first place?)
Most powerful in what way? Quite a lot depends on the jobs you're going to run.
- To run CPU-bound jobs, more CPUs is better. (Even though R doesn't
do threading, you can manually split some CPU-bound jobs in several
parts and run them simultaneously.) Apart from multiple CPUs and
hyperthreading, plenty of RAM matters for memory-bound jobs.
One solution is
test <- c("1.11","10.11","11.11","113.31","114.2","114.3")
id <- unlist(lapply(strsplit(test,"[.]"),function(x) {x[2]}))
This appears to be an SQL issue. Look for a way to speed up your
queries in Postgresql. I presume you haven't created an index on
'index', which means that every time you run your SELECT, Postgresql
is forced to do a full table scan (not good). If the index doesn't
solve the problem, look for something else to tune (the query itself,
or the Postgresql configuration).
The first one is an index, not a data set. Anyway, just use SAS to
export the data sets in text format (CSV, tab-delimited etc). You can
then easily read those in R. (By the way, the help for read.xport says
that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT
file? Hint: no.)
You need the day to convert to a date format. Assuming day=15:
x.date <- as.Date(paste(as.character(x),"-15",sep=""),format="%Y-%m-%d")
There's something else you could try - since you can't hide the code,
obfuscate it. Hide the real thing in a large pile of useless,
complicated, awfully formatted code that would stop anyone except the
most desperate (including yourself, after a couple of weeks/months)
from trying to understand it.
If happenat is not a datetime value, convert it with strptime(). Then,
one solution is to transform it in the following way:
num.time <- as.numeric(format(happenat,"%Y%m%d%H%M%S"))
This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset
your data frame with
dfr[which(num.time >= 20050722000514),]
and use it for the whole R session. (I never close the
connection after a query.)
hth,
b.
I think you're barking up the wrong tree. Optimize the MySQL code
separately from optimizing the R code. A very nice reference about the
former is http://highperformancemysql.com/. Also, if possible, do
everything in MySQL.
hth,
b.
So your conclusion is that the only choice is to make mistakes and get
in trouble. (That's what Excel excels at.)
Two options I haven't seen mentioned are:
1. Create your deliverables in HTML format, and change the extension
from .htm to .xls; Excel will import them automatically. The way the
file looks can then be controlled with HTML tags and CSS.
How about avoiding SAS XPORT altogether and exporting everything in
the simple, clean, non-proprietary, extremely reliable,
platform-independent ... etc text format (CSV, tab delimited etc)?
Why don't you do the simulations in SAS? If you prefer otherwise,
setup the SAS code for running in batch mode (output and log
redirection), then call it from R with (on Windows, untested)
system('start "" C:\\etc\\sas.exe -sysin garch.sas')  # note the doubled backslashes
To keep the parameters from the estimate, have the SAS job write them
to a text file, then read that file back into R.
The best 3 things you can do in this situation are:
1. don't use Excel.
2. never use Excel.
3. never ever use Excel again.
Spreadsheets are _not_ databases. In particular, Excel is a time bomb
- use it long enough and you'll get burned (perhaps without even
realizing it). See
http://www.burns-stat