How about avoiding SAS XPORT altogether and exporting everything in
the simple, clean, non-proprietary, extremely reliable,
platform-independent ... etc text format (CSV, tab delimited etc)?
-Original Message-
From: Nelson, Gary (FWE) [mailto:[EMAIL PROTECTED]
Sent: Thursday, July
So your conclusion is that the only choice is to make mistakes and get
in trouble. (That's what Excel excels at.)
Two options I haven't seen mentioned are:
1. Create your deliverables in HTML format, and change the extension
from .htm to .xls; Excel will import them automatically. The way the
I think you're barking up the wrong tree. Optimize the MySQL code
separately from optimizing the R code. A very nice reference about the
former is http://highperformancemysql.com/. Also, if possible, do
everything in MySQL.
hth,
b.
-Original Message-
From: Thieme, Lutz [mailto:[EMAIL
never close the
connection after a query.)
hth,
b.
-Original Message-
From: Thieme, Lutz [mailto:[EMAIL PROTECTED]
Sent: Friday, July 22, 2005 2:04 AM
To: bogdan romocea
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Rprof fails in combination with RMySQL
Hello Bogdan
If happenat is not a datetime value, convert it with strptime(). Then,
one solution is to transform it in the following way:
num.time <- as.numeric(format(happenat, "%Y%m%d%H%M%S"))
This way, 07/22/05 00:05:14 becomes 20050722000514, and you can subset
your data frame with
dfr[which(num.time >=
There's something else you could try - since you can't hide the code,
obfuscate it. Hide the real thing in a large pile of useless,
complicated, awfully formatted code that would stop anyone except the
most desperate (including yourself, after a couple of weeks/months)
from trying to understand
You need the day to convert to a date format. Assuming day=15:
x.date <- as.Date(paste(as.character(x), "-15", sep=""), format="%Y-%m-%d")
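The same pin-a-day trick is easy to check outside R; here is a small Python sketch (the helper name and the "YYYY-MM" input format are assumptions for illustration):

```python
from datetime import datetime

def month_to_date(ym, day=15):
    # hypothetical helper: turn a "YYYY-MM" string into a full date
    # by assuming a fixed day of the month
    return datetime.strptime("%s-%02d" % (ym, day), "%Y-%m-%d").date()

print(month_to_date("2005-08"))  # -> 2005-08-15
```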
-Original Message-
From: alessandro carletti [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 10, 2005 9:37 AM
To: rHELP
Subject: [R] date format
The first one is an index, not a data set. Anyway, just use SAS to
export the data sets in text format (CSV, tab-delimited etc). You can
then easily read those in R. (By the way, the help for read.xport says
that 'The file must be in SAS XPORT format.' Is .sas7bdat an XPORT
file? Hint: no.)
This appears to be an SQL issue. Look for a way to speed up your
queries in Postgresql. I presume you haven't created an index on
'index', which means that every time you run your SELECT, Postgresql
is forced to do a full table scan (not good). If the index doesn't
solve the problem, look for some
One solution is
test <- c(1.11, 10.11, 11.11, 113.31, 114.2, 114.3)
id <- unlist(lapply(strsplit(as.character(test), "[.]"), function(x) x[2]))
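The strsplit() idea maps directly to string splitting in other languages; a Python sketch of the same extraction, using the values from the snippet above:

```python
# split each value on the decimal point and keep the fractional part
test = ["1.11", "10.11", "11.11", "113.31", "114.2", "114.3"]
ids = [s.split(".")[1] for s in test]
print(ids)  # -> ['11', '11', '11', '31', '2', '3']
```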
-Original Message-
From: Bernd Weiss [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 12:10 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Regular
Most powerful in what way? Quite a lot depends on the jobs you're going to run.
- To run CPU-bound jobs, more CPUs is better. (Even though R doesn't
do threading, you can manually split some CPU-bound jobs in several
parts and run them simultaneously.) Apart from multiple CPUs and
Dear useRs,
I'm having a hard time installing RMySQL on a FC4 x86_64 box (R 2.1.0
and MySQL 4.1.11-2 installed through yum). After an initial
configuration error (could not find the MySQL installation include
and/or library directories) I managed to install RMySQL with
# export
A related comment - don't rely (too much) on boxplots. They show only
a few things, which may be limiting in many cases and completely
misleading in others. Here are a couple of suggestions for plots which
you may find more useful than the standard box plots:
- figure 3.27 from
Dear useRs,
Is there a way to 'properly' format %d when plotting more than one
page on png()? 'Properly' means to me with leading 0s, so that the
PNGs become easy to navigate in a file/image browser. Lacking a better
solution I ended up using the code below, but would much prefer
something like
Dear useRs,
I'm wondering why the for() loop below runs slower as it progresses.
On a Win XP box, the iterations at the beginning run much faster than
those at the end:
1%, iteration 2000, 10:10:16
2%, iteration 4000, 10:10:17
3%, iteration 6000, 10:10:17
98%, iteration 196000, 10:24:04
99%,
Never mind, I found the fix. Declaring the length of 'out' eliminates
the performance decrease:
out <- vector(mode="numeric", length=length(test))
On 10/10/05, bogdan romocea [EMAIL PROTECTED] wrote:
Dear useRs,
I'm wondering why the for() loop below runs slower as it progresses.
On a Win XP
Simple addition and subtraction works as well:
as.Date("1995/12/01", format="%Y/%m/%d") + 30
If you have datetime values you can use
strptime("1995-12-01 08:00:00", format="%Y-%m-%d %H:%M:%S") + 30*24*3600
where 30*24*3600 = 30 days expressed in seconds.
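For comparison, the same arithmetic sketched in Python, where a timedelta makes the 30-day offset explicit instead of counting seconds by hand:

```python
from datetime import datetime, timedelta

d = datetime.strptime("1995-12-01 08:00:00", "%Y-%m-%d %H:%M:%S")
later = d + timedelta(days=30)  # equivalent to adding 30*24*3600 seconds
print(later.strftime("%Y-%m-%d %H:%M:%S"))  # -> 1995-12-31 08:00:00
```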
-Original Message-
From: Marc
By far, the cheapest and easiest solution (and the very first to try)
is to add more memory. The cost depends on what kind you need, but
here's for example 2 GB you can buy for only $150:
http://www.newegg.com/Product/Product.asp?Item=N82E16820144157
Project constraints?! If they don't want to
Dear useRs,
I'd like to produce some scatter plots where N units on the X axis are
equal to N units on the Y axis (as measured with a ruler, on screen or
paper). This approach
x <- sample(10:200, 40) ; y <- sample(20:100, 40)
windows(width=max(x),height=max(y))
plot(x,y)
is better than
A simple function will do what you want, customize this as needed:
lprint <- function(lst, prefix)
{
  for (i in 1:length(lst)) {
    cat(paste(prefix, "$", names(lst)[i], sep=""), "\n")
    print(lst[[i]])
    cat("\n")
  }
}
P <- list(A=a, B=b)
lprint(P, "Prefix")
-Original Message-
From: [EMAIL PROTECTED]
You forgot to mention your OS. This was asked before and if I recall
correctly the answer for Windows was no. An acceptable solution (imho)
is to edit the Rprofile.site files and add something like
pngplotwidth <- 990 ; pngplotheight <- 700
pdfplotwidth <- 14 ; pdfplotheight <- 10
Then, use these
One obvious alternative is an SQL join, which you could do directly in
a DBMS, or from R via RMySQL / RSQLite /... Keep in mind that creating
indexes on user/userid before the join may save a lot of time.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf
A function I've been using for a while returned a surprising [to me,
given the data] error recently:
Error in plot.window(xlim, ylim, log, asp, ...) :
Logarithmic axis must have positive limits
After some digging I realized what was going on:
x <- c(10460.97, 10808.67, 29499.98, 1,
Dear R users,
I have a column with dates (character) in a data frame:
12-Jan-01 11-Jan-01 10-Jan-01 9-Jan-01 8-Jan-01 5-Jan-01
and I need to convert them to (Julian) dates so that I can
sort the whole data frame by date. I thought it would be
very simple, but after checking the documentation
Thank you everyone. Indeed, I had read the data via
read.csv and the date column was a factor. Everything works
fine if I convert to character first.
Regards,
b.
--- Sundar Dorai-Raj [EMAIL PROTECTED] wrote:
bogdan romocea wrote:
Dear R users,
I have a column with dates
Dear R users,
I have a function (below) which encompasses several tests.
However, when I run it, only the output of the last test is
displayed. How can I ensure that the function root(var)
will run and display the output from all tests, and not
just the last one?
Thank you,
b.
root -
Dear R users,
I need to fit an ARMA model. As far as I've seen, EACF (extended ACF)
is not available in R.
1. Let's say I fit a series of ARMA models in a loop. Given the
code/output included below, how do I pull 'Model' and 'Fit' (AIC)
from each summary() so that I can combine them into an
Dear R users,
I'm having a hard time with some very simple things. I have a time
series where the dates are in the format 7-Oct-04. I imported the
file with read.csv so the date column is a factor. The series is
rather long and I want to plot it piece by piece. The function below
works fine,
=deparse(substitute(varb)), type="o")
}
}
--- Prof Brian Ripley [EMAIL PROTECTED] wrote:
On Mon, 1 Nov 2004, bogdan romocea wrote:
Dear R users,
I'm having a hard time with some very simple things. I have a
time
series where the dates are in the format 7-Oct
Dear R users,
I have a data frame which I create with read.csv and then order by
date:
d <- na.omit(read.csv(...))
d <- d[order(as.Date(as.character(d$Date), format="%d-%b-%y"),
decreasing=F, na.last=F),]
My problem is that even though the data frame is ordered as
requested, the old row
Assuming you have enough data, usually 1/4 to 1/2 is used for
validation.
One reference would be
Picard, R.R. and Berk, K.N. (1990),
"Data Splitting", The American Statistician, 44, 140-147.
hth,
b.
-Original Message-
From: Wensui Liu [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 11,
Dear R users,
This is a KDE beginner's question.
I have this distribution:
length(cap)
[1] 200
summary(cap)
Min. 1st Qu. Median Mean 3rd Qu. Max.
459.9 802.3 991.6 1066.0 1242.0 2382.0
I need to compute the sum of the values times their probability of
occurrence.
The graph
.
Could you tell us exactly what you are trying to compute, or why
you're
computing it?
HTH,
Andy
From: bogdan romocea
Dear R users,
This is a KDE beginner's question.
I have this distribution:
length(cap)
[1] 200
summary(cap)
Min. 1st Qu. Median Mean 3rd Qu
Better install and run R from a USB flash drive. This will save you
the trouble of re-writing the CD as you upgrade and install new
packages. Also, you can simply copy the R installation on your work
computer (no install rights needed); R will run.
HTH,
b.
From: Hans van Walen
neela v writes:
Hi all there
Can someone clarify this issue for me: feature-wise, which is
better, R or SAS, leaving aside the commercial aspect? I
suppose there are a few people who have worked on both R and SAS and
wish they would be able to help me in deciding on this.
Thank
You may be missing something. After you create all those objects,
you'll want to use them. Use get():
for (i in 1:10) ... get(paste("object", i, sep="")) ...
It took me about a week to find out how to do this. I waited for a
few days, but before I got to ask this basic/rtfm question, someone
else -
I'm also an R beginner. I have asked stupid questions, and received
RTFM replies. I believe such replies are _GREAT_, as long as they
include a brief reference to what to read, and where. (In some cases
searches don't work unless you happen to use the 'right' keywords,
and in other cases it may be
Here's something that works. I'm sure there are better solutions (in
particular the paste part - I couldn't figure out how to avoid typing
a[i,1], ..., a[i,10]).
a <- matrix(nrow=1000, ncol=10)
for (i in 1:1000)
  for (j in 1:10)
    a[i,j] <- sample(1:0, 1)
b <-
Dear R users,
I need to rename the columns of a series of data frames. The names of
the data frames and those of the columns need to be pulled from some
vectors. I tried a couple of things but only got errors. What am I
missing?
#---create data frame
dframes <- c("a", "b", "c")
Before choosing a GNU/Linux distribution look into the package
management issue.
http://distrowatch.com/
I would suggest that you avoid all RPM-based distributions (Mandrake,
Fedora, SuSE), and consider Debian (+ those based on it) or the
source-based distributions (such as Gentoo). I've been using
A simple for loop does the job. Why not write your own function?
movsd <- function(series, lag)
{
  movingsd <- vector(mode="numeric")
  for (i in lag:length(series))
  {
    movingsd[i] <- sd(series[(i-lag+1):i])
  }
  assign("movingsd", movingsd, .GlobalEnv)
}
This is very efficient: it takes
I asked the same question a few weeks ago. See
http://tolstoy.newcastle.edu.au/R/help/04/11/6775.html
-Original Message-
From: Martin Wegmann
Sent: Tuesday, December 14, 2004 6:23 AM
To: [EMAIL PROTECTED]
Subject: [R] sort() leaves row names unaffected
Hello,
I wonder if I ran into a
Not sure if it's the best way, but you could do it this way:
all.results <- vector(mode="numeric")
for (i in 1:100)
{
  ...
  this.run <- ...
  all.results <- c(all.results, this.run)
}
At this point all.results contains the values of this.run from the
whole loop. If
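The same accumulate-across-iterations pattern, sketched in Python with a list (the per-iteration computation is a placeholder):

```python
import random

random.seed(1)  # reproducible placeholder work
all_results = []
for i in range(100):
    this_run = random.random()  # stands in for the real per-iteration result
    all_results.append(this_run)

# all_results now holds one value per iteration of the whole loop
print(len(all_results))  # -> 100
```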
Dear R users,
I have a data frame with a few thousand rows and several hundred
numeric columns (plus a date column). For each row (day), I want to
assign +/- 1 to the highest X absolute values, 0 to the other values,
and save all that in a separate data frame.
I have a working solution (below),
Save the command(s) in a batch (.bat) file, and then run the .bat
file from the task scheduler.
-Original Message-
From: Mikkel Grum [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 21, 2004 3:18 PM
To: RHelp
Subject: [R] scheduling R tasks under windows
I'm trying to schedule R tasks
See
http://www.statsoft.com/textbook/stdisfit.html
There are several approaches you can use - Chi-square, Q-Q plots, P-P
plots, various tests (Kolmogorov-Smirnov, Shapiro-Wilk's W) etc.
HTH,
b.
-Original Message-
From: Angela Re
Sent: Wednesday, December 22, 2004 9:13 AM
To: [EMAIL
Dear R users,
I'm interested in a combination of a scatterplot and an image graph.
I have two large vectors. Because in the scatterplot some areas are
sparsely and others densely populated, I want to see the points, and
I also want their color to be changed based on their density (similar
to a
Dear useRs,
When I use coplot() and output to png/jpeg/bmp, the grid lines from
the scatter plots disappear. If I output to pdf() the grid lines are
there, however I can't use it - I have many points, and the resulting
PDF file is large and very slow to open and scroll through. (By the
way, if I
This is a rather complex problem. I'm not aware of an R function /
package that can do something like this, but in case you need to build
it from scratch read
http://support.sas.com/documentation/periodicals/obs/obswww15/index.html
If you're familiar with SAS you could translate the code to R.
Dear useRs,
I have a function that creates several global objects with
assign("obj", obj, .GlobalEnv), and which I need to run iteratively in
another function. The code is similar to
f <- function(...) {
  assign("obj", obj, .GlobalEnv)
}
fct <- function(...) {
  for (i in 1:1000)
  {
...
Apparently the message below wasn't posted on R-help, so I'm sending it
again. Sorry if you received it twice.
--- bogdan romocea [EMAIL PROTECTED] wrote:
Date: Tue, 11 Jan 2005 17:31:42 -0800 (PST)
From: bogdan romocea [EMAIL PROTECTED]
Subject: Re: [R] global objects not overwritten within
It appears you wouldn't get much improvement at all even if the 2nd CPU
were used at 100%. Five R sessions can easily overwhelm one CPU. I
think you need (a lot) more CPUs than 2 to solve your problem.
Possible solutions:
1. Install R on each eMac. Since you have 40 of them, you might want to
put
Here's a different suggestion. Create a bunch of image files, and then
use an image browser (GQview is one of the best; if you're on Win look
at ACDSee) to view them as a slide show. Good image browsers read
images in advance and should not produce flickering. I haven't
experimented though with
Dear useRs,
I have a script (Python) that every once in a while appends data to a
MySQL table. Meanwhile, I have a running R session, and I want it to be
aware of such table updates. I could write a loop in R to periodically
check whether new data has become available; however, are you aware of
a
Dear useRs,
How come the first attempt to sort a POSIXt vector fails (Error:
non-atomic type in greater), while the second succeeds? (Code inserted
below.) The documentation says that POSIXt is used to allow operations
such as subtraction, so I'd expect sorting to work. Is this perhaps an
OS
Dear useRs,
I'm trying to download some data through the HTTPS protocol. However,
download.file() does not support HTTPS (R 2.0.1 on WinXP):
Error in download.file(https.url, destfile = "test.txt") :
unsupported URL scheme
1. Is there any other function/package in R that can work with
Dear useRs,
I have an empirical distribution (not normal etc) and I want to draw
random samples from it. One solution I can think of is to compute let's
say 100 quantiles, then use runif() to draw a random number Q between 1
and 100, and finally run runif() again to pull a random value from the
I'm not sure I understand.
You have financial data and want to throw away some outliers??
Why would you ever do this?
First of all, I'd suggest you pay close attention to what the data is
trying to say. Maybe your distribution is not normal after all (see
tests for normality etc). Maybe you
Dear useRs,
I have a simple/RTFM question about XML parsing. Given an XML file,
such as (fragment)
<A>100</A>
<B>23</B>
<C>true</C>
how do I import it in a data frame or list, so that the values (100,
23, true) can be accessed through the names A, B and C?
I installed the XML package and looked over the
I managed to parse more complex XML files as well. The trick was to
manually determine the position of the child nodes of interest, after
which they can be parsed in a loop. For example:
require(XML)
doc <- xmlTreeParse("file.xml", getDTD=TRUE, addAttributeNamespaces=TRUE)
r <- xmlRoot(doc)
#find the nodes
I managed to install R 2.0.1 on Mandrake 10.1 a couple of weeks ago. It
wasn't that easy, first I had to manually track, download and install
3-4 dependencies.
I would suggest that you consider another GNU/Linux distribution,
Mepis. Mepis combines the best features of several distributions:
--- Rau, Roland [EMAIL PROTECTED] wrote:
-Original Message-
From: r-help
On Behalf Of bogdan romocea
Sent: Tuesday, March 15, 2005 2:49 PM
I would suggest that you consider another GNU/Linux distribution,
I don't think it is necessary. Mandrake 10.1 is fine for
running R
1. No way. You must have MySQL installed on your computer.
In fact this is not true. You can use a MySQL server installed
somewhere else on the network.
--- bogdan romocea [EMAIL PROTECTED] wrote:
1. No way. You must have MySQL installed on your computer.
2. You must install the server
(max.con = 16, fetch.default.rec = 5000, force.reload = F)
drv <- dbDriver("MySQL")
con <- dbConnect(drv, username="userid", password="pswd", dbname="db")
dbListTables(con)
--- Uwe Ligges [EMAIL PROTECTED] wrote:
bogdan romocea wrote:
1. No way. You must have MySQL installed on your computer.
2. You
In regards to your plot question, you could use points() or lines():
a <- sample(1:50, 10)
b <- sample(20:40, 10)
plot(1:10, a, pch=20, col="red")
points(1:10, b, pch=20, col="blue")
#or
#lines(1:10, b, pch=20, col="blue", type="o")
-Original Message-
From: Mohammad Ehsanul Karim [mailto:[EMAIL PROTECTED]
You can also buy these things on Ebay. I noticed the supply about 2
months ago when I guess you would have made about $1-2 per invitation.
The profit opportunity is much diminished now that the supply has
greatly increased (it appears every gmail account was allocated 50
invitations instead of 5 a
Dear useRs,
I want to simulate a time series (stationary; the distribution of
values is skewed to the right; quite a few ARMA absolute standardized
residuals above 2 - about 8% of them). Is this the right way to do it?
#
load("rdtb")   #the time series
dfr <- data.frame(sample(1:50,10), sample(1:50,10))
colnames(dfr) <- c("a", "b")
dfr <- dfr[order(dfr$a),]
dfr <- dfr[order(-dfr$a),]
-Original Message-
From: Mario Morales [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 31, 2005 10:23 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a R function for
You need another OS. Standard/32-bit Windows (XP, 2000 etc) can't use
more than 4 GB of RAM. Anyway, if you try to buy a box with 16 GB of
RAM, the seller will probably warn you about Windows and recommend a
suitable OS.
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL
Dear useRs,
I have a data frame and I want to plot all rows. Each row is
represented as a line that links the values in each column. The plot
looks like this:
dfr <- data.frame(A=sample(1:50,10), B=sample(1:50,10),
C=sample(1:50,10), D=sample(1:50,10))
xa <- 10*1:4
plot(c(10,40), c(0,50))
for
Forget about R for now and port the application to MySQL/PostgreSQL
etc, it is possible and worthwhile. In case you happen to use (and
really need) some SAS DATA STEP looping features you might be forced
to look into SQL cursors, otherwise the port should be (very)
straightforward.
Here's an example.
lst <- list()
for (i in 1:5) {
  lst[[i]] <- data.frame(v=sample(1:20,10), sample(1:5,10,replace=TRUE))
  colnames(lst[[i]])[2] <- paste("x", i, sep="")
}
dfr <- lst[[1]]
for (i in 2:length(lst)) dfr <- merge(dfr, lst[[i]], all=TRUE)
dfr <- dfr[order(dfr[,1]),]
print(dfr)
There is an aspect, worthy of careful consideration, you don't seem to
be aware of. I'll ask the question for you: How does the
explanatory/predictive potential of a dataset vary as the dataset gets
larger and larger?
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL
I agree it would be worthwhile to make some cosmetic changes to
r-project.org (nothing fancy though - no javascript, Flash etc). The
general public may not be fully aware of how R compares to other
statistical software, and I doubt that a web site which looks like it
was put together 10 years ago
Another good option is SQL, the fastest and most scalable solution. If
you decide to give it a try pay close attention to indexes.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steve Miller
Sent: Monday, May 01, 2006 8:55 AM
To: 'Guojun Zhu';
plot(1:10,axes=FALSE)
axis(1,at=1:10,labels=10:1)
axis(2,at=1:10,labels=5*10:1)
box()
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Christopher Brown
Sent: Tuesday, May 02, 2006 12:13 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Axis labels
I
Here's an example.
dfr <- data.frame(A1=1:10, A2=21:30, B1=31:40, B2=41:50)
vars <- colnames(dfr)
for (v in vars[grep("B", vars)]) print(mean(dfr[,v]))
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Farrel
Buchinsky
Sent: Wednesday, May 03, 2006 10:46 AM
This goes the other way - all SQL manipulations are a subset of what
can be done with R. Read up on indexing and see ?merge, ?aggregate,
?by, ?tapply, among others. (For the R equivalent to your query, check
?grep and ?order, and search the list if needed.) Also, this example
might be a good
I'll see if I can reproduce the steps under Knoppix[1]. Then you can
run Knoppix with a Persistent Disk Image (PDI)[2] that contains R,
the DBI, and RMySQL on just about any machine that runs Knoppix.
Don't bother, it's been done already. See
http://dirk.eddelbuettel.com/quantian.html
Your approach seems very inefficient - it looks like you're executing
thousands of update statements. Try something like this instead:
#---build a table 'updates' (id and value)
...
#---do all updates via a single left join
UPDATE bigtable a INNER JOIN updates b
ON a.id = b.id
SET a.col1 = b.value;
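The statement above is MySQL syntax. A self-contained sketch of the same set-based idea using Python's built-in sqlite3 (SQLite has no UPDATE ... JOIN, so a correlated subquery plays the role of the join; the table and column names are made up for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE bigtable (id INTEGER PRIMARY KEY, col1 TEXT);
CREATE TABLE updates  (id INTEGER PRIMARY KEY, value TEXT);
INSERT INTO bigtable VALUES (1,'old'),(2,'old'),(3,'old');
INSERT INTO updates  VALUES (1,'new1'),(3,'new3');
""")
# one statement updates every matching row, instead of thousands of
# per-row UPDATE statements
con.execute("""
UPDATE bigtable
SET col1 = (SELECT value FROM updates WHERE updates.id = bigtable.id)
WHERE id IN (SELECT id FROM updates)
""")
print(con.execute("SELECT col1 FROM bigtable ORDER BY id").fetchall())
```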
Repeated merge()-ing does not always increase the space requirements
linearly. Keep in mind that a join between two tables where the same
value appears M and N times will produce M*N rows for that particular
value. My guess is that the number of rows in atot explodes because
you have some
Macro stuff à la SAS is something that should be avoided whenever
possible - it's messy, limited, and limiting. (I've done it
occasionally and it works, but I think it's best not to go there.) Read
the documentation on lists (in particular named lists), and keep
everything in one or more lists. For
Compare
system.time({
v <- vector()
for (i in 1:10^5) v <- c(v, 1)
})
with
system.time({
v <- vector(length=10^5)
for (i in 1:10^5) v[i] <- 1
})
If you don't know exactly how long v will be, use a value that's large
enough, then throw away what's extra.
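The same grow-versus-preallocate contrast in Python terms: repeatedly concatenating an immutable tuple copies everything each time (quadratic), while filling a preallocated list is linear. Both loops below build the same sequence:

```python
n = 5000

# grow by concatenation: each += copies the whole tuple, O(n^2) overall
grown = ()
for i in range(n):
    grown = grown + (1,)

# preallocate once, then fill in place: O(n) overall
prealloc = [0] * n
for i in range(n):
    prealloc[i] = 1

print(list(grown) == prealloc)  # -> True
```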
-Original Message-
I wouldn't use a DBMS at all -- it is not necessary and I don't see
what you would get in return. Instead I would split very large log
files into a number of pieces so that each piece fits in memory (see
below for an example), then process them in a loop. See the list and
the documentation if you
Here's an example. By the way, I find that it's more convenient (where
applicable) to keep the data in 3 vectors/factors rather than one
matrix/data frame.
a <- matrix(sample(1:5,100,replace=TRUE), nrow=10, dimnames=list(1:10,5*1:10))
x <- y <- z <- vector()
for (i in 1:nrow(a)) {
x <-
Not sure about your data set, but if you have some kind of
(weighted/stratified) sample of hospitals you need to pay special
attention. Survey data violates the assumptions of the classical
linear models (infinite population, identically distributed errors
etc) and needs to be analyzed
One option is
library(R2HTML)
?HTML.cormat
The thing you're after is traffic highlighting (via CSS or HTML tags).
If HTML.cormat() doesn't do exactly what you want, modify the source
code. (By the way, I haven't used R2HTML so far so maybe there's a
more appropriate function.)
-Original
It's possible and straightforward (just don't use R). IMHO the GNU
Core Utilities
http://www.gnu.org/software/coreutils/
plus a few other tools such as sed, awk, grep etc are much more
appropriate than R for processing massive text files. (Get a good book
about UNIX shell scripting. On Windows you
Here's another approach which can be easily implemented in SQL.
1. Start with the dates as character vectors,
dt <- as.character(Sys.time())
2. Extract the minutes and round them to 0,15,30,45:
minutes <- floor(as.numeric(substr(dt,15,16))/15)*15
final.mins <- as.character(minutes)
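The flooring arithmetic checks out; here it is in Python (0-based slicing, so dt[14:16] matches R's 1-based substr(dt, 15, 16); the "YYYY-MM-DD HH:MM:SS" timestamp format is assumed):

```python
dt = "2006-02-07 16:47:12"
minutes = int(dt[14:16]) // 15 * 15  # 47 floors to 45
print(minutes)  # -> 45
```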
t1 <- as.data.frame(table(1:10)) ; colnames(t1)[2] <- "A"
t2 <- as.data.frame(table(5:20)) ; colnames(t2)[2] <- "B"
t3 <- merge(t1, t2, all=TRUE)
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eric Pante
Sent: Tuesday, February 07, 2006 4:22 PM
To:
Here's one way,
x <- data.frame(V=c(1,1,1,1,2,2,4,4,4,9,10,10,10,10,10))
y <- data.frame(V=c(2,9,10))
xy <- merge(x, y, all=FALSE)
Pay close attention to what happens if you have duplicate values in y, say
y <- data.frame(V=c(2,9,10,10))
-Original Message-
From: [EMAIL PROTECTED]
For a general solution without warnings try
interleave <- function(v1, v2)
{
  ord1 <- 2*(1:length(v1))-1
  ord2 <- 2*(1:length(v2))
  c(v1,v2)[order(c(ord1,ord2))]
}
interleave(rep(1,5), rep(3,8))
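The ordering trick translates directly to other languages; a Python sketch that interleaves two unequal-length sequences the same way (odd ranks for v1, even ranks for v2, then sort by rank):

```python
def interleave(v1, v2):
    # rank v1 elements 1,3,5,... and v2 elements 2,4,6,...;
    # sorting by rank interleaves them and appends the longer tail
    ranked = [(2 * i + 1, x) for i, x in enumerate(v1)] + \
             [(2 * i + 2, x) for i, x in enumerate(v2)]
    return [x for _, x in sorted(ranked)]

print(interleave([1] * 5, [3] * 8))
# -> [1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3]
```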
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Gabor
\r is a carriage return character which some editors may use as a line
terminator when writing files. My guess is that RSQLite writes your
data frame to a temp file using \r as a line terminator and then runs
a script to have SQLite import the data (together with \r - this would
be the problem),
?assign, but _don't_ use it; lists are better.
dfr <- list()
for(j in 1:9) {
  dfr[[as.character(j)]] <- ...
}
Don't try to imitate the limited macro approach of other software
(e.g. SAS). You can do all that in R, but it's much simpler and much
safer to rely on list indexing and functions that
Adapt the function below to suit your needs. If you really want to
plot 5 minutes at a time, round the time series to the last MM:00
times (where MM is in 5*0:11) and have idx below loop over them.
splitplot <- function(x, points)
{
boundaries <- c(1, points*1:floor(length(x)/points), length(x))
for
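The boundary computation in the fragment above, checked in Python (with a small guard so the endpoint isn't duplicated when the length divides evenly; n = 23 observations, 5 points per panel):

```python
def boundaries(n, points):
    # mirrors c(1, points*1:floor(n/points), n) from the R fragment
    b = [1] + [points * k for k in range(1, n // points + 1)]
    if b[-1] != n:
        b.append(n)
    return b

print(boundaries(23, 5))  # -> [1, 5, 10, 15, 20, 23]
```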
Apparently you do not understand the point, and seem to (want to) see
patterns all over the place. A good start for the treatment of this
interesting disease is 'Fooled by Randomness' by Nassim Nicholas
Taleb. The main point of the book is that many things may be a lot
more random than one might
There are several kinds of standardization, and 'normalization' is
only one of them. For some details you could check
http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm
(see Details for standardization methods).
Standardization is required prior to clustering to control for the
Installing R on SuSE 10.0 may be less than trivial for a beginner (I
ended up compiling GCC plus 3-4 other things). In case you lose your
patience I'd suggest trying Mepis Linux: it's very easy to install and
the package management GUI (Synaptic) is great. Installing R together
with a bunch of R
I am looking for an answer to a similar question - a generalized
solution that would be able to apply
(1) any number of functions
(2) to any number of vectors
(3) by any number of factors
(just like SQL's group by).
The output data frame must contain the values of the by factors, to be