Richard,
Try data.table. See the introduction vignette and the
presentations e.g. there is a slide showing a join to
183,000,000 observations of daily stock prices in
0.002 seconds.
data.table has fast rolling joins (i.e. fast last observation
carried forward) too. I see you asked about that on
Try data.table with the roll=TRUE argument.
Set your keys and then write :
futData[optData,roll=TRUE]
That is fast and as you can see, short. Works on
many millions and even billions of rows in R.
Matthew
http://datatable.r-forge.r-project.org/
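A self-contained sketch of that rolling join, with invented data (the names prices and query are illustrative, not from the original thread):

```r
library(data.table)

# Illustrative data: daily prices with gaps, and dates to look up
prices = data.table(date  = as.Date(c("2010-01-01", "2010-01-04", "2010-01-07")),
                    price = c(100, 102, 105))
setkey(prices, date)   # set the key first, as the post says

query = data.table(date = as.Date(c("2010-01-02", "2010-01-05")))

# roll = TRUE carries the last observation forward to unmatched dates (LOCF):
# 2010-01-02 picks up the 2010-01-01 price, 2010-01-05 the 2010-01-04 price
prices[query, roll = TRUE]
```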
Santosh Srinivas
since you are on 64bit. I was working on the basis of squeezing into 32bit.
Matthew
Matthew Dowle mdo...@mdowle.plus.com wrote in message
news:i1faj2$lv...@dough.gmane.org...
Hi Juliet,
Thanks for the info.
It is very slow because of the == in testData[testData$V2==one_ind,]
Why? Imagine
Is this what you mean?
x=c(1,2,2,3,4,5,6,3,2,1)
y=c(2,3,4,2,1,2,3,4,5,6)
matplot(cbind(x,y),type="l")
which(diff(sign(x-y))!=0)+1
[1] 4 8
--
View this message in context:
http://r.789695.n4.nabble.com/Finding-points-where-two-timeseries-cross-over-tp2313257p2313510.html
Sent from the R help
Another option for consideration :
library(data.table)
mydt = as.data.table(mydf)
mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
fac X.Intercept. x1 x2 x3
[1,] 0 -0.16247059 1.130220 2.988769 -19.14719
[2,] 1 0.08224509 1.216673 2.847960 -19.16105
[3,] 2
To: r-help
Cc: Jeff, Matt, Duncan, Hadley [ using Nabble to cc ]
Jeff, Matt,
How about the 'refdata' class in package ref.
Also, Hadley's immutable data.frame in plyr 1.1.
Both allow you to refer to subsets of a data.frame or matrix by reference I
believe, if I understand correctly.
All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the
likely
source of quite a few 'out of memory' errors and performance issues in R.
data.table doesn't do that internally, and its syntax is pretty
Wiley wrote:
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
All the solutions in this thread so far use the lapply(split(...))
paradigm
either directly or indirectly. That paradigm doesn't scale. That's the
likely
source of quite a few 'out of memory' errors
Or try data.table 1.4 on r-forge, its grouping is faster than aggregate :
          agg datatable
X10     0.012     0.008
X100    0.020     0.008
X1000   0.172     0.020
X10000  1.164     0.144
X1e.05  9.397     1.180
install.packages("data.table", repos="http://R-Forge.R-project.org")
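The spirit of those timings can be reproduced with a sketch like this (sizes and column names g, v are made up):

```r
library(data.table)

set.seed(1)
df = data.frame(g = sample(1000, 1e5, replace = TRUE), v = rnorm(1e5))
dt = as.data.table(df)

# Base R grouping
t1 = system.time(a1 <- aggregate(v ~ g, data = df, FUN = sum))

# data.table grouping: same sums, typically much faster on large data
t2 = system.time(a2 <- dt[, sum(v), by = g])
```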
I don't know about that, but try this :
install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
summaries = data.table(summaries)
summaries[,sum(counts),by=symbol]
Please let us know if that returns the correct result, and if its
memory/speed is ok ?
Matthew
Steve Lianoglou mailinglist.honey...@gmail.com wrote in message
news:t2ybbdc7ed01004290812n433515b5vb15b49c170f5a...@mail.gmail.com...
Thanks for directing me to the data.table package. I read through some
of the vignettes, and it looks quite nice.
While your sample code would provide
data.table is an enhanced data.frame with fast subset, fast
grouping and fast merge. It uses a short and flexible syntax
which extends existing R concepts.
Example:
DT[a>3,sum(b*c),by=d]
where DT is a data.table with 4 columns (a,b,c,d).
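A runnable version of that one-liner, with invented values for the four columns:

```r
library(data.table)

DT = data.table(a = 1:6,
                b = c(2, 4, 6, 8, 10, 12),
                c = rep(1:2, 3),
                d = c("x", "y", "x", "y", "x", "y"))

# where a > 3, sum b*c within each group of d
DT[a > 3, sum(b * c), by = d]
```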
data.table 1.4.1 :
* grouping is now 10+ times faster
Thanks for suggesting data.table. It does have advantages in this example
but it has to be used in a particular way.
What does Peng actually want to achieve? I'll guess (but it's only a guess)
that he doesn't actually need to hold the entire table in memory in a split
up format before doing
or if Dataset is a data.table :
Dataset = data.table(Dataset)
Dataset[,abs(ratio-median(ratio)),by=LEAID]
LEAID V1
[1,] 6307 0.0911905
[2,] 6307 0.0488095
[3,] 6307 0.0488095
[4,] 6307 0.1088095
[5,] 8300 0.2021538
[6,] 8300 0.000
[7,] 8300 0.060
rather than :
Maybe this (with enough data for a CI) ? :
Dataset = data.table(Dataset)
Dataset[,as.list(wilcox.test(ratio,conf.int=TRUE)$conf.int),by=LEAID]
LEAID V1 V2
[1,] 6307 0.720 0.92
[2,] 8300 0.5678462 0.83
Warning messages:
1: In switch(alternative, two.sided = {
what I think is an estimated interval.
I really want to use the above formula. I just can't figure out how to
get it to run by the LEAID.
It does require 9 observations to produce an interval, but I was showing a
sample.
Thanks again.
L.A.
Matthew Dowle-3 wrote:
Maybe this (with enough
That makes eight solutions. Any others? :)
A ninth was detailed in two other threads last month. The first link
compares to ave().
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html
Dennis Murphy djmu...@gmail.com wrote in
and more
convenient (and therefore quicker) to write, debug and maintain.
Matthew Dowle mdo...@mdowle.plus.com wrote in message
news:hgnjev$3h...@ger.gmane.org...
or if Dataset is a data.table :
Dataset = data.table(Dataset)
Dataset[,abs(ratio-median(ratio)),by=LEAID]
LEAID V1
[1
Or if there is a requirement for speed or shorter more convenient syntax
then there is a data.table join.
Basically setkey(data1,V1,V2) and setkey(data2,V1,V2), then data1[data2]
does the merge very quickly. You probably then want to do something with the
merged data set, which you just add
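A minimal sketch of that keyed join (the contents of data1 and data2 are invented):

```r
library(data.table)

data1 = data.table(V1 = c(1, 1, 2), V2 = c("a", "b", "a"), x = 1:3)
data2 = data.table(V1 = c(1, 2),    V2 = c("a", "a"),      y = c(10, 20))

setkey(data1, V1, V2)
setkey(data2, V1, V2)

# Binary-search join on (V1, V2); much faster than merge() on large data
data1[data2]
```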
As can data.table (i.e. do 'having' in one statement) :
DT = data.table(DF)
DT[,list(n=length(NAME),mean(SCORE)),by=NAME][n==3]
NAME n V2
[1,] James 3 64.0
[2,] Tom 3 78.7
but data.table isn't restricted to SQL functions (such as avg), any R
functions can be used,
special print
control codes would mess things up. I just recently received a new laptop
computer, and now I have an occasional problem with Word's pretty print
quotes, but if you know about that problem, it is easy to fix.
Jerry Floren
Minnesota Department of Agriculture
Matthew Dowle-3
William,
Try a rolling join in data.table, something like this (untested) :
setkey(Data, UnitID, TranDt)   # sort by unit then date
previous = transform(Data, TranDt=TranDt-1)
Data[previous,roll=TRUE]   # lookup the prevailing date before, if any,
                           # for each row within that row's UnitID
dt = data.table(d,key="grp1,grp2")
system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
   user  system elapsed
   3.89    0.00    3.91   # your 7.064 is 12.23 for me though, so this
                          # 3.9 should be faster for you
However, Rprof() shows that 3.9 is mostly dispatch of mean to
Hi Ted,
Well since you mentioned data.table (!) ...
If risk_input is a data.table consisting of 3 columns (m_id, sale_date,
return_date) where the dates
are of class IDate (recently added to data.table by Tom) then try :
risk_input[, fitdistr(return_date-sale_date,"normal"), by=list(m_id,
Hi Juliet,
Thanks for the info.
It is very slow because of the == in testData[testData$V2==one_ind,]
Why? Imagine someone looks for 10 people in the phone directory. Would
they search the entire phone directory for the first person's phone number,
starting
on page 1, looking at every single
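The phone-directory point can be made concrete with a sketch (the data here is an invented stand-in for Juliet's testData):

```r
library(data.table)

set.seed(1)
testData = data.table(V1 = rnorm(2e5),
                      V2 = sample(sprintf("id%05d", 1:2e4), 2e5, replace = TRUE))
one_ind = "id00042"

# Vector scan: == inspects every row, like reading the whole directory
ans1 = testData[V2 == one_ind]

# Keyed subset: sort once with setkey, then each lookup is a binary search
setkey(testData, V2)
ans2 = testData[J(one_ind), nomatch = 0L]
```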
Just to comment on this bit :
For one thing, you cannot index a csv file or a data.frame. If you have to
repeatedly select subsets of your large data set, creating an index on the
relevant column in the sqlite table is an absolute life saver.
This is one reason the data.table package was
The user wrote in their first post :
I have a lot of observations in my dataset
Here's one way to do it with a data.table :
a=data.table(a)
ans = a[ , list(dt=dt[dt-min(dt)<7]) , by="var1,var2,var3"]
class(ans$dt) = "Date"
Timings are below comparing the 3 methods. In this
Sounds like a good idea. Would it be possible to give an example of how to
combine plyr with data.table, and why that is better than a data.table only
solution ?
hadley wickham h.wick...@gmail.com wrote in message
news:f8e6ff051001200624r2175e38xf558dc8fa3fb6...@mail.gmail.com...
Note that in
...
On Wed, Jan 20, 2010 at 8:43 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
Sounds like a good idea. Would it be possible to give an example of how
to
combine plyr with data.table, and why that is better than a data.table
only
solution ?
Well, ideally, you'd do:
adt <- data.table
One way is :
dataset = data.table(ssfamed)
dataset[, whatever some functions are on Asfc, Smc, epLsar, etc ,
by="SPECSHOR,BONE"]
Your SPECSHOR and BONE names will be in your result alongside the results of
the whatever ...
Or try package plyr which does this sort of thing too. And sqldf may
but I have thousands of results so it would be really handy to find a way of
doing this quickly
it's a little difficult to follow those examples
Given your data in data.frame DF, maybe add the following to your list to
investigate :
dat = data.table(DF)
dat[, cor(Score1,Score2),
Please re-read the posting guide e.g. you didn't provide an example data set
or a way to generate one, or any R version information.
Werner W. pensterfuz...@yahoo.de wrote in message
news:646146.32238...@web23002.mail.ird.yahoo.com...
Hi,
I have browsed the help list and looked at the FAQ
?merge
plyr
data.table
sqldf
crantastic
Dr. Viviana Menzel vivianamen...@gmx.de wrote in message
news:4b58a0e9.3050...@gmx.de...
Hello R-help group,
I have a question about merging lists. I have two lists:
Genes list (hSgenes)
name chr strand start end transStart transEnd
specific
function), but don't worry I won't forget. As you said, it only works if
users contribute to it. That makes the power of R!
Ivan
On 1/21/2010 19:01, Matthew Dowle wrote:
One way is :
dataset = data.table(ssfamed)
dataset[, whatever some functions are on Asfc, Smc, epLsar, etc
Fantastic. You're much more likely to get a response now. Best of luck.
werner w pensterfuz...@yahoo.de wrote in message
news:1264175935970-1100164.p...@n4.nabble.com...
Thanks Matthew, you are absolutely right.
I am working on Windows XP SP2 32bit with R versions 2.9.1.
Here is an
Matthew Dowle wrote:
Great.
If you mean the crantastic r package, sorry I wasn't clear, I meant the
crantastic website http://crantastic.org/.
If you meant the description of plyr then if the description looks useful
then click the link taking you to the package documentation and read
On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
How many columns, and of what type are the columns ? As Olga asked too, it
would be useful to know more about what you're really trying to do.
3.5m rows is not actually
should not be important as long as
you can do what you want. SQL is declarative so you just specify what
you want rather than how to get it and invisibly to the user it
automatically draws up a query plan and then uses that plan to get the
result.
On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle
and use is to hide the implementation and focus on the problem.
That is why we use high level languages, object orientation, etc.
On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
How it represents data internally is very important, depending on the real
goal :
http
its even faster.
On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
Are you claiming that SQL is that utopia? SQL is a row store. It cannot
give the user the benefits of column store.
For example, why does SQL take 113 seconds in the example in this thread :
http
Yes.
data.df[,wcol,drop=FALSE]
For an explanation of drop see ?[.data.frame
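A tiny illustration of the drop behaviour (data.df and wcol invented to match the thread):

```r
data.df = data.frame(a = 1:3, b = letters[1:3])
wcol = "a"

v  = data.df[, wcol]                 # single column drops to a plain vector
d1 = data.df[, wcol, drop = FALSE]   # stays a one-column data.frame
```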
Chuck White chuckwhi...@charter.net wrote in message
news:20100202212800.o8xbu.681696.r...@mp11...
Additional clarification: the problem only comes when you have one column
selected from the original dataframe. You
I agree with Jim. The term "do analysis" is almost meaningless; the posting
guide makes reference to statements such as that. At least he tried to
define "large", but inconsistently (first of all 850MB, then changed to
10-20-15GB).
Satish wrote: at one time I will need to load say 15GB into R
I can't help you further than what's already been posted to you. Maybe
someone else can.
Best of luck.
Satish Vadlamani satish.vadlam...@fritolay.com wrote in message
news:1265397089104-1470667.p...@n4.nabble.com...
Matthew:
If it is going to help, here is the explanation. I have an end state
Hi,
We have the error below. Any ideas ?
Regards, Matt
$ R --vanilla -d gdb
GNU gdb 6.7.1-debian
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute
Or if you need it to be fast, try data.table. X[Y] is a join when X and Y
are both data.tables. X[Y] is a left join, Y[X] is a right join. 'nomatch'
controls the inner/outer join i.e. what happens for unmatched rows. This
is much faster than merge().
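A small sketch of X[Y] and nomatch, with made-up tables:

```r
library(data.table)

X = data.table(id = 1:3, x = c("a", "b", "c"), key = "id")
Y = data.table(id = 2:4, y = c(20, 30, 40),   key = "id")

X[Y]               # one row per row of Y; unmatched id=4 gets NA for x
X[Y, nomatch = 0]  # nomatch controls unmatched rows: 0 drops them (inner join)
```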
Gabor Grothendieck
If the question really meant to say data.table (i.e. package
data.table) then it's easier than the data.frame answer.
dt =
data.table(Categ=c(468,351,0,234,117),Perc=c(31.52,27.52,0.77,22.55,15.99))
dt[order(Categ)]
Notice there is no dt$ required before Categ. Also note the comma is
Dear all,
The data.table package was released back in August 2008. This email is to
publicise its existence in response to several suggestions to do so. It
seems I didn't send a general announcement about it at the time and
therefore perhaps, not surprisingly, not many people know about it.
Dear r-help,
If you haven't already seen this then :
http://www.youtube.com/watch?v=rvT8XThGA8o
The video consists of typing at the console and graphics, there is no audio
or slides. Please press the HD button and maximise. It's about 8 mins.
Regards, Matthew
I'd go a bit further and remind that the r-help posting guide is clear :
For questions about functions in standard packages distributed with R
(see the FAQ Add-on packages in R), ask questions on R-help.
If the question relates to a contributed package , e.g., one downloaded from
CRAN, try
appear to be correct. Or just directly sending an email to all of you?
Thanks again,
Rob
On Wed, Mar 3, 2010 at 6:05 AM, Matthew Dowle
mdo...@mdowle.plus.comwrote:
I'd go a bit further and remind that the r-help posting guide is clear :
For questions about functions in standard packages
Dieter,
One way to check if a package is active, is by looking on r-forge. If you
are referring to data.table you would have found it is actually very active
at the moment and is far from abandoned.
What you may be referring to is a warning, not an error, with v1.2 on
R2.10+. That was fixed
This post breaks the posting guide in multiple ways. Please read it again
(and then again) - in particular the first 3 paragraphs. You will help
yourself by following it.
The solution is right there in the help page for ?data.frame and other
places including Introduction to R. I think its
Frank, I respect your views but I agree with Gabor. The posting guide does
not support your views.
It is not any of our views that are important but we are following the
posting guide. It covers affiliation. It says only that some consider it
good manners to include a concise signature
)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Matthew Dowle mdo...@mdowle.plus.com 3/5/2010 12:58 PM
Frank, I respect your views but I agree with Gabor. The posting guide
does
not support your views.
It is not any of our
Thanks for making it quickly reproducible - I was able to see that message
in English within a few seconds.
The start has x=86, but the data is also called x. Remove x=86 from start
and you get a different error.
P.S. - please do include the R version information. It saves time for us,
and we
Welcome to R Barbara. It's quite an incredible community from all walks of
life.
Your beginner questions are answered in the manual. See Introduction to R.
Please read the posting guide again because it contains lots of good advice
for you. Some people read it three times before posting
Your choice of subject line alone shows some people that you missed some
small details from the posting guide. The ability to notice small details
may be important for you to demonstrate in future. Any answer in this
thread is unlikely to be found by a topic search on subject lines alone
This list is the wrong place for that question. The posting guide tells
you, in bold, to contact the package maintainer first.
If you had already done that, and didn't hear back from him, then you
should tell us, so that we know you followed the guide.
Corey Sparks corey.spa...@utsa.edu
Ricardo,
I see you got no public answer so far, on either of the two lists you posted
to at the same time yesterday. You are therefore unlikely to ever get a
reply.
I also see you've been having trouble getting answers in the past, back to
Nov 09, at least. For example no reply to Credit
Here are some references. Please read these first and post again if you are
still stuck after reading them. If you do post again, we will need x and y.
1. Introduction to R : 9.2.1 Conditional execution: if statements.
2. R Language Definition : 3.2 Control structures.
3. R for beginners by E
When you click search on the R homepage, type mosaic into the box, and
click the button, do the top 3 links seem relevant ?
Your previous 2 requests for help :
26 Feb : Response was SuppDists. Yet that is the first hit returned by the
subject line you posted : Hartleys table
22 Feb :
Nick,
Good question, but just sent to the wrong place. The posting guide asks you
to contact the package maintainer first before posting to r-help only if you
don't hear back. I guess one reason for that is that if questions about all
2000+ packages were sent to r-help, then r-help's traffic
The type of 'NA' is logical. So x[NA] behaves more like x[TRUE] i.e. silent
recycling.
class(NA)
[1] logical
x=101:108
x[NA]
[1] NA NA NA NA NA NA NA NA
x[c(TRUE,NA)]
[1] 101 NA 103 NA 105 NA 107 NA
x[as.integer(NA)]
[1] NA
HTH
Matthew
Barry Rowlingson b.rowling...@lancaster.ac.uk
Val,
Type combine two data sets (text you wrote in your post) into
www.rseek.org. The first two links are: Quick-R: Merge and Merging data:
A tutorial. Isn't it quicker for you to use rseek, rather than the time it
takes to write a post and wait for a reply ? Don't you also get more
M Joshi,
I don't know but I guess that some might have looked at your previous thread
on 14 March (also about the geoR package). You received help and good advice
then, but it doesn't appear that you are following it. It appears to be a
similar problem this time.
Also, this list is the wrong
Abraham,
This appears to be your 3rd unanswered post to r-help in March, all 3 have
been about the Zelig package.
Please read the posting guide and find out the correct place to send
questions about packages. Then you might get an answer.
HTH
Matthew
Mathew, Abraham T amat...@ku.edu wrote
You may not have got an answer because you posted to the wrong place. It's a
question about a package. Please read the posting guide.
miriza miri...@sfwmd.gov wrote in message
news:1269886286228-1695430.p...@n4.nabble.com...
Hi!
I am using geeglm to fit a Poisson model to a timeseries of
Contact the authors of those packages ?
miriza miri...@sfwmd.gov wrote in message
news:1269981675252-1745896.p...@n4.nabble.com...
Hi!
I was wondering if there were any packages that would allow me to fit a
GEE
to a single timeseries of counts so that I could account for
autocorrelation
Apparently not, since this your 3rd unanswered thread to r-help this month
about this package.
Please read the posting guide and find out where you should send questions
about packages. Then you might get an answer.
ping chen chen1984...@yahoo.com.cn wrote in message
Geelman,
This appears to be your first post to this list. Welcome to R. Nearly 2 days
is quite a long time to wait though, so you are unlikely to get a reply now.
Feedback : the question seems quite vague and imprecise. It depends on which
R you mean (32bit/64bit) and how much ram you have.
Rob,
Please look again at Romain's reply to you on 19th March. He informed you
then that Rcpp has its own dedicated mailing list and he gave you the link.
Matthew
R_help Help rhelp...@gmail.com wrote in message
news:ad1ead5f1003291753p68d6ed52q572940f13e1c0...@mail.gmail.com...
Hi,
I'm a
.
FWIW, I think the problem is fixed on the Rcpp 0.7.11 version (on cran
incoming)
Romain
On 01/04/10 17:47, Matthew Dowle wrote:
Rob,
Please look again at Romain's reply to you on 19th March. He informed you
then that Rcpp has its own dedicated mailing list and he gave you the
link
Ashley,
This appears to be your first post to this list. Welcome to R. Over 2 days
is quite a long time to wait though, so you are unlikely to get a reply now.
Feedback: since nlrq is in package quantreg, it's a question about a package
and should
be sent to the package maintainer. Some
someone else on this list may be able to give you a ballpark estimate
of how much RAM this merge would require.
I don't have an absolute estimate, but try data.table::merge, as it needs
less
working memory than base::merge.
20 million rows of 5 columns isn't beyond 32bit :
(1*4 +
Please install v1.3 from R-forge :
install.packages("data.table",repos="http://R-Forge.R-project.org")
It will be ready for CRAN soon.
Please follow up on datatable-h...@lists.r-forge.r-project.org
Matthew
bo bozha...@hotmail.com wrote in message
news:1270689586866-1755876.p...@n4.nabble.com...
Hi Dimitri,
A start has been made at explaining .SD in FAQ 2.1. This was previously on a
webpage, but it's just been moved to a vignette :
https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/branch2/inst/doc/faq.pdf?rev=68root=datatable
Please note: that vignette is part of a
Users of package 'unknownR' already know simplify2array was added in R
2.13.0.
They also know what else was added. Do you?
http://unknownr.r-forge.r-project.org/
Joshua Wiley jwiley.ps...@gmail.com wrote in message
news:canz9z_j+trwoim3scayuaruors+8hyc30pmt_thiex6qmto...@mail.gmail.com...
To close this thread on-list :
packageVersion() was added to R in 2.12.0.
data.table's dependency on 2.12.0 is updated, thanks.
Matthew
Jesse Brown jesse.r.br...@lmco.com wrote in message
news:4e1b21a8.8090...@atl.lmco.com...
Matthew Dowle wrote:
Hi,
Try package 'data.table'. It has
Hi Justin,
In data.table 1.6.1 there was this news item :
j's environment is now consistently reused so
that local variables may be set which persist
from group to group; e.g., incrementing a group
counter :
DT[,list(z,groupInd<-groupInd+1),by=x]
One of
Try data.table:::sortedmatch, which is implemented in C.
It requires its input to be sorted (and doesn't check)
Stavros Macrakis macra...@alum.mit.edu wrote in message
news:BANLkTi=j2lf5syxytv1dd4k9wr0zgk8...@mail.gmail.com...
Is there a generic binary search routine in a standard library
Peter,
If the proprietary part of REvolution's product is ok, then surely
Stanislav's suggestion is too. No?
Matthew
peter dalgaard pda...@gmail.com wrote in message
news:be157cf5-9b4b-45a0-a7d4-363b774f1...@gmail.com...
On Apr 7, 2011, at 09:45 , Stanislav Bek wrote:
Hi,
is it
murdoch.dun...@gmail.com wrote in message
news:4d9da9ff.9020...@gmail.com...
On 07/04/2011 7:47 AM, Matthew Dowle wrote:
Peter,
If the proprietary part of REvolution's product is ok, then surely
Stanislav's suggestion is too. No?
Revolution has said that they believe they follow the GPL
Do you know how many functions there are in base R?
How many of them do you know you don't know?
Run unk() to discover your unknown unknowns.
It's fast and it's fun!
unknownR v0.2 is now on CRAN.
More information is on the homepage :
http://unknownr.r-forge.r-project.org/
Or, just install the
data.table offers fast subset, fast grouping and fast ordered joins in a
short and flexible syntax, for faster development. It was first released
in August 2008 and is now the 3rd most popular package on Crantastic
with 20 votes and 7 reviews.
* X[Y] is a fast join for large data.
*
Adam,
because I did not have time to entirely test
Do you (or does your company) have an automated test suite in place?
R 2.10.0 is nearly two years old, and R 2.12.0 is nearly one.
Matthew
AdamMarczak adam.marc...@gmail.com wrote in message
news:1314385041626-3771731.p...@n4.nabble.com...
This is the fastest data.table way I can think of :
ans = mydt[,list(mytime=.N),by=list(id,mygroup)]
ans[,censor:=0L]
ans[J(unique(id)), censor:=1L, mult="last"]
     id mygroup mytime censor
[1,]  1       A      1      1
[2,]  2       B      3      0
[3,]  2       C      3      0
[4,]  2       D
Joshua Wiley jwiley.ps...@gmail.com wrote in message
news:canz9z_kopuwkzb-zxr96pvulhhf2znxntxso9xnyho-_jum...@mail.gmail.com...
On Tue, Oct 4, 2011 at 12:40 AM, Rainer Schuermann
rainer.schuerm...@gmx.net wrote:
Any comments are very welcome,
3. If that fails, and nobody else has a better
Assuming you can install other packages ok, data.table depends on
R =2.12.0. Which version of R do you have?
_If_ that's the problem, does anyone know if anything prevents
R's error message from stating which dependency isn't satisfied? I think
I've seen users confused by this before, for other
Ivo,
Also, perhaps FAQ 2.14 helps : Can you explain further why
data.table is inspired by A[B] syntax in base?
http://datatable.r-forge.r-project.org/datatable-faq.pdf
And, 2.15 and 2.16.
Matthew
Steve Lianoglou mailinglist.honey...@gmail.com wrote in message
Package plyr has .parallel.
Searching datatable-help for multicore, say on Nabble here,
http://r.789695.n4.nabble.com/datatable-help-f2315188.html
yields three relevant posts and examples.
Please check wiki do's and don'ts to make sure you didn't
fall into one of those traps, though (we don't
Using Josh's nice example, with data.table's built-in 'by' (optimised
grouping) yields a 6 times speedup (100 seconds down to 15 on
my netbook).
system.time(all.2b <- lapply(si, function(.indx) {
    coef(lm(y ~ x, data=d[.indx,]))
}))
   user  system elapsed
144.501   0.300 145.525
Hi Uwe,
When you cc from Nabble it doesn't show as cc'd on r-help. It's
a web form with an Email this post to... box. I asked Nabble
support (over a year ago) if they could reflect that in the cc field of
the post they send to r-help, with no luck.
The previous thread is cited automatically in
Hello Alex,
Assuming it was just an inadequate example (since a data.frame would suffice
in that case), did you know that a data.frame's columns do not have to be
vectors but can be lists? I don't know if that helps.
DF = data.frame(a=1:3)
DF$b = list(pi, 2:3, letters[1:5])
DF
a
Might Wayland fix it in Narwhal ?
Duncan Murdoch murdoch.dun...@gmail.com wrote in message
news:4cff7177.7030...@gmail.com...
On 08/12/2010 6:07 AM, Rainer M Krug wrote:
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
On 12/08/2010 12:05 PM, Duncan Murdoch wrote:
Rainer M Krug wrote:
Hi
if I understand
correctly.
Matthew
Duncan Murdoch murdoch.dun...@gmail.com wrote in message
news:4cffca13.7070...@gmail.com...
Matthew Dowle wrote:
Might Wayland fix it in Narwhal ?
I hope those names mean something to Rainer, because they mean nothing to
me.
Duncan Murdoch
Duncan
Try :
objects(package:base)
Also, as it happens, a new package called unknownR is in
development on R-Forge.
It's description says :
Do you know how many functions there are in base R?
How many of them do you know you don't know?
Run unk() to discover your unknown unknowns.
It's fast and
require(data.table)
DT = as.data.table(df)
# 1. Patients with ah and ihd
DT[,.SD["ah"%in%diagnosis & "ihd"%in%diagnosis],by=id]
     id diagnosis
[1,]  2        ah
[2,]  2       ihd
[3,]  2        im
[4,]  4        ah
[5,]  4       ihd
[6,]  4    angina
# 2. Patients with ah but no ihd
Note that a key is not actually required, so it's even simpler syntax :
dX = as.data.table(X)
dX[,length(unique(z)),by="x,y"]
x y V1
[1,] 1 1 2
[2,] 1 2 2
[3,] 2 3 2
[4,] 2 4 2
[5,] 3 5 2
[6,] 3 6 2
or passing list() syntax to the 'by' is exactly the same :
With data.table, the following is routine :
DT[order(a)] # ascending
DT[order(-a)] # descending, if a is numeric
DT[a>5,sum(z),by=c][order(-V1)] # sum of z group by c, just where a>5,
then show me the largest first
DT[order(-a,b)] # order by a descending then by b ascending, if a and b are