Richard,
Try data.table. See the introduction vignette and the
presentations e.g. there is a slide showing a join to
183,000,000 observations of daily stock prices in
0.002 seconds.
data.table has fast rolling joins (i.e. fast last observation
carried forward) too. I see you asked about that on
Try data.table with the roll=TRUE argument.
Set your keys and then write :
futData[optData,roll=TRUE]
That is fast and as you can see, short. Works on
many millions and even billions of rows in R.
Matthew
http://datatable.r-forge.r-project.org/
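A self-contained sketch of that rolling join, with invented data (the names prices and query are illustrative, not from the original thread):

```r
library(data.table)

# Illustrative data: daily prices with gaps, and dates to look up
prices = data.table(date  = as.Date(c("2010-01-01", "2010-01-04", "2010-01-07")),
                    price = c(100, 102, 105))
setkey(prices, date)   # set the key first, as the post says

query = data.table(date = as.Date(c("2010-01-02", "2010-01-05")))

# roll = TRUE carries the last observation forward to unmatched dates (LOCF):
# 2010-01-02 picks up the 2010-01-01 price, 2010-01-05 the 2010-01-04 price
prices[query, roll = TRUE]
```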
Santosh Srinivas
since you are on 64bit. I was working on the basis of squeezing into 32bit.
Matthew
Matthew Dowle mdo...@mdowle.plus.com wrote in message
news:i1faj2$lv...@dough.gmane.org...
Hi Juliet,
Thanks for the info.
It is very slow because of the == in testData[testData$V2==one_ind,]
Why? Imagine
Is this what you mean?
x=c(1,2,2,3,4,5,6,3,2,1)
y=c(2,3,4,2,1,2,3,4,5,6)
matplot(cbind(x,y),type="l")
which(diff(sign(x-y))!=0)+1
[1] 4 8
--
View this message in context:
http://r.789695.n4.nabble.com/Finding-points-where-two-timeseries-cross-over-tp2313257p2313510.html
Sent from the R help
Another option for consideration :
library(data.table)
mydt = as.data.table(mydf)
mydt[,as.list(coef(lm(y~x1+x2+x3))),by=fac]
fac X.Intercept. x1 x2 x3
[1,] 0 -0.16247059 1.130220 2.988769 -19.14719
[2,] 1 0.08224509 1.216673 2.847960 -19.16105
[3,] 2
To: r-help
Cc: Jeff, Matt, Duncan, Hadley [ using Nabble to cc ]
Jeff, Matt,
How about the 'refdata' class in package ref.
Also, Hadley's immutable data.frame in plyr 1.1.
Both allow you to refer to subsets of a data.frame or matrix by reference I
believe, if I understand correctly.
All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the
likely
source of quite a few 'out of memory' errors and performance issues in R.
data.table doesn't do that internally, and its syntax is pretty
Wiley wrote:
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
All the solutions in this thread so far use the lapply(split(...))
paradigm
either directly or indirectly. That paradigm doesn't scale. That's the
likely
source of quite a few 'out of memory' errors
Or try data.table 1.4 on r-forge, its grouping is faster than aggregate :
          agg datatable
X10     0.012     0.008
X100    0.020     0.008
X1000   0.172     0.020
X10000  1.164     0.144
X1e.05  9.397     1.180
install.packages("data.table", repos="http://R-Forge.R-project.org")
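The spirit of those timings can be reproduced with a sketch like this (sizes and column names g, v are made up):

```r
library(data.table)

set.seed(1)
df = data.frame(g = sample(1000, 1e5, replace = TRUE), v = rnorm(1e5))
dt = as.data.table(df)

# Base R grouping
t1 = system.time(a1 <- aggregate(v ~ g, data = df, FUN = sum))

# data.table grouping: same sums, typically much faster on large data
t2 = system.time(a2 <- dt[, sum(v), by = g])
```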
I don't know about that, but try this :
install.packages("data.table", repos="http://R-Forge.R-project.org")
require(data.table)
summaries = data.table(summaries)
summaries[,sum(counts),by=symbol]
Please let us know if that returns the correct result, and if its
memory/speed is ok ?
Matthew
Steve Lianoglou mailinglist.honey...@gmail.com wrote in message
news:t2ybbdc7ed01004290812n433515b5vb15b49c170f5a...@mail.gmail.com...
Thanks for directing me to the data.table package. I read through some
of the vignettes, and it looks quite nice.
While your sample code would provide
data.table is an enhanced data.frame with fast subset, fast
grouping and fast merge. It uses a short and flexible syntax
which extends existing R concepts.
Example:
DT[a>3,sum(b*c),by=d]
where DT is a data.table with 4 columns (a,b,c,d).
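A runnable version of that one-liner, with invented values for the four columns:

```r
library(data.table)

DT = data.table(a = 1:6,
                b = c(2, 4, 6, 8, 10, 12),
                c = rep(1:2, 3),
                d = c("x", "y", "x", "y", "x", "y"))

# where a > 3, sum b*c within each group of d
DT[a > 3, sum(b * c), by = d]
```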
data.table 1.4.1 :
* grouping is now 10+ times faster
Thanks for suggesting data.table. It does have advantages in this example
but it has to be used in a particular way.
What does Peng actually want to achieve? I'll guess (but it's only a guess)
that he doesn't actually need to hold the entire table in memory in a split
up format before doing
or if Dataset is a data.table :
Dataset = data.table(Dataset)
Dataset[,abs(ratio-median(ratio)),by=LEAID]
LEAID V1
[1,] 6307 0.0911905
[2,] 6307 0.0488095
[3,] 6307 0.0488095
[4,] 6307 0.1088095
[5,] 8300 0.2021538
[6,] 8300 0.000
[7,] 8300 0.060
rather than :
Maybe this (with enough data for a CI) ? :
Dataset = data.table(Dataset)
Dataset[,as.list(wilcox.test(ratio,conf.int=TRUE)$conf.int),by=LEAID]
LEAID V1 V2
[1,] 6307 0.720 0.92
[2,] 8300 0.5678462 0.83
Warning messages:
1: In switch(alternative, two.sided = {
what I think is an estimated interval.
I really want to use the above formula. I just can't figure out how to
get it to run by the LEAID.
It does require 9 observations to produce an interval, but I was showing a
sample.
Thanks again.
L.A.
Matthew Dowle-3 wrote:
Maybe this (with enough
That makes eight solutions. Any others? :)
A ninth was detailed in two other threads last month. The first link
compares to ave().
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/9014.html
http://tolstoy.newcastle.edu.au/R/e8/help/09/12/8830.html
Dennis Murphy djmu...@gmail.com wrote in
and more
convenient (and therefore quicker) to write, debug and maintain.
Matthew Dowle mdo...@mdowle.plus.com wrote in message
news:hgnjev$3h...@ger.gmane.org...
or if Dataset is a data.table :
Dataset = data.table(Dataset)
Dataset[,abs(ratio-median(ratio)),by=LEAID]
LEAID V1
[1
Or if there is a requirement for speed or shorter more convenient syntax
then there is a data.table join.
Basically setkey(data1,V1,V2) and setkey(data2,V1,V2), then data1[data2]
does the merge very quickly. You probably then want to do something with the
merged data set, which you just add
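A minimal sketch of that keyed join (the contents of data1 and data2 are invented):

```r
library(data.table)

data1 = data.table(V1 = c(1, 1, 2), V2 = c("a", "b", "a"), x = 1:3)
data2 = data.table(V1 = c(1, 2),    V2 = c("a", "a"),      y = c(10, 20))

setkey(data1, V1, V2)
setkey(data2, V1, V2)

# Binary-search join on (V1, V2); much faster than merge() on large data
data1[data2]
```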
As can data.table (i.e. do 'having' in one statement) :
DT = data.table(DF)
DT[,list(n=length(NAME),mean(SCORE)),by=NAME][n==3]
NAME n V2
[1,] James 3 64.0
[2,] Tom 3 78.7
but data.table isn't restricted to SQL functions (such as avg), any R
functions can be used,
special print
control codes would mess things up. I just recently received a new laptop
computer, and now I have an occasional problem with Word's pretty print
quotes, but if you know about that problem, it is easy to fix.
Jerry Floren
Minnesota Department of Agriculture
Matthew Dowle-3
William,
Try a rolling join in data.table, something like this (untested) :
setkey(Data, UnitID, TranDt)   # sort by unit then date
previous = transform(Data, TranDt=TranDt-1)
Data[previous,roll=TRUE]   # lookup the prevailing date before, if any,
                           # for each row within that row's UnitID
dt = data.table(d,key="grp1,grp2")
system.time(ans1 <- dt[ , list(mean(x),mean(y)) , by=list(grp1,grp2)])
   user  system elapsed
   3.89    0.00    3.91   # your 7.064 is 12.23 for me though, so this
                          # 3.9 should be faster for you
However, Rprof() shows that 3.9 is mostly dispatch of mean to
Hi Ted,
Well since you mentioned data.table (!) ...
If risk_input is a data.table consisting of 3 columns (m_id, sale_date,
return_date) where the dates
are of class IDate (recently added to data.table by Tom) then try :
risk_input[, fitdistr(return_date-sale_date,"normal"), by=list(m_id,
Hi Juliet,
Thanks for the info.
It is very slow because of the == in testData[testData$V2==one_ind,]
Why? Imagine someone looks for 10 people in the phone directory. Would
they search the entire phone directory for the first person's phone number,
starting
on page 1, looking at every single
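The phone-directory point can be made concrete with a sketch (the data here is an invented stand-in for Juliet's testData):

```r
library(data.table)

set.seed(1)
testData = data.table(V1 = rnorm(2e5),
                      V2 = sample(sprintf("id%05d", 1:2e4), 2e5, replace = TRUE))
one_ind = "id00042"

# Vector scan: == inspects every row, like reading the whole directory
ans1 = testData[V2 == one_ind]

# Keyed subset: sort once with setkey, then each lookup is a binary search
setkey(testData, V2)
ans2 = testData[J(one_ind), nomatch = 0L]
```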
Just to comment on this bit :
For one thing, you cannot index a csv file or a data.frame. If you have to
repeatedly select subsets of your large data set, creating an index on the
relevant column in the sqlite table is an absolute life saver.
This is one reason the data.table package was
The user wrote in their first post :
I have a lot of observations in my dataset
Here's one way to do it with a data.table :
a=data.table(a)
ans = a[ , list(dt=dt[dt-min(dt)<7]) , by="var1,var2,var3"]
class(ans$dt) = "Date"
Timings are below comparing the 3 methods. In this
Sounds like a good idea. Would it be possible to give an example of how to
combine plyr with data.table, and why that is better than a data.table only
solution ?
hadley wickham h.wick...@gmail.com wrote in message
news:f8e6ff051001200624r2175e38xf558dc8fa3fb6...@mail.gmail.com...
Note that in
...
On Wed, Jan 20, 2010 at 8:43 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
Sounds like a good idea. Would it be possible to give an example of how
to
combine plyr with data.table, and why that is better than a data.table
only
solution ?
Well, ideally, you'd do:
adt <- data.table
One way is :
dataset = data.table(ssfamed)
dataset[, whatever some functions are on Asfc, Smc, epLsar, etc ,
by="SPECSHOR,BONE"]
Your SPECSHOR and BONE names will be in your result alongside the results of
the whatever ...
Or try package plyr which does this sort of thing too. And sqldf may
but I have thousands of results so it would be really handy to find a way of
doing this quickly
it's a little difficult to follow those examples
Given your data in data.frame DF, maybe add the following to your list to
investigate :
dat = data.table(DF)
dat[, cor(Score1,Score2),
Please re-read the posting guide e.g. you didn't provide an example data set
or a way to generate one, or any R version information.
Werner W. pensterfuz...@yahoo.de wrote in message
news:646146.32238...@web23002.mail.ird.yahoo.com...
Hi,
I have browsed the help list and looked at the FAQ
?merge
plyr
data.table
sqldf
crantastic
Dr. Viviana Menzel vivianamen...@gmx.de wrote in message
news:4b58a0e9.3050...@gmx.de...
Hello R-help group,
I have a question about merging lists. I have two lists:
Genes list (hSgenes)
name chr strand start end transStart transEnd
specific
function), but don't worry I won't forget. As you said, it only works if
users contribute to it. That makes the power of R!
Ivan
On 1/21/2010 19:01, Matthew Dowle wrote:
One way is :
dataset = data.table(ssfamed)
dataset[, whatever some functions are on Asfc, Smc, epLsar, etc
Fantastic. You're much more likely to get a response now. Best of luck.
werner w pensterfuz...@yahoo.de wrote in message
news:1264175935970-1100164.p...@n4.nabble.com...
Thanks Matthew, you are absolutely right.
I am working on Windows XP SP2 32bit with R versions 2.9.1.
Here is an
Matthew Dowle wrote:
Great.
If you mean the crantastic r package, sorry I wasn't clear, I meant the
crantastic website http://crantastic.org/.
If you meant the description of plyr then if the description looks useful
then click the link taking you to the package documentation and read
On Wed, Jan 27, 2010 at 8:56 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
How many columns, and of what type are the columns ? As Olga asked too, it
would be useful to know more about what you're really trying to do.
3.5m rows is not actually
should not be important as long as
you can do what you want. SQL is declarative so you just specify what
you want rather than how to get it and invisibly to the user it
automatically draws up a query plan and then uses that plan to get the
result.
On Wed, Jan 27, 2010 at 12:48 PM, Matthew Dowle
and use is to hide the implementation and focus on the problem.
That is why we use high level languages, object orientation, etc.
On Thu, Jan 28, 2010 at 4:37 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
How it represents data internally is very important, depending on the real
goal :
http
its even faster.
On Thu, Jan 28, 2010 at 8:52 AM, Matthew Dowle mdo...@mdowle.plus.com
wrote:
Are you claiming that SQL is that utopia? SQL is a row store. It cannot
give the user the benefits of column store.
For example, why does SQL take 113 seconds in the example in this thread :
http
Yes.
data.df[,wcol,drop=FALSE]
For an explanation of drop see ?[.data.frame
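A tiny illustration of the drop behaviour (data.df and wcol invented to match the thread):

```r
data.df = data.frame(a = 1:3, b = letters[1:3])
wcol = "a"

v  = data.df[, wcol]                 # single column drops to a plain vector
d1 = data.df[, wcol, drop = FALSE]   # stays a one-column data.frame
```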
Chuck White chuckwhi...@charter.net wrote in message
news:20100202212800.o8xbu.681696.r...@mp11...
Additional clarification: the problem only comes when you have one column
selected from the original dataframe. You
I agree with Jim. The term "do analysis" is almost meaningless; the posting
guide makes reference to statements such as that. At least he tried to
define "large", but inconsistently (first of all 850MB, then changed to
10-20-15GB).
Satish wrote: at one time I will need to load say 15GB into R
I can't help you further than what's already been posted to you. Maybe
someone else can.
Best of luck.
Satish Vadlamani satish.vadlam...@fritolay.com wrote in message
news:1265397089104-1470667.p...@n4.nabble.com...
Matthew:
If it is going to help, here is the explanation. I have an end state
Hi,
We have the error below. Any ideas ?
Regards, Matt
$ R --vanilla -d gdb
GNU gdb 6.7.1-debian
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute
Or if you need it to be fast, try data.table. X[Y] is a join when X and Y
are both data.tables. X[Y] is a left join, Y[X] is a right join. 'nomatch'
controls the inner/outer join i.e. what happens for unmatched rows. This
is much faster than merge().
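A small sketch of X[Y] and nomatch, with made-up tables:

```r
library(data.table)

X = data.table(id = 1:3, x = c("a", "b", "c"), key = "id")
Y = data.table(id = 2:4, y = c(20, 30, 40),   key = "id")

X[Y]               # one row per row of Y; unmatched id=4 gets NA for x
X[Y, nomatch = 0]  # nomatch controls unmatched rows: 0 drops them (inner join)
```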
Gabor Grothendieck
If the question really meant to say data.table (i.e. package
data.table) then it's easier than the data.frame answer.
dt =
data.table(Categ=c(468,351,0,234,117),Perc=c(31.52,27.52,0.77,22.55,15.99))
dt[order(Categ)]
Notice there is no dt$ required before Categ. Also note the comma is
Dear all,
The data.table package was released back in August 2008. This email is to
publicise its existence in response to several suggestions to do so. It
seems I didn't send a general announcement about it at the time and
therefore perhaps, not surprisingly, not many people know about it.
Dear r-help,
If you haven't already seen this then :
http://www.youtube.com/watch?v=rvT8XThGA8o
The video consists of typing at the console and graphics, there is no audio
or slides. Please press the HD button and maximise. It's about 8 mins.
Regards, Matthew
I'd go a bit further and remind that the r-help posting guide is clear :
For questions about functions in standard packages distributed with R
(see the FAQ Add-on packages in R), ask questions on R-help.
If the question relates to a contributed package , e.g., one downloaded from
CRAN, try
appear to be correct. Or just directly sending an email to all of you?
Thanks again,
Rob
On Wed, Mar 3, 2010 at 6:05 AM, Matthew Dowle
mdo...@mdowle.plus.comwrote:
I'd go a bit further and remind that the r-help posting guide is clear :
For questions about functions in standard packages
Dieter,
One way to check if a package is active, is by looking on r-forge. If you
are referring to data.table you would have found it is actually very active
at the moment and is far from abandoned.
What you may be referring to is a warning, not an error, with v1.2 on
R2.10+. That was fixed
This post breaks the posting guide in multiple ways. Please read it again
(and then again) - in particular the first 3 paragraphs. You will help
yourself by following it.
The solution is right there in the help page for ?data.frame and other
places including Introduction to R. I think its
Frank, I respect your views but I agree with Gabor. The posting guide does
not support your views.
It is not any of our views that are important but we are following the
posting guide. It covers affiliation. It says only that some consider it
good manners to include a concise signature
)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
Matthew Dowle mdo...@mdowle.plus.com 3/5/2010 12:58 PM
Frank, I respect your views but I agree with Gabor. The posting guide
does
not support your views.
It is not any of our
Thanks for making it quickly reproducible - I was able to see that message
in English within a few seconds.
The start has x=86, but the data is also called x. Remove x=86 from start
and you get a different error.
P.S. - please do include the R version information. It saves time for us,
and we
Welcome to R Barbara. It's quite an incredible community from all walks of
life.
Your beginner questions are answered in the manual. See Introduction to R.
Please read the posting guide again because it contains lots of good advice
for you. Some people read it three times before posting
Your choice of subject line alone shows some people that you missed some
small details from the posting guide. The ability to notice small details
may be important for you to demonstrate in future. Any answer in this
thread is unlikely to be found by a topic search on subject lines alone
This list is the wrong place for that question. The posting guide tells
you, in bold, to contact the package maintainer first.
If you had already done that, and didn't hear back from him, then you
should tell us, so that we know you followed the guide.
Corey Sparks corey.spa...@utsa.edu
Ricardo,
I see you got no public answer so far, on either of the two lists you posted
to at the same time yesterday. You are therefore unlikely to ever get a
reply.
I also see you've been having trouble getting answers in the past, back to
Nov 09, at least. For example no reply to Credit
Here are some references. Please read these first and post again if you are
still stuck after reading them. If you do post again, we will need x and y.
1. Introduction to R : 9.2.1 Conditional execution: if statements.
2. R Language Definition : 3.2 Control structures.
3. R for beginners by E
When you click search on the R homepage, type mosaic into the box, and
click the button, do the top 3 links seem relevant ?
Your previous 2 requests for help :
26 Feb : Response was SuppDists. Yet that is the first hit returned by the
subject line you posted : Hartleys table
22 Feb :
Nick,
Good question, but just sent to the wrong place. The posting guide asks you
to contact the package maintainer first before posting to r-help only if you
don't hear back. I guess one reason for that is that if questions about all
2000+ packages were sent to r-help, then r-help's traffic
The type of 'NA' is logical. So x[NA] behaves more like x[TRUE] i.e. silent
recycling.
class(NA)
[1] logical
x=101:108
x[NA]
[1] NA NA NA NA NA NA NA NA
x[c(TRUE,NA)]
[1] 101 NA 103 NA 105 NA 107 NA
x[as.integer(NA)]
[1] NA
HTH
Matthew
Barry Rowlingson b.rowling...@lancaster.ac.uk
Val,
Type combine two data sets (text you wrote in your post) into
www.rseek.org. The first two links are: Quick-R: Merge and Merging data:
A tutorial. Isn't it quicker for you to use rseek, rather than the time it
takes to write a post and wait for a reply ? Don't you also get more
M Joshi,
I don't know but I guess that some might have looked at your previous thread
on 14 March (also about the geoR package). You received help and good advice
then, but it doesn't appear that you are following it. It appears to be a
similar problem this time.
Also, this list is the wrong
Abraham,
This appears to be your 3rd unanswered post to r-help in March, all 3 have
been about the Zelig package.
Please read the posting guide and find out the correct place to send
questions about packages. Then you might get an answer.
HTH
Matthew
Mathew, Abraham T amat...@ku.edu wrote
You may not have got an answer because you posted to the wrong place. It's a
question about a package. Please read the posting guide.
miriza miri...@sfwmd.gov wrote in message
news:1269886286228-1695430.p...@n4.nabble.com...
Hi!
I am using geeglm to fit a Poisson model to a timeseries of
Contact the authors of those packages ?
miriza miri...@sfwmd.gov wrote in message
news:1269981675252-1745896.p...@n4.nabble.com...
Hi!
I was wondering if there were any packages that would allow me to fit a
GEE
to a single timeseries of counts so that I could account for
autocorrelation
Apparently not, since this your 3rd unanswered thread to r-help this month
about this package.
Please read the posting guide and find out where you should send questions
about packages. Then you might get an answer.
ping chen chen1984...@yahoo.com.cn wrote in message
Geelman,
This appears to be your first post to this list. Welcome to R. Nearly 2 days
is quite a long time to wait though, so you are unlikely to get a reply now.
Feedback : the question seems quite vague and imprecise. It depends on which
R you mean (32bit/64bit) and how much ram you have.
Rob,
Please look again at Romain's reply to you on 19th March. He informed you
then that Rcpp has its own dedicated mailing list and he gave you the link.
Matthew
R_help Help rhelp...@gmail.com wrote in message
news:ad1ead5f1003291753p68d6ed52q572940f13e1c0...@mail.gmail.com...
Hi,
I'm a
.
FWIW, I think the problem is fixed on the Rcpp 0.7.11 version (on cran
incoming)
Romain
On 01/04/10 17:47, Matthew Dowle wrote:
Rob,
Please look again at Romain's reply to you on 19th March. He informed you
then that Rcpp has its own dedicated mailing list and he gave you the
link
Ashley,
This appears to be your first post to this list. Welcome to R. Over 2 days
is quite a long time to wait though, so you are unlikely to get a reply now.
Feedback: since nlrq is in package quantreg, it's a question about a package
and should
be sent to the package maintainer. Some
someone else on this list may be able to give you a ballpark estimate
of how much RAM this merge would require.
I don't have an absolute estimate, but try data.table::merge, as it needs
less
working memory than base::merge.
20 million rows of 5 columns isn't beyond 32bit :
(1*4 +
Please install v1.3 from R-forge :
install.packages("data.table",repos="http://R-Forge.R-project.org")
It will be ready for CRAN soon.
Please follow up on datatable-h...@lists.r-forge.r-project.org
Matthew
bo bozha...@hotmail.com wrote in message
news:1270689586866-1755876.p...@n4.nabble.com...
Hi Dimitri,
A start has been made at explaining .SD in FAQ 2.1. This was previously on a
webpage, but it's just been moved to a vignette :
https://r-forge.r-project.org/plugins/scmsvn/viewcvs.php/*checkout*/branch2/inst/doc/faq.pdf?rev=68root=datatable
Please note: that vignette is part of a
Users of package 'unknownR' already know simplify2array was added in R
2.13.0.
They also know what else was added. Do you?
http://unknownr.r-forge.r-project.org/
Joshua Wiley jwiley.ps...@gmail.com wrote in message
news:canz9z_j+trwoim3scayuaruors+8hyc30pmt_thiex6qmto...@mail.gmail.com...
To close this thread on-list :
packageVersion() was added to R in 2.12.0.
data.table's dependency on 2.12.0 is updated, thanks.
Matthew
Jesse Brown jesse.r.br...@lmco.com wrote in message
news:4e1b21a8.8090...@atl.lmco.com...
Matthew Dowle wrote:
Hi,
Try package 'data.table'. It has
Hi Justin,
In data.table 1.6.1 there was this news item :
j's environment is now consistently reused so
that local variables may be set which persist
from group to group; e.g., incrementing a group
counter :
DT[,list(z,groupInd<-groupInd+1),by=x]
One of
Try data.table:::sortedmatch, which is implemented in C.
It requires its input to be sorted (and doesn't check)
Stavros Macrakis macra...@alum.mit.edu wrote in message
news:BANLkTi=j2lf5syxytv1dd4k9wr0zgk8...@mail.gmail.com...
Is there a generic binary search routine in a standard library
Peter,
If the proprietary part of REvolution's product is ok, then surely
Stanislav's suggestion is too. No?
Matthew
peter dalgaard pda...@gmail.com wrote in message
news:be157cf5-9b4b-45a0-a7d4-363b774f1...@gmail.com...
On Apr 7, 2011, at 09:45 , Stanislav Bek wrote:
Hi,
is it
murdoch.dun...@gmail.com wrote in message
news:4d9da9ff.9020...@gmail.com...
On 07/04/2011 7:47 AM, Matthew Dowle wrote:
Peter,
If the proprietary part of REvolution's product is ok, then surely
Stanislav's suggestion is too. No?
Revolution has said that they believe they follow the GPL
Do you know how many functions there are in base R?
How many of them do you know you don't know?
Run unk() to discover your unknown unknowns.
It's fast and it's fun!
unknownR v0.2 is now on CRAN.
More information is on the homepage :
http://unknownr.r-forge.r-project.org/
Or, just install the
data.table offers fast subset, fast grouping and fast ordered joins in a
short and flexible syntax, for faster development. It was first released
in August 2008 and is now the 3rd most popular package on Crantastic
with 20 votes and 7 reviews.
* X[Y] is a fast join for large data.
*
Adam,
because I did not have time to entirely test
Do you (or does your company) have an automated test suite in place?
R 2.10.0 is nearly two years old, and R 2.12.0 is nearly one.
Matthew
AdamMarczak adam.marc...@gmail.com wrote in message
news:1314385041626-3771731.p...@n4.nabble.com...
This is the fastest data.table way I can think of :
ans = mydt[,list(mytime=.N),by=list(id,mygroup)]
ans[,censor:=0L]
ans[J(unique(id)), censor:=1L, mult="last"]
     id mygroup mytime censor
[1,]  1       A      1      1
[2,]  2       B      3      0
[3,]  2       C      3      0
[4,]  2       D
Joshua Wiley jwiley.ps...@gmail.com wrote in message
news:canz9z_kopuwkzb-zxr96pvulhhf2znxntxso9xnyho-_jum...@mail.gmail.com...
On Tue, Oct 4, 2011 at 12:40 AM, Rainer Schuermann
rainer.schuerm...@gmx.net wrote:
Any comments are very welcome,
3. If that fails, and nobody else has a better
Assuming you can install other packages ok, data.table depends on
R =2.12.0. Which version of R do you have?
_If_ that's the problem, does anyone know if anything prevents
R's error message from stating which dependency isn't satisfied? I think
I've seen users confused by this before, for other
Ivo,
Also, perhaps FAQ 2.14 helps : Can you explain further why
data.table is inspired by A[B] syntax in base?
http://datatable.r-forge.r-project.org/datatable-faq.pdf
And, 2.15 and 2.16.
Matthew
Steve Lianoglou mailinglist.honey...@gmail.com wrote in message
Package plyr has .parallel.
Searching datatable-help for multicore, say on Nabble here,
http://r.789695.n4.nabble.com/datatable-help-f2315188.html
yields three relevant posts and examples.
Please check wiki do's and don'ts to make sure you didn't
fall into one of those traps, though (we don't
Using Josh's nice example, with data.table's built-in 'by' (optimised
grouping) yields a 6 times speedup (100 seconds down to 15 on
my netbook).
system.time(all.2b <- lapply(si, function(.indx) {
    coef(lm(y ~ x, data=d[.indx,]))
}))
   user  system elapsed
144.501   0.300 145.525
Hi Uwe,
When you cc from Nabble it doesn't show as cc'd on r-help. It's
a web form with an Email this post to... box. I asked Nabble
support (over a year ago) if they could reflect that in the cc field of
the post they send to r-help, with no luck.
The previous thread is cited automatically in
Hello Alex,
Assuming it was just an inadequate example (since a data.frame would suffice
in that case), did you know that a data.frame's columns do not have to be
vectors but can be lists? I don't know if that helps.
DF = data.frame(a=1:3)
DF$b = list(pi, 2:3, letters[1:5])
DF
a
Might Wayland fix it in Narwhal ?
Duncan Murdoch murdoch.dun...@gmail.com wrote in message
news:4cff7177.7030...@gmail.com...
On 08/12/2010 6:07 AM, Rainer M Krug wrote:
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
On 12/08/2010 12:05 PM, Duncan Murdoch wrote:
Rainer M Krug wrote:
Hi
if I understand
correctly.
Matthew
Duncan Murdoch murdoch.dun...@gmail.com wrote in message
news:4cffca13.7070...@gmail.com...
Matthew Dowle wrote:
Might Wayland fix it in Narwhal ?
I hope those names mean something to Rainer, because they mean nothing to
me.
Duncan Murdoch
Duncan
Try :
objects(package:base)
Also, as it happens, a new package called unknownR is in
development on R-Forge.
It's description says :
Do you know how many functions there are in base R?
How many of them do you know you don't know?
Run unk() to discover your unknown unknowns.
It's fast and
require(data.table)
DT = as.data.table(df)
# 1. Patients with ah and ihd
DT[,.SD["ah"%in%diagnosis & "ihd"%in%diagnosis],by=id]
     id diagnosis
[1,]  2        ah
[2,]  2       ihd
[3,]  2        im
[4,]  4        ah
[5,]  4       ihd
[6,]  4    angina
# 2. Patients with ah but no ihd
Note that a key is not actually required, so it's even simpler syntax :
dX = as.data.table(X)
dX[,length(unique(z)),by="x,y"]
x y V1
[1,] 1 1 2
[2,] 1 2 2
[3,] 2 3 2
[4,] 2 4 2
[5,] 3 5 2
[6,] 3 6 2
or passing list() syntax to the 'by' is exactly the same :
With data.table, the following is routine :
DT[order(a)] # ascending
DT[order(-a)] # descending, if a is numeric
DT[a>5,sum(z),by=c][order(-V1)] # sum of z group by c, just where a>5,
then show me the largest first
DT[order(-a,b)] # order by a descending then by b ascending, if a and b are