Re: [R] reading large csv data sets efficiently

2013-05-22 Thread Whit Armstrong
http://cran.r-project.org/web/packages/data.table/index.html On Wed, May 22, 2013 at 12:31 PM, ivo welch ivo.we...@anderson.ucla.edu wrote: I have a couple of large data sets, on the order of 4GB. they come in .csv files, with about 50 columns and lots of rows. a couple have weird NA

Re: [R] Running a R file at a specific time each day

2013-02-11 Thread Whit Armstrong
man cron or something more robust: http://jenkins-ci.org/ -Whit On Mon, Feb 11, 2013 at 1:51 PM, Christofer Bogaso bogaso.christo...@gmail.com wrote: Hello again, My query may look quite generic, however at this point of time I just want explain my problem. I am hopeful that somebody can

Re: [R] Running MCMC in R

2012-12-13 Thread Whit Armstrong
Why don't you use one of the existing MCMC packages? There are many to choose from... On Wed, Dec 12, 2012 at 10:49 PM, Chenyi Pan cp...@virginia.edu wrote: Dear all I am now running a MCMC iteration in the R program. But it is always stuck in some loop. This causes big problems for my

Re: [R] Parallel R

2012-09-14 Thread Whit Armstrong
In addition to Michael's suggestions, you can also check out this tutorial which shows how to use lapply into EC2. http://www.rinfinance.com/agenda/2012/workshop/WhitArmstrong.pdf Unfortunately, rzmq is not available on windows, so this may not be the best solution for your setup. -Whit On

Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread Whit Armstrong
I don't work for Amazon, but here is one of their promo pieces on using 'spot' instances: http://youtu.be/WD9N73F3Fao at about 2:15, they cite University of Melbourne and Universitat de Barcelona as customers... My interest in all this cloud talk is that I'll be presenting a tutorial on R in the

Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Whit Armstrong
You should think about the cloud as a serious alternative. I completely agree with Barry. Unless you will utilize your machines (and by utilize, I mean 100% cpu usage) all the time (including weekends) you will probably better use your funds to purchase blocks of machines when you need to run

[R] deep copy?

2012-04-13 Thread Whit Armstrong
Is putting a variable into a list a deep copy (and is tracemem the correct way to confirm)? warmstrong@krypton:~/dvl/R.packages$ R > x <- rnorm(1000) > tracemem(x) [1] "<0x3214c90>" > x.list <- list(x.in.list=x) tracemem[0x3214c90 -> 0x2af0a20]: Is it possible to put a variable into a list without
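A minimal sketch of the check being asked about, using only base R. Whether tracemem reports a copy at the list() call depends on the R version (the session quoted above shows one; more recent versions may not), so no output is claimed here. What the sketch does demonstrate is the copy-on-modify guarantee: changing the element inside the list leaves the original vector untouched.

```r
x <- rnorm(1000)

# tracemem() is only available in R builds with memory profiling enabled,
# so guard the call.
if (capabilities("profmem")) tracemem(x)

# Put the vector into a list; a tracemem message here would indicate a copy.
x.list <- list(x.in.list = x)

# Modify the element inside the list. R must copy at this point at the
# latest, so the original x is unaffected.
x.list$x.in.list[1] <- x[1] + 1

if (capabilities("profmem")) untracemem(x)

# All other elements still match the original vector.
unchanged <- identical(x.list$x.in.list[-1], x[-1])
```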

Re: [R] deferred call

2012-04-12 Thread Whit Armstrong
it. Thanks, Whit On Thu, Apr 12, 2012 at 1:00 AM, Bert Gunter gunter.ber...@gene.com wrote: On Wed, Apr 11, 2012 at 8:12 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Wed, Apr 11, 2012 at 10:17 PM, Whit Armstrong armstrong.w...@gmail.com wrote: I must admit I'm a little ashamed to have

[R] deferred call

2012-04-11 Thread Whit Armstrong
I must admit I'm a little ashamed to have been using R for so long, and still lack a sound understanding of deferred calls, eval, deparse, substitute, and friends. I'm attempting to make a deferred call to a function which has default arguments in the following way: call.foo <- function(f) {

Re: [R] R on the cloud - Windows to Linux

2011-12-20 Thread Whit Armstrong
/2.13/rzmq’ The downloaded packages are in ‘/tmp/RtmpoTdDMm/downloaded_packages’ Warning message: In install.packages("rzmq", dependencies = TRUE) : installation of package 'rzmq' had non-zero exit status Thank you for your help! Ben On Wed, Dec 7, 2011 at 7:00 PM, Whit

Re: [R] R on the cloud - Windows to Linux

2011-12-08 Thread Whit Armstrong
I don't know where to start because, it looks like rzmq is not available for Windows and it looks like AWS.tools and deathstar depend on rzmq, so by Hence my reference to work. patches welcome. Will using a local Windows box continue to be an issue as I progress with R and EC2? I've run

Re: [R] R on the cloud - Windows to Linux

2011-12-07 Thread Whit Armstrong
subscribe to R-hpc. and check out these: https://github.com/armstrtw/rzmq https://github.com/armstrtw/AWS.tools https://github.com/armstrtw/deathstar and this: http://code.google.com/p/segue/ If you're willing to work, you can probably get deathstar to work using a local windows box and remote

Re: [R] RPostgreSQL snowfall

2011-06-06 Thread Whit Armstrong
I don't think you can share dbi connections across different instances of R. just have each of your helper functions open a local connection. or alternatively, load a package on each instance which keeps a dbi connection open. and make sure you bump up your allowed number of connections in

Re: [R] Is there a better way to parse strings than this?

2011-04-14 Thread Whit Armstrong
not everything has to be done in R. awk and sed are some of the best tools on a linux/unix box. quick refs: http://www.pement.org/awk/awk1line.txt http://sed.sourceforge.net/sed1line.txt -Whit On Wed, Apr 13, 2011 at 12:07 AM, Chris Howden ch...@trickysolutions.com.au wrote: Hi Everyone,

Re: [R] Edate and EOmonth

2011-04-11 Thread Whit Armstrong
I think Dirk has recently done some things w/ boost date time as an Rcpp based project bdt. http://cran.r-project.org/web/packages/RcppBDT/ChangeLog -Whit On Mon, Apr 11, 2011 at 10:11 AM, Jorge Nieves jorge.nie...@moorecap.com wrote: Hi, I was wondering if anyone could point me to the

Re: [R] JAGS/BUGS on gene expression data

2011-03-14 Thread Whit Armstrong
There are better alternatives for big data than to revert to C. http://code.google.com/p/pymc/ http://github.com/armstrtw/CppBugs (still alpha) -Whit On Mon, Mar 14, 2011 at 11:06 AM, nblarson nblar...@gmail.com wrote: Has anybody had issues running MCMC (either BUGS or JAGS) on data sets of

Re: [R] Efficient way to use data frame of indices to initialize matrix

2010-12-07 Thread Whit Armstrong
index m as a vector and do the assignment in one step: i <- df$row + (df$col-1)*nrow(m); m[i] <- df$a or something along those lines. -Whit On Tue, Dec 7, 2010 at 1:31 PM, Cutler, Gene gcut...@amgen.com wrote: I have a data frame with three columns, x, y, and a. I want to create a matrix from
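The one-step assignment above relies on R storing matrices in column-major order, so element (row, col) sits at linear position row + (col - 1) * nrow(m). A small sketch with made-up data (the data frame contents and the 3x3 size are assumptions for illustration):

```r
# Hypothetical input: one row per matrix cell to fill.
df <- data.frame(row = c(1, 2, 3),
                 col = c(1, 3, 2),
                 a   = c(10, 20, 30))

m <- matrix(NA_real_, nrow = 3, ncol = 3)

# Column-major linear index of each (row, col) pair.
i <- df$row + (df$col - 1) * nrow(m)
m[i] <- df$a

# The same assignment can be written with a two-column index matrix.
m2 <- matrix(NA_real_, nrow = 3, ncol = 3)
m2[cbind(df$row, df$col)] <- df$a
```

Both forms avoid an explicit loop over the rows of the data frame.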

Re: [R] Job scheduling in R

2010-11-19 Thread Whit Armstrong
http://hudson-ci.org/ give hudson a try. It's incredibly easy to set up, and handles job dependencies and notifications for job failures. Its suggested use case is for automated software builds, but it fits the role of scheduled jobs (and interjob dependencies) very well. -Whit On Fri, Nov

Re: [R] Which version control system to learn for managing Rprojects?

2010-10-27 Thread Whit Armstrong
http://hudson-ci.org On Wed, Oct 27, 2010 at 8:49 AM, david.jes...@ubs.com wrote: Gabor As someone trying to the rest of my team using Subversion (which I have used for a while, but more as a backup / record of changes), have you a neat / automated way of building a package from a

Re: [R] Markov Switching with TVTP - problems with convergence

2010-10-26 Thread Whit Armstrong
I've looked at the Kim/Nelson gauss code before, and I applaud your effort to convert it to R. I'm happy to have a look at it for you if you are willing to share your example. -Whit On Tue, Oct 26, 2010 at 4:13 AM, Houge jb.ho...@gmail.com wrote: Greetings fellow R enthusiasts! We have some

Re: [R] Which version control system to learn for managing R projects?

2010-10-26 Thread Whit Armstrong
Marc is exactly right about people having strong opinions. R-forge is really the _only_ reason to consider using svn. git is where the world is headed. This video is a little old: http://www.youtube.com/watch?v=4XpnKHJAok8, but does a good job getting the point across. Hg is a good

Re: [R] R with CouchDB?

2010-08-16 Thread Whit Armstrong
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:nosql_interface http://github.com/wactbprot/R4CouchDB On Mon, Aug 16, 2010 at 7:40 AM, David Mitchell monch1...@gmail.com wrote: Hello all, I'm kind of surprised that searching the archives and Googling haven't given me a

Re: [R] export tables to excel files on multiple sheets with titles for each table

2010-07-14 Thread Whit Armstrong
It isn't beautiful, but I use this package to write excel files from linux. http://github.com/armstrtw/Rexcelpoi the basic idea is that each element of a list is written as a separate sheet, but if a list element is itself a list, then all the elements of that list are written to the same sheet

Re: [R] Need help in handling date

2010-07-07 Thread Whit Armstrong
?strptime ‘%B’ Full month name in the current locale. (Also matches abbreviated name on input.) On Wed, Jul 7, 2010 at 8:40 AM, Christofer Bogaso bogaso.christo...@gmail.com wrote: Dear all, I have a date related question. Suppose I have a character string March-2009, how I can
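A short sketch of how %B could be applied to the string in the question. A month and year alone do not define a date, so this assumes the first of the month is acceptable. %B is locale dependent, so the example also assumes an English locale.

```r
s <- "March-2009"

# Prepend a day so the string describes a complete date, then parse the
# full month name with %B (assumes an English locale).
d <- as.Date(paste0("01-", s), format = "%d-%B-%Y")

iso <- format(d, "%Y-%m-%d")
```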

Re: [R] Ways to work with R and Postgres

2010-06-27 Thread Whit Armstrong
http://github.com/armstrtw/unifieddbi which I use on 64bit linux. you are welcome to test it for 64 bit windows. are you able to compile yourself? or do you need a packaged version? -Whit 2010/6/27 顾小波 guxiaobo1...@gmail.com: Hi, I post this message to the general r-help list hoping

Re: [R] efficient rolling rank

2010-04-17 Thread Whit Armstrong
library(fts); x <- fts(data=rnorm(1e6)); system.time(xrnk <- moving.rank(x,500)) user system elapsed 0.68 0.00 0.68 you will have to disguise your data as a time series to use fts. see below the exact implementation of rank that is used. -Whit template<typename ReturnType> class

Re: [R] for help on building a R package with several R function and a bunch of c, c++

2010-03-05 Thread Whit Armstrong
Pick up Rcpp, make your life easier. http://dirk.eddelbuettel.com/code/rcpp.html -Whit On Fri, Mar 5, 2010 at 9:19 AM, alex46...@yahoo.com wrote: Hope I can get quick help from here, I have a bunch of c, c++ included main function and makefile. It works well on both UNIX and windows. I

Re: [R] Variable Combinations in Regression

2010-01-08 Thread Whit Armstrong
?expand.grid On Fri, Jan 8, 2010 at 3:26 PM, Richardson, Patrick patrick.richard...@vai.org wrote: Let's say I have 8 variables and I want to generate all combinations of those variables (In pairs, threes fours, etc) to run in multiple linear regression. Is there a built-in function to do
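One way expand.grid could be used for this, sketched with three hypothetical variable names (x1, x2, x3) and a hypothetical response y: build every include/exclude pattern, drop the empty model, and turn each remaining row into a formula string. This is only an illustration of the pointer above, not the poster's code.

```r
vars <- c("x1", "x2", "x3")   # hypothetical predictor names

# One logical column per variable: TRUE means the variable is in the model.
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(vars)))

# Drop the row where no variable is selected.
grid <- grid[rowSums(grid) > 0, , drop = FALSE]

# Build one formula string per remaining row.
formulas <- unname(apply(grid, 1, function(keep) {
  paste("y ~", paste(vars[keep], collapse = " + "))
}))
```

Each string can then be passed through as.formula() to lm() inside a loop or lapply.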

Re: [R] python

2009-11-21 Thread Whit Armstrong
We have been using pymc as an alternative to WinBUGS, and have been very pleased with it. I've begun working on an R2Pymc package, but don't have anything ready for sharing yet. Here's the pymc page: http://code.google.com/p/pymc/ and the repo is here: http://github.com/pymc-devs/pymc I've

Re: [R] how to transform m/d/yyyy to yyyymmdd?

2009-07-21 Thread Whit Armstrong
warmstr...@research:~$ R > strptime("12/9/2007","%m/%d/%Y") [1] "2007-12-09" > format(strptime("12/9/2007","%m/%d/%Y"),"%Y%m%d") [1] "20071209" On Tue, Jul 21, 2009 at 1:16 PM, liujb liujul...@yahoo.com wrote: Hello, I have a set of data that has a Date column looks like this: 12/9/2007 12/16/2007

[R] functions to calculate t-stats, etc. for lm.fit objects?

2009-07-08 Thread Whit Armstrong
I'm running a huge number of regressions in a loop, so I tried lm.fit for a speedup. However, I would like to be able to calculate the t-stats for the coefficients. Does anyone have some functions for calculating the regression summary stats of an lm.fit object? Thanks, Whit
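A sketch of one way to get the t-stats from an lm.fit result, following the standard OLS formulas (the reply in the thread is not shown in this listing, so this is not necessarily the code that was posted). It assumes a full-rank design matrix; the simulated data are made up for illustration.

```r
set.seed(1)
n <- 100
X <- cbind(intercept = 1, x1 = rnorm(n), x2 = rnorm(n))
y <- drop(X %*% c(1, 2, -1)) + rnorm(n)

fit <- lm.fit(X, y)
p <- fit$rank

# Residual variance estimate.
sigma2 <- sum(fit$residuals^2) / fit$df.residual

# (X'X)^-1 recovered from the R factor of the QR decomposition.
xtx.inv <- chol2inv(fit$qr$qr[seq_len(p), seq_len(p), drop = FALSE])

se      <- sqrt(diag(xtx.inv) * sigma2)
t.stats <- fit$coefficients / se
p.vals  <- 2 * pt(abs(t.stats), df = fit$df.residual, lower.tail = FALSE)
```

The result can be checked against summary(lm(y ~ X - 1)) on a single regression before using it inside the loop.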

Re: [R] functions to calculate t-stats, etc. for lm.fit objects?

2009-07-08 Thread Whit Armstrong
Marc, Thanks very much for your detailed reply. I'll give your code a try and post back the time difference. Cheers, Whit On Wed, Jul 8, 2009 at 10:50 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote: I'm running a huge number of regressions

Re: [R] Testing memory limits in R??

2009-07-07 Thread Whit Armstrong
Seems strange. I can go all the way up to 50GB on our machine which has 64GB as well. It starts swapping after that, so I killed the process. try this: ans <- list(); for(i in 1:100) { ans[[ i ]] <- numeric(2^30/2); cat("iteration: ",i,"\n"); print(gc()) } source("scripts/test.memory.r")

Re: [R] Where can I find information on how to subsample a time series?

2009-06-26 Thread Whit Armstrong
assuming you pull the data you want into x and y: w...@ubuntu:~$ R > library(fts) > x <- fts() > y <- fts() > xy.cor.200 <- moving.cor(x,y,200) > tail(xy.cor.200) [,1] 2012-03-12 -0.3009635 2012-03-13 -0.2923489 2012-03-14 -0.2824015 2012-03-15 -0.2662689 2012-03-16 -0.2566354 2012-03-17

Re: [R] How to call time series functions from C ?

2009-05-07 Thread Whit Armstrong
you have a couple of options. If you require specific R functions to do what you want, then you will need to call R from C. I believe that Dirk has been working on an RInside package that does this. Alternatively, you can use my tslib package, which is a general time series library written in

Re: [R] Returning Variables in R to Linux Shell

2009-04-22 Thread Whit Armstrong
try littler: warmstr...@linuxsvr2:/tmp$ export MYVALUE=`r -e 'cat(10)'` warmstr...@linuxsvr2:/tmp$ env|grep MYVALUE MYVALUE=10 warmstr...@linuxsvr2:/tmp$ On Wed, Apr 22, 2009 at 10:48 AM, Bierbryer, Andrew abierbr...@klsdiversified.com wrote: If I have an R script that I am executing from a

Re: [R] Reading from a Database

2009-03-18 Thread Whit Armstrong
can you show the list a more specific example of what you are trying to do? most of the database packages support writeTable commands. So, if you can represent the data you are trying to write in a dataframe, then you can probably send it to the database with R. -Whit On Wed, Mar 18, 2009 at

Re: [R] Creating an Excel file with multiple spreadsheets

2009-03-09 Thread Whit Armstrong
if you don't find the solution you need, I have a package that uses Apache POI to do this, but you will need to compile it yourself. contact me if you want to go this route. -Whit On Mon, Mar 9, 2009 at 3:34 PM, Patrick Connolly p_conno...@slingshot.co.nz wrote: On Mon, 09-Mar-2009 at 02:34PM

Re: [R] Dates in Common

2009-01-23 Thread Whit Armstrong
you want: ans <- intersect(data1,data2); class(ans) <- c("POSIXt","POSIXct") I personally think intersect should preserve the class of the object (if both args have the same class), but I think r-core has a different opinion. -Whit On Fri, Jan 23, 2009 at 9:02 AM, Tom La Bone boo...@gforcecable.com
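A small sketch of the same idea with made-up dates. Whether intersect keeps the POSIXct class may differ between R versions, so the example converts the result back explicitly, which is harmless if the class was already preserved. The UTC time zone is an assumption.

```r
data1 <- as.POSIXct(c("2009-01-20", "2009-01-21", "2009-01-22"), tz = "UTC")
data2 <- as.POSIXct(c("2009-01-21", "2009-01-22", "2009-01-23"), tz = "UTC")

# Dates present in both vectors; the result may come back as plain numbers
# (seconds since the epoch).
ans <- intersect(data1, data2)

# Restore the date-time class explicitly.
ans <- as.POSIXct(ans, origin = "1970-01-01", tz = "UTC")

common <- format(ans, "%Y-%m-%d", tz = "UTC")
```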

Re: [R] Ordered Multidimensional Arrays

2008-12-23 Thread Whit Armstrong
I take a similar approach by storing my vcv's in a list w/ the date stored as a character vector %y-%m-%d as the list names. That way you can easily grab the vcv you need by casting your date to a string and using it to index the list. not sure if that will work for you. hth, Whit On Tue, Dec
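The approach described above might look roughly like this. The matrices are placeholders, and the sketch uses a four-digit year (%Y) for the names rather than %y, which is an assumption made to keep the keys unambiguous.

```r
dates <- as.Date(c("2008-12-22", "2008-12-23"))

# Placeholder variance-covariance matrices, one per date.
vcvs <- list(diag(2), 2 * diag(2))
names(vcvs) <- format(dates, "%Y-%m-%d")

# Look up the matrix for a given date by casting the date to a string.
lookup <- as.Date("2008-12-23")
vcv <- vcvs[[format(lookup, "%Y-%m-%d")]]
```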

Re: [R] newbie question on snow/Rmpi

2008-12-19 Thread Whit Armstrong
, 2008 at 01:16:46PM -0500, Whit Armstrong wrote: I have a network of four machines set up. I'm having trouble spawning my slaves on these machines. All the examples I have found so far use makeCluster with type=MPI, and I guess I'm missing some kind of cluster configuration in my environment

Re: [R] newbie question on snow/Rmpi

2008-12-19 Thread Whit Armstrong
, On 19 December 2008 at 10:17, Whit Armstrong wrote: | Does anyone know if these errors can be safely ignored? | | [linuxsvr.kls.corp:16242] mca: base: component_find: unable to open | osc pt2pt: file not found (ignored) | | this is on RHEL5 w/ openMPI 1.2.7 Yes. Hao (of Rmpi fame) and I

Re: [R] sliding window over a large vector

2008-12-16 Thread Whit Armstrong
if you want the speed, you can simply build an fts time series from it, then apply the moving.sum function and throw away the dates. this will probably be the fastest implementation of rolling applies out there unless you do a cumsum difference function. I got a sample timing of 2 seconds on 12m
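For reference, the "cumsum difference" alternative mentioned above can be sketched in base R as follows. The vector and the window width k are made up for illustration. On long floating-point vectors the running total can accumulate rounding error, so results may differ slightly from summing each window directly.

```r
x <- 1:10
k <- 3   # window width

# Prepend a zero so that cs[j + 1] holds sum(x[1:j]).
cs <- cumsum(c(0, x))

# Sum over each window of width k = difference of two running totals.
roll.sum <- cs[(k + 1):length(cs)] - cs[1:(length(cs) - k)]
```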

[R] newbie question on snow/Rmpi

2008-12-15 Thread Whit Armstrong
I have a network of four machines set up. I'm having trouble spawning my slaves on these machines. All the examples I have found so far use makeCluster with type=MPI, and I guess I'm missing some kind of cluster configuration in my environment variables because all my clusters are formed on the

[R] is there a way to recursilvely lapply

2008-12-11 Thread Whit Armstrong
for a simple example: x <- list(); x[["a"]] <- list(a=c(1,2,3),b=c(3,4,5)); x[["b"]] <- list(a=c(6,7,8),b=c(9,10,11)); lapply(x,sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering if I have overlooked something obvious. one can also do:
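The replies are truncated in this listing, but base R does have a recursive counterpart to lapply, rapply, documented on the same help page. A sketch using the example from the question:

```r
x <- list()
x[["a"]] <- list(a = c(1, 2, 3), b = c(3, 4, 5))
x[["b"]] <- list(a = c(6, 7, 8), b = c(9, 10, 11))

# Apply sum to every leaf, keeping the nested list structure.
sums <- rapply(x, sum, how = "list")

# Same, but flattened to a named vector (names such as "a.a", "a.b", ...).
flat <- rapply(x, sum, how = "unlist")
```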

Re: [R] is there a way to recursilvely lapply

2008-12-11 Thread Whit Armstrong
...@stats.ox.ac.uk wrote: On Thu, 11 Dec 2008, Whit Armstrong wrote: for a simple example: x <- list(); x[["a"]] <- list(a=c(1,2,3),b=c(3,4,5)); x[["b"]] <- list(a=c(6,7,8),b=c(9,10,11)); lapply(x,sum) this fails w/ Error in FUN(X[[1L]], ...) : invalid 'type' (list) of argument Just wondering

Re: [R] is there a way to recursilvely lapply

2008-12-11 Thread Whit Armstrong
yes, that is correct. I was looking in text mode. ok, thanks for your help. -Whit On Thu, Dec 11, 2008 at 4:02 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Thu, 11 Dec 2008, Whit Armstrong wrote: Thanks, Gabor and Prof. Ripley. Sorry for the oversight. I grepped the lapply help

Re: [R] RODBC - problems connecting to oracle through linux

2008-12-01 Thread Whit Armstrong
I've had a good experience with the ROracle driver. Any reason why you need RODBC? -Whit On Mon, Dec 1, 2008 at 10:17 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote: On Fri, 28 Nov 2008, Simon Collins wrote: Hi I'm presently trying to connect to Oracle through RODBC / UnixODBC on linux

[R] builtin to filter a list?

2008-10-29 Thread Whit Armstrong
I know it's easy to write a simple loop to do this, but in the spirit of lapply, I thought I would ask if there is a builtin to filter or take a subset of a list based on a predicate in a similar way to the Erlang lists:filter/2 function: http://www.erlang.org/doc/man/lists.html#filter-2
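For what it's worth, base R has a Filter function that takes a predicate and a list, much like lists:filter/2. The reply in this thread is truncated, so the sketch below is only an illustration with made-up data, alongside an equivalent logical-index form.

```r
x <- list(a = 1, b = 5, c = 2, d = 8)

# Keep the elements for which the predicate returns TRUE.
big <- Filter(function(e) e > 2, x)

# Equivalent form: build a logical index with vapply and subset the list.
big2 <- x[vapply(x, function(e) e > 2, logical(1))]
```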

Re: [R] builtin to filter a list?

2008-10-29 Thread Whit Armstrong
, Whit Armstrong [EMAIL PROTECTED] wrote: I know it's easy to write a simple loop to do this, but in the spirit of lapply, I thought I would ask if there is a builtin to filter or take a subset of a list based on a predicate in a similar way to the Erlang lists:filter/2 function: http

[R] color individual bar of histogram?

2008-10-28 Thread Whit Armstrong
Anyone know a quick way to color one bar of a histogram? I want to mark the bar in which the most recent observation falls. So, for instance: x <- rnorm(100); latest.ob <- x[100]; hist(x) ## how do I mark the bucket that latest.ob falls into? Thanks, Whit
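One possible approach, sketched in base R (the answer given in the thread is not shown in this listing, so this is not necessarily what was suggested): compute the histogram without plotting, find the bin that contains the latest observation, and pass a per-bar color vector to plot. The colors and the seed are arbitrary choices.

```r
set.seed(42)
x <- rnorm(100)
latest.ob <- x[100]

# Compute the bins without drawing anything yet.
h <- hist(x, plot = FALSE)

# Index of the bin containing latest.ob; hist() uses right-closed bins and
# includes the lowest break by default, so cut() is called the same way.
idx <- cut(latest.ob, breaks = h$breaks, include.lowest = TRUE, labels = FALSE)

# One color per bar, with the bar of interest highlighted.
cols <- rep("grey", length(h$counts))
cols[idx] <- "red"

plot(h, col = cols, main = "Histogram of x")
```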

Re: [R] color individual bar of histogram?

2008-10-28 Thread Whit Armstrong
That's great, Peter. Thanks very much. -Whit On Tue, Oct 28, 2008 at 3:13 PM, Peter Dalgaard [EMAIL PROTECTED] wrote: Whit Armstrong wrote: Anyone know a quick way to color one bar of a histogram? I want to mark the bar in which the most recent observation falls. So, for instance: x

Re: [R] Truncating dates (and other date-time manipulations)

2008-09-11 Thread Whit Armstrong
I'm wrapping boost date_time into an R package. I'll post it up to cran shortly. http://www.boost.org/doc/libs/1_36_0/doc/html/date_time.html I'm not sure if that is what you are looking for, but there are a lot of useful utilities in this library. -Whit On Thu, Sep 11, 2008 at 11:02 AM,

Re: [R] Truncating dates (and other date-time manipulations)

2008-09-11 Thread Whit Armstrong
probably not pre-canned routines for that, but very easy to implement with the tools provided in the library. Looks like most of what you want to do is fairly simple and not worth the trouble of involving c++. but things like month_durations and year_durations make it clear that the authors have

Re: [R] Applying user function over a large matrix

2008-04-29 Thread Whit Armstrong
are the chunks on which you need to apply the function rolling windows? do they overlap? I have some c++ template utilities that I use for window functions (on timeseries objects) which you are welcome to copy and modify to fit your problem. they are available here: git://repo.or.cz/fts.git